System Design

#System Design Original from posts ##Table of Content ####1. Design Process ####2. Statistic Numbers ####3. Basic Knowledge ####4. Topics ####5. 海量数据


##Design Process

  1. 按照网上的步骤
    1. Constrains and use cases - 分析问题框架
    2. Abstract Design - 设计service/storage layer
    3. Bottlenecks - 根据估算数据分析瓶颈
    4. Scaling Abstract Design - Scale
  2. 从base开始, 最后变得复杂, 可以一开始vertical, 后面horizontal, 注意中间是通过估算 + Load Test来实现的
  3. 核心是从两个方向来展开
    • Service Layer
      1. More server(应该是这里体现consistent hashing, map reduce应该也是这里)
      2. Load balancer
    • Storage Layer
      1. NoSQL vs Relation SQL
      2. 如果小的话就store in memory, 大的话就得考虑sharding(easier for replicate and backup)
      3. 如果读写不均匀的话就分开读写, master/slave replication(reading from slaves and write to master)

##Statistic Numbers ####公司

#####Facebook

#####Twitter

#####Amazon

####程序内

####Computer

####Note

  1. 所有IP是能放进内存的,因为一共2^32个ip地址

##Basic Knowledge

###Harvard Class

  1. 形式: Multi-tier architecture Multi-tier architecture Another Pic

  2. 重要的几个东西

    1. DNS - 可以通过DNS来进行geo based load balancing, nslookup google
    2. Firewall - 只允许来自80 443 22 VPN端口的访问. 过了下面那层LB,把443转换成80就行了
    3. Load Balancer
      • 分为软的和硬的
      • Software - Elastic Load Balancing, HAProxy(TCP/HTTP), Linux Virtual Server
      • Hardware - Barracuda, Cisco, Citrix, F5 * 方法
      • Round robin 平均分配
      • Weighted round robin
      • Least connections
      • Least response time
      • Layer 7 load balancers can further distribute requests based on application specific data such as HTTP headers, cookies, or data within the application message itself, such as the value of a specific parameter. * Heart Beat health check
      • Active/Active - 意味着run full capacity, 如果一个跪了,整体的load balancing速度会降低
      • Active/passive * ip-hash 根据ip造server
    4. Web Server
      • 可以partition,按照某种方法处理request(例如名字)
    5. 在Web Server和Storage之间还可能要有一层Load Balancer
    6. Switch - 每个server在联入网的时候都是有两个switch,需要调节好防止packet在中间形成loop
    7. Storage
      • NoSQL vs Relational SQL
      • Raid0, Raid1, Raid5, Raid6, Raid10
      • Master/Slave Mode (重点是replica)
      • 一个Master多个Slave, 内容一样,如果Master跪了可以promote一个Slave
      • 分开读写,Master写,Slave读,实时同步(好像有点点SPF) * Master/Master Mode
      • 两个Master多个Slave, 就不会跪了

###NoSQL vs Relational SQL

###Sharding

###MapReduceword count作为栗子

###Consistent Hashing

###Storm Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation.

Another introduction

  1. Spout is A source of streams in a computation. Spout implementations already exist for most queueing systems. (水龙头, 源源不断的往外送tuple)
  2. Bolt processes any number of input streams and produces any number of new output streams. Most of the logic of a computation goes into bolts, such as functions, filters, streaming joins, streaming aggregations, talking to databases, and so on. (Core function of streaming computation)
  3. Topology is a network of spouts and bolts, with each edge in the network representing a bolt subscribing to the output stream of some other spout or bolt. A topology is an arbitrarily complex multi-stage stream computation.

###Spark

###Hadoop

###Yet Another Resource Negotiator(YARN) (Cluster Resource Mangement)

####Hadoop Distributed File System (HDFS) (Redundate, Reliable Storage)

###ZooKeeper A Distributed Coordination Service for Distributed Applications

###DevOps


##Topics

###Tiny URL

  1. Constrains and Use Cases
    1. Use Cases - What do we use the system for?
      1. Shortening: take a url => return a much shorter url
      2. Redirection: take a short ulr => redirect to the original url
      3. Custom url
      4. Analytics
      5. Automatic link expiration~
      6. Manual link removal
      7. UI vs API
      8. Highly available
    2. Constrains:
      1. Basic knowledge:
        • 1.3 Billion Facebook active users
        • 650 Million Twitter
        • New Tweets per day 500 Million
      2. 这道题的数据
        1. All shortened URLs per Month: 1.5BN
        2. Sites below the top3: 300M per month
        3. We: 100M per month
        4. 1BN request per month
        5. request by second 400+per second(40 shorten 360 redirects)
        6. Total url 5 years * 12 * 100M: 6Billion url in 5 years
        7. 每个url长度 500bytes: 1 char = 1 byte 这个太重要了(ASCII是128=2**7个, 1byte就够了, 但是UTF-8是1~4bytes一个字符)
        8. url是case sensitive的
        9. 6 bytes per hash
        10. 注意, 10^3 K->kb, 10^6 M->MB, 10^9 B->GB, 10^12 ->TB
        11. New data written per second 40*(500+6)bytes = 20kb
        12. Data read per second: 360 * 506 bytes = 180k
      3. 重要的几个点
        1. Storage
        2. Data written & Data Read
  2. Abstract Design - Finally simple the problem to per second
    1. Application service layer(serves the requests)
      • Shortening Service
      • Redirection Service
    2. Data storage layer(keep track of the hash =>url mapping)
      • Act like a big hash table: stores new mappings and retrieves a value given a key
    3. hashed_url = convert_to_base_62(md5(original_url + random_salt))[:6]
      • Base 62 是一种short ulr的encoding, encode之后只有62种字符0-9 a-z A-Z
  3. Understanding Bottlenecks
    • Traffic - not going to be very hard
    • Lots of data - more interesting
  4. Scaling Abstract Design
    1. Application Service Layer
      • Start with one machine
      • Measure how far it take us(load tests)
      • Add a load balancer + a cluster of machines over time
        1. to deal with spiky traffice
        2. Also avoid single failure and increase the availability
    2. Data Storage Layer
      1. What’s data?
        • Billions of objects
        • Each object is fairly small(<1k)
        • There are no relationship between the objects
        • Reads are 9x more frequent thant writes(360 reads, 40 writes per second)
        • 3TBs of urls, 36GB of hashes
      2. NoSQL vs Relation SQL
        • MySQL:
          • Widely used
          • Mature technology
          • Clear Scaling Paradimgs(sharding, master/slave replication, master/master replication)
          • Used by Facebook, Twitter, Google etc
          • Index lookups are very fast
        • Mappings
          • hash: varchar(6)
          • original_url: varchar(512)
      3. Create a unique index on the hash(36GB+). We want to hold it in memory to speed up lookups.
        1. Vertical Scaling of the MySQL machine (memory is cheap) (vertical for a while)
        2. Partition the data: 5 partitions, 600GB of data, 8GB of indexes (Eventually partiion)
          • We can add more shards in the future. Easier for backup and replicate
          • Default idea of this is: get the first char from the hash % num_of_partition
      4. Master/Slave replication(reading from the slave replicas, writes to the master)(如果有一天read/write不balance了的话) Master/Master

###Image Hosting Applicaiton

  1. 考虑因素
    • Cost-effective
    • Highliy available
    • Low latency (fast retrieval)
  2. 功能
    • Upload image
    • Query image
  3. Services
    • Separate read/write (Read usually from cache, Write usually to disk eventually)
      Write is slower than read
    • 由上面那个导致Apache can maintain 500 connections. Writes can quickly consume all of those,但是read可以asynchronouse and also take advantage of gzip compression or chunked transfer encoding
    • 所以就应该Separate read/write services and scale them independently
    • 解决办法是distribute users across different shards each shard can handle only a set of number of users, 好处是:
      • 系统变的flexible
      • 减小了outage带来的危害
  4. Redundancy
    • Data redundancy
    • Service redundancy
      • Remove single point failure by provide backup or spare functionality
      • Create shared-nothing architecture (Each nodes operate independently without knowing other nodes)
  5. Partitions
    • Large data - second server to store parts of the data set
    • Computing resource - splitting the operations or load across addtion nodes
    • Partitions or shards - each logical set of functionality is separate
      • geographic boundaries
      • non-paying users vs paying users
    • Data locality (the closer the data to the operation point, the better the performance)
    • Inconsistency
  6. Building Blocks
    • Caches
      • Global Caches
        1. Request Node只跟GC联系, 如果data不在GC去问database. (Majority way) (如果cache data file太大, 取就难取)
        2. Request Node先跟GC联系, 如果data不在再自己去问database要
      • Distributed Caches (用consistent hashing)
        • Pros - 可以通过增加Node数量来增加cache大小
        • Cons - Remedying a missing node -> store multiple copies of data -> too complicated -> missing is no big deal
    • Proxies (many proxies are also caches but not all caches act s proxies)
      • Collapsed forwarding - collaps the same or similar requests in to one request and return single result to the requesting client
      • Collapse request for data that is close together
      • Should put caches in front of proxy, because cache runs faster than proxy
    • Index - Trade-offs of increased storage overhead and slower writes (since you must both write the data and update the index) for the benefit of faster reads
      • Index Table - 就像目录一样
      • Many layers of indexes
      • Inverted index - 用来从word反找书页
    • Load Balancer - distribute load across a set of nodes responsible for servicing requests.
      • Sticky Session - die了之后的解决办法 1. browser cache, cookie 2. URL Rewriting??
      • 几种分配方式
        • Round Robin
        • Random
        • Least Connection
        • Least Response Time
        • Layer 7 check header
    • Queues
      • Queues enable clients to work in an asynchronous manner, providing a strategic abstraction of a client’s request and its response. On the other hand, in a synchronous system, there is no differentiation between request and reply, and they therefore cannot be managed separately.
      • RabbitMQ, ActiveMQ. 以RabbitMQ为例, 几种比较重要的模型, 都是Producer-Consumer模型
        • P -> Queue -> C : Normal模型, producer-consumer
        • P -> Queue -> CC : Work Queue, 一个Producer对应多个consumer
        • P -> X -> QQ -> CC : Publish/Subscribe
        • RPC模型

Queue

#####Fit to Pattern

  1. Constrains and Use Cases
    1. Use Cases
      • Upload image
      • Query image
    2. Constrains
      • Storage scalability
      • Low latency for image downloads/requests
      • High availability
      • Cost-effective

###News Feed

Also see this


###Facebook Chat

###Messaging System


###Type Head Search/Search Suggestion

#####Services

#####Three Way Trade-off


###POI

#####Implement second/minute/hour/day counters

#####Photo Storage


##海量数据


##Others

#####Facebook

#####Note

  1. Similar to web server or CDN edge server

###NC 笔记

###TODO