Designing Distributed Systems
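Jun 21, 2017
Principles to follow when designing a distributed system
Complex system design should follow very simple basic principles. The following design principles often need to be considered when designing a complex system: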
- Modularity
- Efficiency
- Scalability
- Extensibility
- Service/object oriented
- Prioritizing user interactive requests
- Partition and separation
- Design as simple as needed
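Aspects to consider when designing a distributed system
- When you need to handle X concurrent requests, imagine how the system would behave under 100X or even 100,000X requests.
- Parallelizing computation, data reads and writes, and similar operations means raising the utilization of computing resources as far as possible, i.e., optimizing the architecture under some constraints.
- Once you go parallel, you face two major problems: consistency and failure handling.
- When designing a complex system, think carefully about the requirements of each functional module and the dependencies between modules.
- Load balancing (routing), replication, caching, indexing, message queues, and proxies are some commonly used system design techniques.
- Back-of-the-envelope calculations are very helpful in the early design stage.
- Design each service to be as independent as possible, reduce dependencies between services, and keep the system design modular.
- Focus the design effort on the protocol rather than on the communication.
Some common design patterns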
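
To make the back-of-the-envelope step concrete, here is a minimal sketch that turns a daily workload into an average request rate, a peak request rate, and a server count. Every number in it (daily users, requests per user, peak factor, per-server capacity) and the function name `estimate_servers` are illustrative assumptions, not measurements from any real system.

```python
import math


def estimate_servers(daily_users, requests_per_user, peak_factor, qps_per_server):
    """Return (average QPS, peak QPS, servers needed at peak)."""
    seconds_per_day = 24 * 60 * 60
    avg_qps = daily_users * requests_per_user / seconds_per_day
    peak_qps = avg_qps * peak_factor
    servers = math.ceil(peak_qps / qps_per_server)
    return avg_qps, peak_qps, servers


# Hypothetical workload: 8.64M daily users, 10 requests each,
# peak traffic 3x the average, 500 requests/second per server.
avg_qps, peak_qps, servers = estimate_servers(8_640_000, 10, 3, 500)

# The same system at 100X the users, to see how the design has to scale.
_, _, servers_100x = estimate_servers(8_640_000 * 100, 10, 3, 500)
```

Running the same estimate at 100X the load is a quick way to see which resource becomes the bottleneck first.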
![Figure 1: a load balancer dispatching requests to workers](http://1.bp.blogspot.com/_j6mB7TMmJJY/TLnj_mWL50I/AAAAAAAAAgg/JFPsfGcAenI/s1600/p1.png)
Figure 1. A typical load balancer tries to pick a suitable worker to process each request. We normally expect the dispatcher to follow some geographical pattern so that users perceive little latency. For simple cases, a very basic round-robin method can also be used.
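
The simple round-robin case mentioned in the caption can be sketched as follows. The class name `RoundRobinBalancer` and the worker names are hypothetical; a real dispatcher would also account for worker health and, as noted above, geography.

```python
import itertools


class RoundRobinBalancer:
    """Hands out workers in a fixed rotating order."""

    def __init__(self, workers):
        if not workers:
            raise ValueError("at least one worker is required")
        # cycle() repeats the worker list forever, in order.
        self._cycle = itertools.cycle(list(workers))

    def pick(self):
        """Return the worker that should handle the next request."""
        return next(self._cycle)


balancer = RoundRobinBalancer(["worker-a", "worker-b", "worker-c"])
assignments = [balancer.pick() for _ in range(5)]
```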
![Figure 2: multicast to all workers with aggregated responses](http://2.bp.blogspot.com/_j6mB7TMmJJY/TLlDyOK60HI/AAAAAAAAAfI/JreI7fqvohA/s1600/P2.png)
Figure 2. The dispatcher multicasts the request to all workers, aggregates their responses, and sends the result back to the requester.
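
A minimal sketch of this multicast-and-aggregate flow, assuming the workers are plain Python callables run on a thread pool. The function `scatter_gather`, the shard contents, and the merge step are illustrative assumptions only.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter_gather(request, workers, aggregate):
    """Send the same request to every worker, then combine the responses."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = [pool.submit(worker, request) for worker in workers]
        # result() blocks until that worker has answered.
        responses = [future.result() for future in futures]
    return aggregate(responses)


def make_worker(shard):
    """Hypothetical worker: searches its own shard for a prefix."""
    return lambda prefix: [word for word in shard if word.startswith(prefix)]


shards = [["apple", "avocado"], ["banana", "apricot"], ["cherry"]]
workers = [make_worker(shard) for shard in shards]

# Aggregation here is a flatten followed by a sort.
matches = scatter_gather(
    "a", workers, lambda parts: sorted(w for part in parts for w in part)
)
```

Because the dispatcher waits for every worker, the slowest worker determines the response time; a production version would typically add a timeout.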
![Figure 3: a result cache in front of the workers](http://4.bp.blogspot.com/_j6mB7TMmJJY/TLlEpBawVMI/AAAAAAAAAfQ/Jp8vbVYnF0s/s1600/P3.png)
Figure 3. Adding a cache can significantly speed up large-scale systems.
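
As a rough sketch of the caching idea, the wrapper below looks up each key in an in-memory dictionary before falling back to the slow backend. The names `CachedBackend` and `slow_square` are hypothetical, and the sketch deliberately omits eviction and invalidation, which any real cache needs.

```python
class CachedBackend:
    """Serves repeated requests from memory instead of the backend."""

    def __init__(self, backend):
        self._backend = backend
        self._cache = {}
        self.backend_calls = 0  # counts how often the slow path is taken

    def get(self, key):
        if key in self._cache:
            return self._cache[key]  # cache hit: no backend work
        self.backend_calls += 1
        value = self._backend(key)
        self._cache[key] = value
        return value


def slow_square(x):
    """Stand-in for an expensive computation or remote call."""
    return x * x


service = CachedBackend(slow_square)
results = [service.get(key) for key in (3, 4, 3, 3)]
```

Four requests reach the service, but only the two distinct keys reach the backend.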
![Figure 4: pattern similar to a parameter server](http://3.bp.blogspot.com/_j6mB7TMmJJY/TLlFf-b8lPI/AAAAAAAAAfY/Poy8V0eH1gA/s400/P4.png)
Figure 4. The principle is very similar to that of a parameter server, where the parameters are intermediate results of learning.
![Figure 5: message queue pattern](http://4.bp.blogspot.com/_j6mB7TMmJJY/TLlGIM4IDiI/AAAAAAAAAfg/nQgVADmUl5w/s400/P5.png)
Figure 5. This pattern follows the design of a message queue: events are enqueued and responses are dequeued.
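
The enqueue/dequeue flow can be sketched with Python's standard `queue.Queue` and a single worker thread. The function `run_pipeline` and the `None` sentinel convention are assumptions made for this illustration; a distributed message queue would sit between separate processes or machines.

```python
import queue
import threading


def run_pipeline(events, handler):
    """Enqueue events, let one worker process them, and dequeue the responses."""
    requests = queue.Queue()
    responses = queue.Queue()

    def worker():
        while True:
            event = requests.get()
            if event is None:  # sentinel: no more events
                break
            responses.put(handler(event))

    thread = threading.Thread(target=worker)
    thread.start()
    for event in events:
        requests.put(event)
    requests.put(None)
    thread.join()

    # The worker has finished, so draining with empty() is safe here.
    results = []
    while not responses.empty():
        results.append(responses.get())
    return results


replies = run_pipeline(["a", "b", "c"], str.upper)
```

With a single worker the responses come back in the order the events were enqueued; with several workers that ordering guarantee is lost.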
![Figure 6: map-reduce pattern](http://3.bp.blogspot.com/_j6mB7TMmJJY/TLlHPyMkTII/AAAAAAAAAf4/McnK_GGkYpw/s400/P7.png)
Figure 6. The very popular map-reduce pattern.
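
A single-process sketch of the three map-reduce stages (map, shuffle, reduce), using word counting as the example job. The function names are illustrative; in a real deployment each stage runs in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["the quick fox", "the lazy dog", "The end"])
```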
![Figure 7: Bulk Synchronous Parallel](http://4.bp.blogspot.com/_j6mB7TMmJJY/TLnhYZH7PTI/AAAAAAAAAgY/YHy5K8H6hZA/s400/P8.png)
Figure 7. Bulk Synchronous Parallel.
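
In the Bulk Synchronous Parallel model, workers proceed in supersteps separated by barriers. The sketch below is a toy instance, assuming threads as workers and a ring topology: in each superstep every worker sends its current value to its right neighbour, waits at a barrier, then keeps the larger of its own and the received value. The function name `bsp_ring_max` and the ring layout are assumptions chosen for illustration, and the input is assumed to be non-empty.

```python
import threading


def bsp_ring_max(values):
    """After n - 1 supersteps every worker holds the global maximum."""
    n = len(values)
    state = list(values)
    inbox = [None] * n
    barrier = threading.Barrier(n)

    def worker(i):
        for _ in range(n - 1):
            # Communication: send the current value to the right neighbour.
            inbox[(i + 1) % n] = state[i]
            barrier.wait()  # every message of this superstep is delivered
            # Local computation on the received message.
            state[i] = max(state[i], inbox[i])
            barrier.wait()  # nobody overwrites an inbox that is still being read

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state


final_state = bsp_ring_max([3, 9, 1, 4])
```

The barriers are what make the model easy to reason about: no worker can run ahead into the next superstep while others still depend on the current one.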
![Figure 8: Execution Orchestrator](http://3.bp.blogspot.com/_j6mB7TMmJJY/TLlH_a9WOMI/AAAAAAAAAgI/41l0bvV3fkE/s400/P8.png)
Figure 8. The Execution Orchestrator relies on an intelligent scheduler/orchestrator to schedule ready-to-run tasks (based on a dependency graph) across a cluster of dumb workers. This pattern is used in Microsoft's Dryad project.
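
A minimal sketch of the orchestrator idea, assuming Python 3.9+ for the standard `graphlib` module: the scheduler tracks the dependency graph and hands only ready-to-run tasks to a pool of workers that know nothing about the graph. This is not how Dryad itself is implemented; the function `orchestrate` and the task names are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def orchestrate(tasks, dependencies, max_workers=2):
    """Run tasks on a worker pool, respecting the dependency graph.

    tasks maps a name to a zero-argument callable; dependencies maps a name
    to the set of names that must finish first. Returns the completion order.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    finished = []
    running = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies are all done.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any task failure
                name = running.pop(future)
                finished.append(name)
                sorter.done(name)  # may unlock downstream tasks
    return finished


# Hypothetical job: "clean" and "stats" both need "load"; "report" needs both.
tasks = {name: (lambda: None) for name in ("load", "clean", "stats", "report")}
dependencies = {
    "load": set(),
    "clean": {"load"},
    "stats": {"load"},
    "report": {"clean", "stats"},
}
order = orchestrate(tasks, dependencies)
```

The workers stay "dumb" in the sense of the caption: all knowledge of ordering lives in the scheduler loop.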