Consistency Model Summary in Distributed System
Note
Generally, consistency model define the behavior how mutliple concurrent workflows read/write data or replciation. And it works similiar among the following situations:
- Mutilple threads read/write memory within a process
- Mutilple processes read/write shared memory within a machine
- Mutilple nodes read/write shared data within a cluster
In the following paragraph, I’d like to use thread to stands for the concurrent workflow to ease the demostration. So you can simply replace thread with process or node in the following paragraph.
Consistency is Hard in (distributed) systems
- Data replication (Caching)
- Concurrency (no shared clock)
- Failures (machine or network)
Consistency Model
Overview
Strong Consistency | Weak Consistency | ||||
---|---|---|---|---|---|
Def | The current state of a data item follows a universally and mutually accepted sequence of change of state. 每个并发workflow`线程or进程or节点`看到的操作执行顺序是一样的 | It allows distinct views of the database state to see different and unmatched updates in the database state. | |||
Client Awareness | No. End-Client is unaware of replications of data. | Yes. For a specific key, different clients may get different value/version. | |||
Typical Models | Strict 严格 | Linear/Atomic 线性 | Sequential 顺序 | Casual 因果 | Eventual 最终 |
Comparsion | ----> (from left to right) ----> less consistent data higher performance(lower latency, higher thoughput) higher availability | ||||
Description | every read will see the most recent write in real time. | reads see the most recent write that is not overlapped with it. | all writes must be globally ordered in some way that all threads/process/nodes agree | if A causes B, all threads that see the result of B must see the result of A as well. | No order constraints at all. But eventually all threads will converge. |
Common Design | Impossible in practice for threads or nodes to agree on a precise current time | Distributed Lock, 2PC commit, Distributed Data Store with Consensus(Paxos, Raft) | Consistent Core Pattern with centrized design | 1. COPS 2. Logical Lock (version) | 1. Asynchronous data synchronization: (copying 、update log、meesage queue ) 2. data with version |
Project | Google Spanner, GFS | 1. IVY(distributed shared memory) 2. Zookeeper | 1. MongoDB, 2. Message App(Whatsapp/iMessage) 3. self share(wechat moments) | 1. DNS syncing 2. Amazon DynamoDB 3. Elasticsearch(syncing between primary and replica shards 4. Gossip protocol 5. Most news platform 6. Block chain |
Simple Example
- Strict Consistency
- Linear/Atomic Consistency
- Sequential Consistency
- Casual Consistency
- Eventual Consistency
- Not Consistent Consistency
Real Example - Quorum(NWR)
Quorum-based consistency is common to set replication consistency, such as Cassandra
, HDFS
. It uses a voting mechanism to determine the consistency of data operations. Each data operation, such as read or write, requires a certain number of nodes to acknowledge the operation before it is considered successful.
- N = nodes in the quorum group(cluster)
- W = minimum write nodes (write quorum)
- the write operation could only be completed after writing W nodes synchronously. And then the updated node will deliver change to to other nodes asynchronously.
- R = minimum read nodes (read quorum)
- the read operation could only be completed after reading R nodes synchronously.
Relation with Consistency
- ≥Sequential: W + R > N
- e.g. Let X=0 initially. After a write operation to set X=1, the read operation will get X=1
- Eventual: W + R ≤ N
- e.g. Let X=0 initially. After a write operation to set X=1, the read operation may still get X=0
Cassandra could achieve consistency level by setting different quorum config
Confused Concepts
Consitency vs Coherence
- Coherence is related to the multiple values/versions among different data layer. e.g. Cache Coherence
- Consistency is the agreement between multiple threads/processes/nodes in a system to achieve a certain value.
在中文中,都是翻译成一致性。但是其实是有侧重的,Coherence关心同一个数据在不同垂直层级上的不同值、版本,比如缓存;而Consistency讨论的是 同一个水平层级下不同的节点、副本之间如何得到相同的值、版本。在大部分工作场合里,其实也不需要去区分。
Consistency vs Isolation (Database)
- Isolation:
refers to the ability of a database to allow a transaction to execute as if there are no other concurrently running transactions. The overarching goal is to prevent reads and writes of temporary, incomplete, aborted, or otherwise incorrect data written by concurrent transactions. - Consistency:
when a modern system offers multiple consistency levels, they define consistency in terms of the client view of the database. e.g. If two clients can see different states at the same point in time, we say that their view of the database is inconsistent
Reference
- Understand Consistency Levels in Distributed Systems
- 共识、线性一致性、顺序一致性、最终一致性、强一致性概念区分
- Weak Levels of Consistency
- Consistency Vs Coherence
- Demystifying Database Systems, Part 3: Introduction to Consistency Levels
- 数据一致性 VS 事务隔离性
- Overview of Consistency Levels in Database Systems
- CMU slides 15-446 Distributed Systems Spring 2009
- 知乎 分布式系统一致性总结