Note

Generally, consistency model define the behavior how mutliple concurrent workflows read/write data or replciation. And it works similiar among the following situations:

  • Mutilple threads read/write memory within a process
  • Mutilple processes read/write shared memory within a machine
  • Mutilple nodes read/write shared data within a cluster

In the following paragraph, I’d like to use thread to stands for the concurrent workflow to ease the demostration. So you can simply replace thread with process or node in the following paragraph.

Consistency is Hard in (distributed) systems

  • Data replication (Caching)
  • Concurrency (no shared clock)
  • Failures (machine or network)

Consistency Model

Overview

Strong ConsistencyWeak Consistency
DefThe current state of a data item follows a universally and mutually accepted sequence of change of state.
每个并发workflow`线程or进程or节点`看到的操作执行顺序是一样的
It allows distinct views of the database state to see different and unmatched updates in the database state.
Client AwarenessNo. End-Client is unaware of replications of data.Yes. For a specific key, different clients may get different value/version.
Typical ModelsStrict
严格
Linear/Atomic
线性
Sequential
顺序
Casual
因果
Eventual
最终
Comparsion----> (from left to right) ---->
less consistent data
higher performance(lower latency, higher thoughput)
higher availability
Descriptionevery read will see the most recent write in real time.reads see the most recent write that is not overlapped with it.all writes must be globally ordered in some way that all threads/process/nodes agreeif A causes B, all threads that see the result of B must see the result of A as well.No order constraints at all. But eventually all threads will converge.
Common DesignImpossible in practice for threads or nodes to agree on a precise current timeDistributed Lock,
2PC commit,
Distributed Data Store with Consensus(Paxos, Raft)
Consistent Core Pattern with centrized design1. COPS
2. Logical Lock (version)
1. Asynchronous data synchronization: (copying 、update log、meesage queue )
2. data with version
ProjectGoogle Spanner, GFS1. IVY(distributed shared memory)
2. Zookeeper
1. MongoDB,
2. Message App(Whatsapp/iMessage)
3. self share(wechat moments)
1. DNS syncing
2. Amazon DynamoDB
3. Elasticsearch(syncing between primary and replica shards
4. Gossip protocol
5. Most news platform
6. Block chain

Simple Example

  1. Strict Consistency
  2. Linear/Atomic Consistency

  3. Sequential Consistency
  4. Casual Consistency

  5. Eventual Consistency
  6. Not Consistent Consistency

Real Example - Quorum(NWR)

Quorum-based consistency is common to set replication consistency, such as Cassandra, HDFS. It uses a voting mechanism to determine the consistency of data operations. Each data operation, such as read or write, requires a certain number of nodes to acknowledge the operation before it is considered successful.

  • N = nodes in the quorum group(cluster)
  • W = minimum write nodes (write quorum)
    • the write operation could only be completed after writing W nodes synchronously. And then the updated node will deliver change to to other nodes asynchronously.
  • R = minimum read nodes (read quorum)
    • the read operation could only be completed after reading R nodes synchronously.

Relation with Consistency

  • ≥Sequential: W + R > N
    • e.g. Let X=0 initially. After a write operation to set X=1, the read operation will get X=1
  • Eventual: W + R ≤ N
    • e.g. Let X=0 initially. After a write operation to set X=1, the read operation may still get X=0

Cassandra could achieve consistency level by setting different quorum config

Confused Concepts

Consitency vs Coherence

  • Coherence is related to the multiple values/versions among different data layer. e.g. Cache Coherence
  • Consistency is the agreement between multiple threads/processes/nodes in a system to achieve a certain value.

在中文中,都是翻译成一致性。但是其实是有侧重的,Coherence关心同一个数据在不同垂直层级上的不同值、版本,比如缓存;而Consistency讨论的是 同一个水平层级下不同的节点、副本之间如何得到相同的值、版本。在大部分工作场合里,其实也不需要去区分。

Consistency vs Isolation (Database)

  • Isolation:
    refers to the ability of a database to allow a transaction to execute as if there are no other concurrently running transactions. The overarching goal is to prevent reads and writes of temporary, incomplete, aborted, or otherwise incorrect data written by concurrent transactions.
  • Consistency:
    when a modern system offers multiple consistency levels, they define consistency in terms of the client view of the database. e.g. If two clients can see different states at the same point in time, we say that their view of the database is inconsistent

Reference