A system that stores data should be both durable (no data loss) and available (data is accessible when needed). Itโs difficult to achieve this with a single machine since it can fail and bottleneck. Instead, we can store copies of the data on multiple machines; data will be safe as long as thereโs at least one machine working, and users can access data from any machine.
If data was read-only, this would be a simple system. Writes make it more complicated:
- Replication protocol: we need to propagate updates to all replicas.
- Consistency model: we need to define the โrightโ value when a client asks for the data.
Replication Protocols
Primary-Based
A primary-based protocol chooses a primary for each object thatโs responsible for coordinating updates for that object. The client can still send requests to any replica, but the primary should coordinate changes. There are two ways to implement this:
- Remote writes: replicas forward writes to the primary, which propagates writes to all other replicas. This is common for distributed file systems and databases.
- Local writes: the replica that receives the write request becomes the primary and propagates it to all other replicas. This can be used during disconnection operation; the primary can always accept writes, so if a node is going to be disconnected, it can first become the primary, accept the writes, then propagate them after itโs reconnected.
Quorum-Based
Primary-based protocols are susceptible to bottlenecks and single points of failure. To make replication fully distributed, a client can ask multiple replicas for permission for some request. Specifically, it must ask
- To read, each server in the quorum responds with the latest version number of the object. The client downloads from the server with the latest version number.
- To write, each server responds with the latest version number, and the client sends the new object and the highest number plus one to all servers in the quorum.
Let
Consistency Models
A consistency model defines the โcorrectโ behavior to guarantee.
- Sequential consistency: result of any execution is same as some serial ordering, and all clients see operations in the same order. This can be achieved via the primary serializing operations (primary-based) or preventing concurrent writes (quorum-based).
- Causal consistency: writes that are causally related must be seen in the same order by all nodes, but concurrent writes can be in a different order. We can implement this with Vector Clocks.
- Eventual consistency: in absence of further updates, all replicas converge to the same state (by lazily forwarding updates). This is used when we donโt care about interactions between multiple clientsโfor example, if each object is only used by one client.
Client-Centric Consistency
In the final case, we only consider a single clientโs experience with our system, not what multiple clients see. However, this client can access different replicas, and we can ensure some client-centric consistency guarantees:
- Monotonic reads: if a client reads the value of an object, successive reads will return the same value or a more recent one. This means the value of the read must be propagated before returning the result to the next read.
- Monotonic writes: if a client writes to an object, the write is completed before subsequent writes. Similarly, this means the written value must be propagated before applying the next write.
- Read your writes: if a client writes to an object, the value will always be seen by subsequent reads.
- Writes follow reads: if a client writes to an object after reading, the change will take place on the same or more recent value that was read.