I’m going to start talking about data replication, and the first and probably most important topic is that of consensus algorithms. I don’t want to regurgitate what others have written, so at first this is going to be a reading list. But, at some point, it will contain original bits, and then all of this will turn into a bibliography.

What went before

Here are a handful of the papers I think are important.

Consensus algorithms

This sub-field was launched with a paper from Lamport titled The Byzantine Generals Problem.

Wikipedia has a decent article

Note that state machine replication and consensus algorithms go together, and we’ll be diving into that as well, since that’s the actual fun/hard work.

More reading on consensus:

And Papers We Love has a whole section on distributed systems


2PC - two-phase commit

Conceptually, two-phase commit is straightforward; the commit-request phase (or call it the voting phase), and the commit phase, where the coordinator decides whether to commit or not, based on the information gathered in the voting phase.

This is a widely-used technique, although it has problems in the presence of failures. E.g. it assumes there is storage that can be trusted at each node, that no node crashes, and that nodes can communicate with each other. And it’s a blocking protocol. Other than that, it’s great :)

3PC - three-phase commit

Fixes much of fragility of 2PC, although it’s still vulnerable to partitions. E3PC tries to address the problem of partitions.


This was developed by Leslie Lamport in the late 1980s, circulated informally in the community in the early 1990s, and finally formally published in Transactions on Computer Systems in 1998. It is an improvement over 2PC and 3PC (two-phase commit and three-phase commit).

Paxos has the reputation of being complex, and developers are often warned against implementing it themselves. This seems a little draconian or elitist to me.

These are in publication order, but are starred with a reading order for those wanting to understand Paxos.


Blog articles

Uses of Paxos

  • Google: the Chubby distributed lock service (BigTable uses Chubby).
  • Yahoo and ZooKeeper.
  • Heroku and Doozerd.
  • Amazon Web Services.
  • Cassandra and Nutanix.
  • libpaxos-cpp


Raft was developed as an easier-to-understand consensus algorithm, easier that Paxos, that is.

Facebook is using Raft in HydraBase, to replace HBase. CoreOS is using Raft in etcd.


Introduced for TokuMX and MongoDB.

Names in consensus algorithms

Major names