What are our options if we want a distributed, fault-tolerant database?

MySQL Cluster (NDB)

It’s very widely used at scale.

Cassandra

Cassandra was written initially at Facebook, strongly influenced by Dynamo, Amazon’s key/value store. It was then moved to be an Apache projec in 2010. Cassandra has been used at very large scale. The main challenge is that it has an eventual consistency model, which can be challenging to work with (Facebook moved from Cassandra to HBase due to this).

Riak

Riak also follows the thread from Dynamo, and is a key-value store written in Erlang. Riak can do in-memory or disk storage, or both. Riak is very fault-tolerant but not as fast as something like Redis, leading to the idea that perhaps Riak is the back-end for a Redis cache?

HyperDex

HyperDex is a research project from Cornell turned commercial. I think this project started in 2010, but it seemed to be at a usable state by 2012.

MongoDB

We should not use MongoDB.

HBase

HBase is implementation of Google’s BigTable started by Powerset. It moved to the Apache Foundation in 2009, and is part of the Hadoop project; in fact, it runs on top of the Hadoop File System.

Facebook started using HBase in 2010 for their new messaging platform, and have now forked it into HydraBase. One key thing was the switch to RAFT for the consensus algorithm.

Redis Cluster

Not sure.

Cockroach DB

This will be awesome someday

General notes

Call Me Maybe - Jepsen

Challenges to Adopting Stronger Consistency at Scale

Visual Guide to NoSQL Systems

Cassandra vs MongoDB…

Distributed systems for fun and profit