What are our options if we want a distributed, fault-tolerant database?
MySQL Cluster (NDB)
It’s very widely used at scale.
Cassandra was written initially at Facebook, strongly influenced by Dynamo, Amazon’s key/value store. It was then moved to be an Apache projec in 2010. Cassandra has been used at very large scale. The main challenge is that it has an eventual consistency model, which can be challenging to work with (Facebook moved from Cassandra to HBase due to this).
- Cassandra home page.
- Dynamo: Amazon’s Highly Available Key-value Store
- Apache Cassandra page on Wikipedia.
- Hulu Chooses Cassandra Over HBase and Riak
Riak also follows the thread from Dynamo, and is a key-value store written in Erlang. Riak can do in-memory or disk storage, or both. Riak is very fault-tolerant but not as fast as something like Redis, leading to the idea that perhaps Riak is the back-end for a Redis cache?
- Riak home page and docs.
- Riak page on Wikipedia.
- Not So Versus, Riak Versus Redis
- My Year of Riak from 2011.
- Riak Compared to Cassandra potential bias since this is from Riak.
- Riak versus Cassandra
HyperDex is a research project from Cornell turned commercial. I think this project started in 2010, but it seemed to be at a usable state by 2012.
- HyperDex home page and papers
- Emin Gün Sirer
- Robert Escriva
- HyperDex page on Wikipedia
We should not use MongoDB.
HBase is implementation of Google’s BigTable started by Powerset. It moved to the Apache Foundation in 2009, and is part of the Hadoop project; in fact, it runs on top of the Hadoop File System.
Facebook started using HBase in 2010 for their new messaging platform, and have now forked it into HydraBase. One key thing was the switch to RAFT for the consensus algorithm.
This will be awesome someday
- Cockroach Labs
- cockroachdb GitHub organization.
- Out in the Open: Ex-Googlers Building Cloud Software That’s Almost Impossible to Take Down 2014 Wired Article