Distributed Databases

What are our options if we want a distributed, fault-tolerant database?

MySQL Cluster (NDB)

It’s very widely used at scale.

Cassandra

Cassandra was initially written at Facebook and was strongly influenced by Dynamo, Amazon’s key/value store. It became a top-level Apache project in 2010. Cassandra has been used at very large scale. The main drawback is its eventual consistency model, which can be challenging to work with (Facebook moved from Cassandra to HBase due to this).
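To make the eventual consistency issue concrete, here is a minimal toy sketch of two replicas using last-write-wins timestamps. The `Replica` class and `sync` function are hypothetical names made up for illustration; this is not Cassandra's API or its actual replication protocol, just a model of why a read can return stale data until the replicas converge.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica mapping key -> (timestamp, value). Hypothetical, not Cassandra code."""
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Last-write-wins: keep only the entry with the highest timestamp.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries in both directions so the two replicas converge."""
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)


r1, r2 = Replica(), Replica()
r1.write("status", "online", ts=1)
sync(r1, r2)                       # both replicas now agree on "online"

r1.write("status", "away", ts=2)   # accepted by r1 only
stale = r2.read("status")          # r2 has not seen the update yet
sync(r1, r2)
fresh = r2.read("status")          # after syncing, r2 has caught up
```

Between the second write and the second `sync`, a client reading from `r2` sees the old value. That window is what application code has to be written around.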

Cassandra home page.
Dynamo: Amazon’s Highly Available Key-value Store
Apache Cassandra page on Wikipedia.
Hulu Chooses Cassandra Over HBase and Riak

Riak

Riak also descends from Dynamo; it is a key-value store written in Erlang. It can store data in memory, on disk, or both. Riak is very fault-tolerant but not as fast as something like Redis, which suggests a possible design: Riak as the durable back-end behind a Redis cache?
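The cache-in-front-of-a-store idea could look roughly like the cache-aside sketch below. Plain dicts stand in for the Redis and Riak clients, and the `CacheAside` class is a hypothetical name; no real client library calls are used, so treat this as an outline of the pattern rather than a working integration.

```python
class CacheAside:
    """Read-through / invalidate-on-write wrapper (illustrative only)."""

    def __init__(self, cache, store):
        self.cache = cache   # fast, volatile tier (dict standing in for Redis)
        self.store = store   # durable tier (dict standing in for Riak)

    def get(self, key):
        # Serve from the cache when possible.
        if key in self.cache:
            return self.cache[key]
        # On a miss, fall back to the durable store and populate the cache.
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Write to the durable store first, then invalidate the cached copy
        # so the next read fetches the new value.
        self.store[key] = value
        self.cache.pop(key, None)


cache, store = {}, {"user:1": "alice"}
db = CacheAside(cache, store)

first = db.get("user:1")        # cache miss, loaded from the store
cached = "user:1" in cache      # the value is now cached
db.put("user:1", "bob")         # durable write, cache entry invalidated
after = db.get("user:1")        # refetched from the store
```

Invalidating on write rather than updating the cache keeps the durable store as the source of truth, at the cost of one extra miss after each write.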

Riak home page and docs.
Riak page on Wikipedia.
Not So Versus, Riak Versus Redis
My Year of Riak from 2011.
Riak Compared to Cassandra (potential bias, since this is from Riak).
Riak versus Cassandra

HyperDex

HyperDex is a research project from Cornell turned commercial. I think this project started in 2010, and it seemed to be in a usable state by 2012.

HyperDex home page and papers
- HyperDex: A Distributed, Searchable Key-Value Store
Emin Gün Sirer
Robert Escriva
HyperDex page on Wikipedia

MongoDB

We should not use MongoDB.

HBase

HBase is an implementation of Google’s Bigtable, started by Powerset. It moved to the Apache Foundation in 2009, and is part of the Hadoop project; in fact, it runs on top of the Hadoop Distributed File System (HDFS).

Facebook started using HBase in 2010 for its new messaging platform and has since forked it into HydraBase. One key change was the switch to Raft for the consensus algorithm.

HydraBase – The evolution of HBase@Facebook

Redis Cluster

Not sure.

Cockroach DB

This will be awesome someday.

Cockroach Labs
cockroachdb GitHub organization.
Out in the Open: Ex-Googlers Building Cloud Software That’s Almost Impossible to Take Down (2014 Wired article).

General notes

Call Me Maybe - Jepsen

Challenges to Adopting Stronger Consistency at Scale

Visual Guide to NoSQL Systems

Cassandra vs MongoDB…

Distributed systems for fun and profit