Replication Speed: CouchDB vs. MongoDB

I’m posting this because I hadn’t seen this comparison done before. Therefore, it might be useful to someone else. Maybe.

This was a rough test to see how long a slave would take to catch up with its master, which was sustaining about 500 document insertions per second with about 5 million documents already stored. The general verdict was that MongoDB replication is much faster, at least for this particular data model (i.e. logging data.) For CouchDB, inserting millions of small documents is probably not the most efficient way to do things.

I set up Ubuntu 9.10 VMs with 2 GB of RAM each for the master and two slaves. The first slave was set to replicate with the master from the beginning, to see if it ever lags (it was negligible for both DBMSs.) The other slave was used to see how long replication took to catch up.

CouchDB (v0.11)

For CouchDB, after 5,015,630 documents had been inserted over the course of 2 hours and 48 minutes, I started continuous replication on the third node and waited for it to catch up.

At 9,266,860 documents total, the third node had finally caught up with the master, after another 2 hours and 47 minutes, almost the same amount of time it had been offline.

For shorter intervals, the rate of replication was better. For example, it took 16 minutes for an empty database to catch up to a master that had been running for 30 minutes sustaining 500/document insertions per second.

MongoDB (v1.4)

MongoDB took considerably less time, completing the sync of 5,015,639 documents in under 7 minutes. Roughly half of the time was spent creating the new databases on disk and filling them with zeros, and the other half applying document inserts.

If shutdown improperly the mongod node needs to be repaired (with mongod --repair), which took around 11 minutes for 5 million records, ~4 GB of data. There really isn’t an improper way to shutdown CouchDB, so you can boot and start replicating immediately.

Conclusion

We’ll probably use MongoDB for dumping log data. It’s simple enough and the data is not exactly mission critical, so MongoDB’s shortcomings (durability, master-master limitations) aren’t enough of a deterrent.

I really like CouchDB’s replication system. It reminds me of git and allows for the same fully distributed model, but our cluster is not large or diverse enough to really take advantage of those features. I would have picked CouchDB if not for the dramatic difference in replication speed. But hey, no one DBMS is a hammer for all nails.

Extra Notes on Size

Documents were all clones of this JSON:

{"play_time":"19:02:20","file_name":"8821_20100224-1100.xml","playlist":"none","event":"playlist","player_id":9861,"play_date":"2010-03-02"}

CouchDB

  • Master: 5,015,630 docs, 5.0 GB
  • Slave: 5,015,630 docs, 9.3 GB

CouchDB after sync completed (compacted)

  • Master: 9,266,860 docs, 7.4 GB (3.6 GB)
  • Slave: 9,266,860 docs, 18.6 GB (3.6 GB)
  • Slave2: 9,266,860 docs, 13.9 GB (3.6 GB)

MongoDB was the same size on all nodes, or just under 4 GB each after 5 million+ documents (as you may know, MongoDB adds new database chunks up to 2 GB as needed.)