4 Good Things About CouchDB

Two years ago I decided to start using CouchDB in place of MySQL for new projects. Much software has been written, many transactions have taken place, and I'm ready to share my experience.

For a newcomer, it can be difficult to figure out the proper way to use CouchDB. Along with its traditional database features, it has a number of exotic abilities that allow it to play unconventional roles. For instance, CouchDB can be your user-facing web server, eliminating the typical "application" layer served by PHP or node. Or it can serve as a local offline database with occasional synchronization.

I know many of its exotic features have been put to good use by other people, but my experience has been with CouchDB filling the role traditionally occupied by MySQL. After some false starts and hard-learned lessons, I can say that CouchDB definitely fills that role well and could be considered a superior choice to MySQL in many meaningful ways.

CouchDB has four features that really make it stand out:

  1. It has no read locks.
  2. You can back up a database with cp without shutting it down.
  3. Any record (row, document, whatever) can participate in any index any number of times.
  4. Replication is easy and can be bidirectional.

CouchDB Has No Read Locks

You can pose a question to MySQL that takes many seconds to answer and, with MyISAM's table-level locks, blocks writes on one or more tables until the complete answer is computed. Many observed performance issues in web software are a direct result of this fact. An expensive query initiated by one user can stall critical writes for other users, leading to unpredictable performance and an impression that your software is unresponsive.

CouchDB stands up better to concurrent use by multiple users because it has absolutely no read locks. This is possible because CouchDB never updates documents in place. Changes are always appended to the end of the database file. Consequently, writes that occur while views are being queried won't ever interfere with those queries.
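To make the append-only argument concrete, here is a toy model in TypeScript. It is not CouchDB's actual file format, and `AppendOnlyLog`, `snapshot` and `readAt` are hypothetical names. A reader records how long the log was when it started and only looks at that prefix. Writers only ever add entries past the end, so nothing the reader is looking at can change underneath it, and it needs no lock. JavaScript is single-threaded, so "concurrent" here just means interleaved.

```typescript
// Toy append-only store (a simplified model, not CouchDB's file format).
// Writers only append; readers pin the log length at the moment they start.

interface Entry {
  id: string;
  body: string;
}

class AppendOnlyLog {
  private entries: Entry[] = [];

  // Writes never modify existing entries, they only add to the end.
  write(id: string, body: string): void {
    this.entries.push({ id: id, body: body });
  }

  // A reader captures the current length; that prefix never changes afterwards.
  snapshot(): number {
    return this.entries.length;
  }

  // Latest version of a document as of the given snapshot. Scanning backwards
  // from the end of the prefix finds the newest entry for the id first.
  readAt(snapshot: number, id: string): string | undefined {
    for (let i = snapshot - 1; i >= 0; i--) {
      if (this.entries[i].id === id) {
        return this.entries[i].body;
      }
    }
    return undefined;
  }
}

const log = new AppendOnlyLog();
log.write("order-1", "pending");

const snap = log.snapshot();      // a long-running query starts here
log.write("order-1", "shipped");  // a later write proceeds without waiting for it

console.log(log.readAt(snap, "order-1"));           // prints pending
console.log(log.readAt(log.snapshot(), "order-1")); // prints shipped
```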

The disadvantage of this design is that CouchDB database files can grow huge between compactions. That cost is probably worth it for your project, and the append-only format is also the key to another of CouchDB's real advantages, which I discuss next.
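Compaction is what pays that cost back. Here is a minimal sketch of the idea, assuming a log of whole-document versions: copy only the newest version of each document into a fresh file. Real CouchDB compaction does more than this (for instance, it keeps the revision history metadata that replication relies on), and `compact` is a hypothetical helper.

```typescript
// Sketch of compaction on a toy append-only log (not CouchDB's real algorithm):
// keep only the newest version of each document.

interface Entry {
  id: string;
  body: string;
}

function compact(log: Entry[]): Entry[] {
  const latest: { [id: string]: Entry } = {};
  const order: string[] = [];
  for (const entry of log) {
    if (!Object.prototype.hasOwnProperty.call(latest, entry.id)) {
      order.push(entry.id); // remember first-seen order so the output is stable
    }
    latest[entry.id] = entry; // later versions overwrite earlier ones
  }
  return order.map(function (id) { return latest[id]; });
}

// Three versions of one document and one of another: four entries on disk.
const history: Entry[] = [
  { id: "a", body: "v1" },
  { id: "b", body: "v1" },
  { id: "a", body: "v2" },
  { id: "a", body: "v3" },
];

const compacted = compact(history);
console.log(history.length, compacted.length); // prints 4 2
```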

CouchDB Supports Trivial Hot Back-Ups

You can make a point-in-time backup of a CouchDB database file with the cp command while CouchDB is running. This is another advantage of CouchDB's append-only database file format.
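The reason a plain copy works can be sketched with the same kind of toy log. This is a simplified model, not CouchDB's real on-disk layout. Assume the writer appends a commit header after each batch of changes. A copy taken while the server is running is just some prefix of the file. On open, anything after the last complete header is ignored, and everything before it is a consistent database. `DbRecord` and `recover` are hypothetical names.

```typescript
// Simplified model of why copying a live append-only file is safe.
// Records are appended; a "header" record commits everything before it.

type DbRecord =
  | { kind: "doc"; id: string; body: string }
  | { kind: "header" };

// Returns the part of a (possibly mid-write) copy that ends at the last header.
function recover(file: DbRecord[]): DbRecord[] {
  for (let i = file.length - 1; i >= 0; i--) {
    if (file[i].kind === "header") {
      return file.slice(0, i + 1);
    }
  }
  return []; // no committed header yet: an empty database
}

const live: DbRecord[] = [
  { kind: "doc", id: "a", body: "v1" },
  { kind: "doc", id: "b", body: "v1" },
  { kind: "header" },
  { kind: "doc", id: "c", body: "v1" }, // written, but not yet committed
];

// "cp" while the server is running: the copy is some prefix of the live file.
const backup = live.slice(0, 4);
const restored = recover(backup);
console.log(restored.length); // prints 3
```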

If you are using MySQL with MyISAM tables, you have to lock every table and then copy the files while writes are stalled. If you are using InnoDB, you can take a free logical dump with mysqldump --single-transaction, but a hot physical backup pretty much means paying for MySQL Enterprise Edition or reaching for a third-party tool.

A surprising number of projects launch with no data backup strategy in place because a surprising number of database products don't make backups easy. This is one place where CouchDB just destroys the competition, and the competition should feel a little embarrassed.

Documents Can Be in Any Index Any Number of Times

In MySQL, indexes belong to tables. If you have a table T with an index I, each row in T will also exist in I exactly once. This limitation is such a natural part of SQL databases that most developers have a hard time wrapping their heads around an alternative, but CouchDB's view engine provides a very cool one. Instead of limiting each index to records of a single type, a CouchDB view may include any document in the whole database any number of times.

For example, you can define a view for searching documents by keyword that includes documents of many different types, with multiple entries for each document that has more than one keyword. Such a view can be used to produce a heterogeneous list of ALL documents that match a given word with a single query.
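A keyword view of that kind might look like the following sketch. In CouchDB the map function is JavaScript stored in a design document and `emit` is supplied by the view server. Here `emit` is passed in as a parameter, and `buildIndex` and `query` are hypothetical stand-ins for the view engine, so the example runs on its own. The document types and ids are made up.

```typescript
// Keyword view sketch. In a design document the map function would be stored
// as a string, roughly:
//   function (doc) { if (doc.keywords) doc.keywords.forEach(function (k) { emit(k.toLowerCase(), null); }); }

interface Doc {
  _id: string;
  type: string;
  keywords?: string[];
}

interface Row {
  key: string;
  id: string;
}

// The map function: one row per keyword, for documents of any type.
function mapKeywords(doc: Doc, emit: (key: string, value: null) => void): void {
  if (doc.keywords) {
    doc.keywords.forEach(function (kw) {
      emit(kw.toLowerCase(), null);
    });
  }
}

// Stand-in view engine: run map over every document, sort rows by key, then id.
function buildIndex(docs: Doc[]): Row[] {
  const rows: Row[] = [];
  docs.forEach(function (doc) {
    mapKeywords(doc, function (key) {
      rows.push({ key: key, id: doc._id });
    });
  });
  rows.sort(function (a, b) {
    if (a.key !== b.key) return a.key < b.key ? -1 : 1;
    return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
  });
  return rows;
}

// Roughly what querying the view with ?key="..." returns: matching doc ids.
function query(index: Row[], key: string): string[] {
  return index
    .filter(function (r) { return r.key === key; })
    .map(function (r) { return r.id; });
}

const docs: Doc[] = [
  { _id: "post-1", type: "post", keywords: ["couchdb", "mysql"] },
  { _id: "user-7", type: "user", keywords: ["couchdb"] },
  { _id: "invoice-3", type: "invoice" }, // no keywords: not in the index at all
  { _id: "page-2", type: "page", keywords: ["CouchDB", "replication"] },
];

const index = buildIndex(docs);
console.log(index.length);            // prints 5
console.log(query(index, "couchdb")); // ids page-2, post-1, user-7 (three types)
```

A post, a user and a page come back from one query, and `post-1` and `page-2` each appear twice in the index because they have two keywords.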

The other three features on this list are really operational advantages in a production environment. This one is CouchDB's killer developer advantage. Tough problems like analytics reporting and keyword searching are just easier to model in CouchDB than they are in MySQL because of CouchDB's more flexible indexes.

CouchDB Replication is Easy and Bidirectional

Getting replication going with MySQL is like jump-starting an old car with a manual transmission.

Getting replication going with CouchDB is like push-button remote starting a modern luxury vehicle.
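For a sense of how little is involved, this sketch builds the JSON body for CouchDB's `POST /_replicate` endpoint, which takes `source` and `target` databases and an optional `continuous` flag. The host names and the `shop` database are hypothetical placeholders, `replicationBody` is an illustrative helper, and authentication is left out.

```typescript
// Builds the body for POST http://<server>:5984/_replicate.
// Hosts and database names are hypothetical; authentication is omitted.

interface ReplicationSpec {
  source: string;
  target: string;
  continuous?: boolean;
}

function replicationBody(source: string, target: string): string {
  const spec: ReplicationSpec = { source: source, target: target, continuous: true };
  return JSON.stringify(spec);
}

const body = replicationBody(
  "http://db1.example.com:5984/shop", // hypothetical primary
  "http://db2.example.com:5984/shop"  // hypothetical replica
);
console.log(body);

// Send it with any HTTP client, for example:
//   curl -X POST http://db1.example.com:5984/_replicate \
//        -H "Content-Type: application/json" -d '<body printed above>'
```

Bidirectional replication is the same request issued a second time with `source` and `target` swapped.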

As an added bonus, if it makes sense for your use case (it often doesn't), CouchDB supports true multi-master bidirectional replication. You'll still have to come up with a strategy for handling conflicts, but at least the possibility exists.
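One piece of such a strategy is knowing what CouchDB does on its own. As I understand it, when replication leaves a document with conflicting revisions, every node picks the same winner deterministically. The revision with the longest history (the highest generation number in its `_rev`) wins, and ties go to the higher `_rev` string. The losing revisions are kept, not merged. The sketch below models that rule, ignoring deleted revisions, alongside a purely hypothetical application-level merge (a union of tags). `Revision`, `pickWinner` and `mergeTags` are illustrative names.

```typescript
// Simplified model of CouchDB's deterministic conflict winner, plus a
// hypothetical application-level merge. Deleted revisions are not modeled.

interface Revision {
  rev: string;    // "<generation>-<hash>", e.g. "3-fff"
  tags: string[]; // hypothetical document content
}

function generation(rev: string): number {
  return parseInt(rev.split("-")[0], 10);
}

// Longest history wins; ties are broken by the higher rev string.
function pickWinner(revs: Revision[]): Revision {
  return revs.slice().sort(function (a, b) {
    const ga = generation(a.rev);
    const gb = generation(b.rev);
    if (ga !== gb) return gb - ga;
    return a.rev < b.rev ? 1 : a.rev > b.rev ? -1 : 0;
  })[0];
}

// Hypothetical resolution policy: keep every tag seen in any revision.
function mergeTags(revs: Revision[]): string[] {
  const seen: { [tag: string]: boolean } = {};
  const merged: string[] = [];
  revs.forEach(function (r) {
    r.tags.forEach(function (t) {
      if (!Object.prototype.hasOwnProperty.call(seen, t)) {
        seen[t] = true;
        merged.push(t);
      }
    });
  });
  return merged.sort();
}

const conflicting: Revision[] = [
  { rev: "3-aaa", tags: ["red"] },
  { rev: "3-fff", tags: ["blue"] },
  { rev: "2-zzz", tags: ["green"] },
];

console.log(pickWinner(conflicting).rev); // prints 3-fff
console.log(mergeTags(conflicting));      // blue, green, red
```

In a real application you would write the merged body as a new revision on top of the winner and delete the losing revisions, so the conflict disappears on every replica.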

CouchDB's Killer Feature That Isn't

Because CouchDB is a NoSQL database, everyone assumes that it doesn't have schemas. While this is true in a literal sense--any document can have any JSON structure--it isn't true in an operational sense. CouchDB views are defined by design documents in the database. To change the behavior of a view, one must update the associated design document. When view behavior is changed via design document update, the entire view index is invalidated and must be rebuilt from scratch. Queries on that view will block until the index is rebuilt. In a database with a million records, rebuilding a view can take minutes.
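A design document is ordinary JSON with a `views` object that maps view names to map (and optional reduce) function source. The sketch below models why editing one of those strings is expensive. The index is identified by the view definitions themselves, so any change to a map function, even a cosmetic one, means a new, empty index. CouchDB actually uses a hash of the definitions; `indexSignature` and `needsRebuild` are hypothetical stand-ins that compare the definitions directly. As far as I know, all views in one design document share a single index, so changing one rebuilds them all.

```typescript
// Model of view invalidation: the index is keyed by the view definitions.
// indexSignature/needsRebuild are hypothetical; CouchDB hashes this information.

interface ViewDef {
  map: string;
  reduce?: string;
}

interface DesignDoc {
  _id: string;
  _rev?: string;
  language?: string;
  views: { [name: string]: ViewDef };
}

// Everything that defines the views, and nothing else (not _rev).
function indexSignature(ddoc: DesignDoc): string {
  return JSON.stringify({ language: ddoc.language || "javascript", views: ddoc.views });
}

function needsRebuild(before: DesignDoc, after: DesignDoc): boolean {
  return indexSignature(before) !== indexSignature(after);
}

const v1: DesignDoc = {
  _id: "_design/app",
  _rev: "1-abc",
  views: {
    by_keyword: { map: "function (doc) { if (doc.keywords) doc.keywords.forEach(function (k) { emit(k, null); }); }" },
    by_type: { map: "function (doc) { emit(doc.type, null); }" },
  },
};

// Same view definitions under a new revision.
const v2: DesignDoc = { _id: v1._id, _rev: "2-def", views: v1.views };

// One map function changed (keywords are now lowercased).
const v3: DesignDoc = {
  _id: v1._id,
  _rev: "3-0a1",
  views: {
    by_keyword: { map: "function (doc) { if (doc.keywords) doc.keywords.forEach(function (k) { emit(k.toLowerCase(), null); }); }" },
    by_type: v1.views["by_type"],
  },
};

console.log(needsRebuild(v1, v2)); // prints false
console.log(needsRebuild(v1, v3)); // prints true (and by_type is rebuilt along with by_keyword)
```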

Effectively, this behavior imposes on CouchDB an operational constraint similar to the one schemas impose on MySQL. If you must change your views, customers will experience downtime while the affected indexes are rebuilt.

In my next post, I'll discuss some strategies for designing CouchDB views that won't need to be redefined later. The whole thing will essentially be a shameless plug for my node module, couchdb-auto-views.


© 2013 Will Conant