Breaking up Relationships with CouchDB

[NB: This is an unpublished post I wrote in 2010 on getting started with CouchDB. Therefore, despite all the code and examples being relevant, it may be considered somewhat ‘belated’]

Beware – there’s a bunch of home wreckers out there intent on removing the love of your life and replacing it with a wicked mistress.

For me, my first experiments working with databases were performed with Oracle, the staple of our computer science course at the time. We were taught how to identify common structures in what we wanted to store (or had already been stored) and how to represent the relationships which existed between them. There was an entire series of lectures dedicated to this fine art, much of which I now can’t remember.

There has been an uptake in the number of developers working with NoSQL, or document oriented databases. These alternatives do not require decisions on subdividing documents into multiple record structures to be made at all, instead allowing the entire document to be recorded as a series of simple variable types. The contents of each document can vary from one document to the next. For the majority of developers who work with relational databases, this might come as a bit of a shock.

There are a number of varieties of NoSQL flavours currently available: CouchDB, Cassandra and MongoDB are a few of the hot ones right now. I’m not going to discuss the pros and cons of each right here (for that I’d refer you to The Changelog’s NoSQL smackdown podcasts), but rather give you a whistle stop tour of CouchDB, which I’ve been working with for a few years now.

If you want to follow along, you’ll need to head to the CouchDB site and read up on installation for your platform. I’m using the CouchDB server app, which is nice and self contained.

Document oriented storage gives me the ability not to have to worry about the structure of my documents prior to storage. From my own point of view this is a huge timesaver as my apps often tend to focus on a single type of document. Think of a blog post stored in a mysql database, we have a table for the post along with all its metadata, a table for the comments and maybe another for recording pingbacks. The same blog post in couch could be represented like so:

{
   "_id": "83ab09b88836ab714f592293d4e02845",
   "title": "My Blog Post",
   "content": "Lorem ipsum dolor sit amet, consectetur adipiscing elit",
   "comments": [{"ian": "You're a wally"},{"Some user": "This post sucks"}]
}

A document based db may well not be suitable for the nature of your application. However, I’ve found in lots of cases, it is.

Rest Style Interface

CouchDB works across a REST style interface. To write documents back to couch, you call a HTTP PUT or POST with your JSON structure. To read documents, you call a GET and to delete them you call DELETE. For example, to add a document with “_id=someid” to the database “blog”, you would call the following:

curl http://127.0.0.1:5984/blog/someid -X PUT -d '{"Title":"Another Blog Post", "Content":"Lorem ipsum dolor sit amet, consectetur adipiscing elit"}'

Similarly you can HTTP GET this uri to return the document just added. Saving the document to a couch store will add another field, “_rev” which represents the particular revision of the document you’re storing. Every time you update this document, it will be stored as a new document, along with a new revision number. You have access to all these revisions within couch, or you can choose to cleanup the database by compacting and removing all but the most recent revision of documents. Another blog entry document without any comments would be perfectly valid stored in the same database, as well as a document featuring just an array of pingback items.

You’ll notice that the structures are JSON objects. This is how all objects are exchanged between couch and other languages and over couch’s REST interface. Heading to where we stored the above blog document “http://127.0.0.1:5984/blog” in your browser yields a collection of metadata expressed in a JSON structure about that particular database. Appending “/83ab09b88836ab714f592293d4e02845″ (the id of the document) gives us the original blog document.

This, in turn works wonderfully well with certain frontend frameworks – such as Backbone. If you configure your backbone models to construct URLs the way CouchDB expects them, you can effectively have a frontend app driven by just a CouchDB store. As CouchDB returns JSON objects, it’s able to correctly parse and load them into your backbone app.

Map/Reduce

To support the extraction of substructures and sorting within these documents, couch has the concept of map/reduce built straight into its core. Futon, couch’s built in administration application, makes creation of these a very simple process. Maps and Reductions are described in Javascript, which is great given most of us are making more and more use of it these days. Head over to http://127.0.0.1:5984/_utils and you’ll see your collection of databases right away. Navigate to the blog database and under the view dropdown, select temporary view. Couch automatically populates this with a default temporary view:

function(doc) {
  emit(null, doc);
}

This basically reads as: for each document emit an object with a null key and the document as the value which is what you’ll see if you go ahead and run it. Not particularly exciting really, but by changing the null key to doc.title, we can emit all those blog posts sorted by title. To do something a little more complicated, such as determining the number of comments for each blog post we can make use of the reduce function too.

Map:

function(doc) {
  for(var i=0; i &lt; doc.comments.length; i++){
   emit(doc.title, 1);
  }
}

Reduce:

function (keys, values){
   return sum(values);
}

Here, we cycle each of the comments for every blog post and emit a single value 1 for every comment. Our reduce function takes both the entire key and value set which are emitted from the map function for each document and calculates the sum of values sent to it. With these two very basic functions we can support all the queries we might want to make of the database. You can permanently store this entire map/reduce query as a view within couch to be called later. When a document is inserted which effects the result of the query, the map/reduce will be re-evaluated on the next call. It turns out that storing and querying documents in this way can be extremely performant.

I very much enjoy having the ability to write my queries in a language I’m familiar with to retrieve results, rather than having to dip into MySQL voodoo.

So, there you have it, CouchDB – I may well follow this up at a later date with a comparison with Mongo, given I’m now a certified developer.