Breaking up Relationships with CouchDB

01 Mar 2013

[NB: This is an unpublished post I wrote in 2010 on getting started with CouchDB. While all the code and examples are still relevant, it may be considered somewhat 'belated'.]

Beware – there’s a bunch of home wreckers out there intent on removing the love of your life and replacing it with a wicked mistress.

My first experiments with databases were performed with Oracle, the staple of our computer science course at the time. We were taught how to identify common structures in what we wanted to store (or had already stored) and how to represent the relationships between them. There was an entire series of lectures dedicated to this fine art, much of which I now can’t remember.

There has been an uptake in the number of developers working with NoSQL databases, many of them document oriented. These alternatives don’t require you to decide how to subdivide documents into multiple record structures at all; instead, the entire document is stored as one structure built from simple value types, lists and nested objects. The contents can vary from one document to the next. For the majority of developers who work with relational databases, this might come as a bit of a shock.

There are a number of NoSQL flavours currently available: CouchDB, Cassandra and MongoDB are a few of the hot ones right now. I’m not going to discuss the pros and cons of each here (for that I’d refer you to The Changelog’s NoSQL smackdown podcasts), but rather give you a whistle-stop tour of CouchDB, which I’ve been working with for a few years now.

If you want to follow along, head to the CouchDB site and read up on installation for your platform. I’m using the CouchDB server app, which is nice and self-contained.

Document oriented storage means I don’t have to worry about the structure of my documents before storing them. For me this is a huge timesaver, as my apps tend to focus on a single type of document. Think of a blog post stored in a MySQL database: we have a table for the post along with all its metadata, a table for the comments and maybe another for recording pingbacks. The same blog post in couch could be represented like so:

{
   "_id": "83ab09b88836ab714f592293d4e02845",
   "title": "My Blog Post",
   "content": "Lorem ipsum dolor sit amet, consectetur adipiscing elit",
   "comments": [{"ian": "You're a wally"},{"Some user": "This post sucks"}]
}
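To make the contrast with the relational layout concrete, here is a minimal sketch in JavaScript (the language CouchDB views are written in) that folds rows from hypothetical `posts` and `comments` tables into one document shaped like the example above. The table and column names are made up for illustration, and the `_id` is left out because CouchDB generates one if you POST a document without it.

```javascript
// Hypothetical rows from separate `posts` and `comments` tables, roughly as a
// relational query would return them (table and column names are made up).
const posts = [
  { id: 1, title: 'My Blog Post', content: 'Lorem ipsum dolor sit amet, consectetur adipiscing elit' },
];
const comments = [
  { post_id: 1, author: 'ian', body: "You're a wally" },
  { post_id: 1, author: 'Some user', body: 'This post sucks' },
];

// Fold a post and its comments into one self-contained document, using the
// same {author: text} comment shape as the JSON example above.
function toDocument(post, allComments) {
  return {
    title: post.title,
    content: post.content,
    comments: allComments
      .filter((c) => c.post_id === post.id)
      .map((c) => ({ [c.author]: c.body })),
  };
}

const doc = toDocument(posts[0], comments);
console.log(JSON.stringify(doc.comments));
// [{"ian":"You're a wally"},{"Some user":"This post sucks"}]
```

Everything about the post now travels together, so reading it back is a single fetch rather than a join.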

A document-based database may well not suit the nature of your application. However, I’ve found that in lots of cases it does.

REST Style Interface

CouchDB works across a REST style interface. To write documents to couch, you make an HTTP PUT or POST request with your JSON structure as the body. To read documents you make a GET request, and to delete them a DELETE. For example, to add a document with “_id=someid” to the database “blog”, you would call the following:

curl http://127.0.0.1:5984/blog/someid -X PUT -H 'Content-Type: application/json' -d '{"title":"Another Blog Post", "content":"Lorem ipsum dolor sit amet, consectetur adipiscing elit"}'

Similarly, an HTTP GET on the same URI returns the document you just added. Saving a document to a couch store adds another field, “_rev”, which identifies that particular revision of the document. Every time you update the document you must send its current _rev along with your changes (otherwise CouchDB rejects the write as a conflict), and the update is stored as a new revision with a new revision number. Older revisions remain retrievable until you compact the database, which discards all but the most recent revision of each document. Because there is no fixed schema, another blog post without any comments would be perfectly valid in the same database, as would a document holding nothing but an array of pingback items.
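The revision rule is the bit that tends to trip people up, so here is a toy, in-memory model of it. It is only an illustration under simplifying assumptions, not how CouchDB works internally: `ToyStore` and its methods are hypothetical, only the latest revision is kept, and the hash part of each `_rev` is just an MD5 of the body, whereas CouchDB computes its own. What it does mirror is the behaviour you see over HTTP: a write that doesn't quote the current `_rev` is refused with a 409 conflict, and each accepted update bumps the number at the front of the revision.

```javascript
const crypto = require('crypto');

// Toy model of CouchDB's update rule (hypothetical, not CouchDB's code).
class ToyStore {
  constructor() {
    this.docs = new Map(); // _id -> latest stored revision only
  }

  put(id, body) {
    const current = this.docs.get(id);
    const currentRev = current ? current._rev : undefined;
    // New documents must not carry a _rev; updates must carry the current one.
    if (body._rev !== currentRev) {
      return { status: 409, error: 'conflict' };
    }
    // Revisions look like "<generation>-<hash>"; the MD5 here is illustrative.
    const generation = current ? parseInt(currentRev, 10) + 1 : 1;
    const hash = crypto.createHash('md5').update(JSON.stringify(body)).digest('hex');
    const saved = Object.assign({}, body, { _id: id, _rev: `${generation}-${hash}` });
    this.docs.set(id, saved);
    return { status: 201, ok: true, id: id, rev: saved._rev };
  }

  get(id) {
    return this.docs.get(id);
  }
}

const store = new ToyStore();
const first = store.put('someid', { title: 'Another Blog Post' });
console.log(first.rev.startsWith('1-'));                      // true
console.log(store.put('someid', { title: 'Edited' }).status); // 409, no _rev sent
const latest = store.get('someid');
const second = store.put('someid', Object.assign({}, latest, { title: 'Edited' }));
console.log(second.rev.startsWith('2-'));                     // true
```

The practical upshot is the usual read-modify-write loop: GET the document, change it, and PUT it back with the `_rev` you read.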

You’ll notice that these structures are JSON objects: JSON is how all documents are exchanged between couch and other languages over its REST interface. Pointing your browser at the database where we stored the blog document above, “http://127.0.0.1:5984/blog”, yields a JSON object of metadata about that particular database. Appending “/83ab09b88836ab714f592293d4e02845” (the _id of the first example document) gives us the original blog document.

This in turn works wonderfully well with certain frontend frameworks, such as Backbone. If you configure your Backbone models to construct URLs the way CouchDB expects them, you can effectively drive a frontend app from nothing but a CouchDB store: because CouchDB returns JSON, Backbone can parse the responses and load them straight into your app.
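A minimal sketch of the glue involved, written as plain functions so it runs without Backbone installed. The names `documentUrl` and `parseViewResponse` are hypothetical. In a real Backbone app you would typically set `idAttribute: '_id'` on the model (CouchDB keeps ids in `_id`, not `id`) and put logic like `parseViewResponse` in a collection's `parse`, because a view returns an object wrapping a `rows` array rather than a bare array.

```javascript
// Local CouchDB address used throughout the post.
const COUCH = 'http://127.0.0.1:5984';

// URL for a single document, as a model's url() might build it for GET/PUT.
function documentUrl(db, id) {
  return COUCH + '/' + encodeURIComponent(db) + '/' + encodeURIComponent(id);
}

// A view responds with {"total_rows": n, "offset": m, "rows": [...]}, so a
// collection has to unwrap it. With ?include_docs=true each row carries the
// full document in row.doc; otherwise fall back to the emitted value.
function parseViewResponse(response) {
  return response.rows.map((row) => (row.doc !== undefined ? row.doc : row.value));
}

console.log(documentUrl('blog', 'someid'));
// http://127.0.0.1:5984/blog/someid
console.log(parseViewResponse({
  total_rows: 1,
  offset: 0,
  rows: [{ id: 'someid', key: 'Another Blog Post', value: { title: 'Another Blog Post' } }],
}));
// [ { title: 'Another Blog Post' } ]
```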

Map/Reduce

To support extracting substructures from these documents and sorting them, couch has map/reduce built straight into its core. Futon, couch’s built-in administration application, makes creating these a very simple process. Map and reduce functions are written in Javascript, which is great given most of us are making more and more use of it these days. Head over to http://127.0.0.1:5984/_utils and you’ll see your collection of databases right away. Navigate to the blog database and, under the view dropdown, select temporary view. Couch automatically populates this with a default temporary view:

function(doc) {
  emit(null, doc);
}

This reads as: for each document, emit a row with a null key and the whole document as its value, which is what you’ll see if you go ahead and run it. Not particularly exciting, but by changing the null key to doc.title we can emit all those blog posts sorted by title, since view rows are ordered by key. To do something a little more complicated, such as counting the comments on each blog post, we can make use of a reduce function too.
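Here is a small sketch of what the view does once the key is doc.title. `buildIndex` and `byTitle` are hypothetical names, and this is an in-memory stand-in rather than CouchDB's B-tree index: CouchDB provides `emit` as a global inside view functions, whereas here it is passed in, and plain string comparison stands in for CouchDB's Unicode collation.

```javascript
// In-memory stand-in for a view index (not CouchDB's implementation).
function buildIndex(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key: key, value: value }));
  }
  // View rows are ordered by key; simple string comparison approximates
  // CouchDB's collation for these plain ASCII titles.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// The default map function with the null key swapped for doc.title.
function byTitle(doc, emit) {
  emit(doc.title, doc);
}

const posts = [
  { _id: 'a', title: 'My Blog Post' },
  { _id: 'b', title: 'Another Blog Post' },
];

console.log(buildIndex(posts, byTitle).map((row) => row.key));
// [ 'Another Blog Post', 'My Blog Post' ]
```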

Map:

function(doc) {
  if (!doc.comments) return;   // skip documents that have no comments array
  for (var i = 0; i < doc.comments.length; i++) {
    emit(doc.title, 1);
  }
}

Reduce:

function (keys, values){
   return sum(values);
}

Here, we loop over the comments of every blog post and emit the value 1 for each comment, keyed by the post’s title. Our reduce function receives the keys and values emitted by the map function and returns the sum of the values. CouchDB may call it on batches of rows and then again on its own intermediate results (a “rereduce”), which is why a simple, combinable operation like a sum works well here. Query the view with group=true to get one count per title; without it you get a single total across all posts. These two small functions already cover a lot of the queries you might want to make of the database. You can permanently store the map/reduce pair as a view in a design document within couch, to be called later. When a document that affects the result is inserted or updated, the view index is updated incrementally, processing only the changed documents, the next time the view is queried. It turns out that storing and querying documents in this way can be extremely performant.
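To see what the comment-count view produces when queried with group=true, here is an in-memory sketch that runs the map and reduce functions from above, with `emit` passed in and `sum` defined locally because CouchDB normally supplies it. `queryGrouped` is a hypothetical helper that ignores rereduce and B-tree details; it just groups the emitted rows by key and reduces each group. CouchDB also ships built-in reducers such as `_count` and `_sum` that would do the same job here.

```javascript
// CouchDB supplies sum() inside view functions; define it so this runs in Node.
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

// The map function from above, with emit passed in instead of being global.
function commentMap(doc, emit) {
  if (!doc.comments) return;
  for (let i = 0; i < doc.comments.length; i++) {
    emit(doc.title, 1);
  }
}

// The reduce function from above; sum() gives the same answer whether it is
// handed raw map values or earlier partial sums (rereduce).
function commentReduce(keys, values, rereduce) {
  return sum(values);
}

// Hypothetical helper approximating a ?group=true query: group emitted rows
// by key, reduce each group once, and return the groups in key order.
function queryGrouped(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, { keys: [], values: [] });
      groups.get(key).keys.push([key, doc._id]); // CouchDB passes [key, docid] pairs
      groups.get(key).values.push(value);
    });
  }
  return [...groups.keys()].sort().map((key) => ({
    key: key,
    value: reduce(groups.get(key).keys, groups.get(key).values, false),
  }));
}

const docs = [
  { _id: '1', title: 'My Blog Post', comments: [{ ian: "You're a wally" }, { 'Some user': 'This post sucks' }] },
  { _id: '2', title: 'Another Blog Post' }, // no comments: emits nothing
];

console.log(queryGrouped(docs, commentMap, commentReduce));
// [ { key: 'My Blog Post', value: 2 } ]
```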

I very much enjoy having the ability to write my queries in a language I’m familiar with to retrieve results, rather than having to dip into MySQL voodoo.

So there you have it: CouchDB. I may well follow this up at a later date with a comparison with Mongo, given I’m now a certified MongoDB developer.

Tagged backbone, couchdb, databases, nosql

Oh Crap, I'm a Frontend Developer

23 Nov 2012

When I was first tiptoeing into the waters of web development (in the early 2000s), getting to grips with job descriptions was simple. There were two types of people in our industry: designers and developers. You either drew websites or you built websites; that was it.

As time went on, people seemed to invent new titles. “UX Designers” were the first I noticed appear on the scene, and we’re still to this day figuring out what that role really entails. The one that particularly got my hackles up, though, was “Frontend Developer”. Going by the skills listed with these roles (HTML, CSS, JS), it was basically a designer who could also do a bit of jQuery… Not a real developer, then. Not someone who pokes around with backend scripts and really knows what’s going on, right? Not someone who can properly structure apps and create proper object oriented classes. It got my hackles up because it seemed like an attempt to poach credentials from us hardworking devs. How dare they. Losers.

Time passed… I got a real job… The industry invented more job titles…

My first role was as a PHP/MySQL developer. I also started having to make decisions on frontend design and behaviour without any formal education in either; in my own projects at least, I was the only person doing such work. Outside of that first job, I decided to play with two hot technologies of the time: CouchDB and node.js. I geeked out about both, using document based stores and Javascript server-side, but hated the steep learning curve that came with having to use JS. It also lacked many of the constructs I’d become used to in other languages.

I’d seen a great many realtime apps and knew the behaviour of my own apps should be keeping up. What I had could be better, much better. It was around here that I started appreciating how much work was involved in a typical browser-based app. I started using backbone.js with a view to better organising the mess of JS that typically sat next to my markup. There wasn’t a great deal of help around, so I read a book as an aid… I had to approach what was going on in the browser properly, as I would any other software. Hmm, weird. JS in the browser had, up until now, been an afterthought to “jazz stuff up”. Developing real, well designed software in the browser, however, really floated my boat. I prototyped a simple calendar app and it worked fantastically well.

Since then, I’ve been lucky enough to move into a full time contract role as a Javascript Developer – or a Frontend Web Developer, if you like. I probably would have been shocked had someone suggested this 5 or so years ago.

I’m really enjoying being part of the community around JS, the ever increasing list of libraries being posted and patterns for development. Hopefully I’ve learnt to be a little more discerning of job titles than I have been in the past.

Tagged couchdb, js, nodejs

Pagination in CouchDB Apps

02 Nov 2011

I’ve been working on some fun little node.js / couchdb projects of late. Given the fact I don’t use either as part of my work, I’ve spent some downtime experimenting and slowly iterating my approaches as I learn best practice.

I hit what I consider a fairly frustrating hurdle, one that couchdb threw up and that I’d been blissfully unaware of through all my couchdb development. When it came to pagination, it turns out I’ve always been doing it the “bad” way. Oh well, that’s upsetting.

The Wrong Way

My “slow” approach has always been to take the page number as an argument in the URL and derive “skip” and “limit” parameters for the view query from it. For example, to show the 2nd page of my app with 10 items per page:

  var skip = (pageno - 1) * 10;   // JS: page 1 -> skip 0, page 2 -> skip 10
  curl -X GET 'http://127.0.0.1:5984/stuff/_design/stuff/_view/by-name?skip=10&limit=10'

It turns out that although you might think the query starts at a particular result, CouchDB still starts from the first row of the view’s B-tree index and simply suppresses the rows you skip, so the cost grows with the skip value. This isn’t good news when you’re trying to skip, say, 10,000 results.

The Suggested Way

The suggested solution is to drop the “skip” parameter and instead keep track of the startkey at which the next page begins. You do this by requesting one more row than you show on a page, and using the key of that extra row as the startkey of the next request. So now, for a first page of 10 items, my query is:
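A minimal sketch of building those requests in JavaScript, assuming the `stuff` database and `by-name` view from the examples; `pageUrl` is a hypothetical helper. One detail worth knowing: `startkey` is parsed as JSON, so a string key has to be sent with its quotes (URL-encoded as `%22`), while `startkey_docid` is a plain document id.

```javascript
// View from the examples above.
const VIEW = 'http://127.0.0.1:5984/stuff/_design/stuff/_view/by-name';

// Ask for one row more than a page holds; resume from a remembered cursor
// ({key, docid}) instead of using skip.
function pageUrl(pageSize, cursor) {
  let url = VIEW + '?limit=' + (pageSize + 1);
  if (cursor) {
    // startkey must be JSON, hence JSON.stringify before URL-encoding.
    url += '&startkey=' + encodeURIComponent(JSON.stringify(cursor.key));
    if (cursor.docid !== undefined) {
      url += '&startkey_docid=' + encodeURIComponent(cursor.docid);
    }
  }
  return url;
}

console.log(pageUrl(10));
// http://127.0.0.1:5984/stuff/_design/stuff/_view/by-name?limit=11
console.log(pageUrl(10, { key: 'Joe Bloggs', docid: '321aeb36e20d62660eb0d03c9fcd27b2' }));
// http://127.0.0.1:5984/stuff/_design/stuff/_view/by-name?limit=11&startkey=%22Joe%20Bloggs%22&startkey_docid=321aeb36e20d62660eb0d03c9fcd27b2
```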

  curl -X GET http://127.0.0.1:5984/stuff/_design/stuff/_view/by-name?limit=11

Returning something like:

 {"total_rows":17,"offset":0,"rows":[
  {"id":"8177bf155b952652129836a5d354b30e","key":"Ian Wootten","value":null},
  {"id":"bae2c490c70480aec7096d79e1e3bfc3","key":"Isambard Kingdom Brunel","value":null},
  {"id":"eaae74cfbe5cd13ea6b50dfd090827ca","key":"Christopher Columbus","value":null},
  {"id":"491e68b08d73256f060ebf4b8e063e1c","key":"Elizabeth Fry","value":null},
  {"id":"b45d8a7b9edee9ca66ac0860196f4504","key":"Edward Jenner","value":null},
  {"id":"8a4d3f46885701ffcc7532aeac7a5ae9","key":"Florence Nightingale","value":null},
  {"id":"71e6534c17429eca2cd9450cfc95c6bb","key":"Samuel Pepys","value":null},
  {"id":"6cbad847f0ae959b281b471a72d60587","key":"Pocahontas","value":null},
  {"id":"e5b026ec5c92c20f1575a2901defe14e","key":"Mary Seacole","value":null},
  {"id":"84a371a7b8414237fad1b6aaf68cd16a","key":"George Stephenson","value":null},
  {"id":"321aeb36e20d62660eb0d03c9fcd27b2","key":"Joe Bloggs","value":null}
]}

The 11th row returned has the key “Joe Bloggs”, which can be used as the startkey argument to couch to obtain my second page. If there are duplicate keys, it is also necessary to keep track of that row’s document id and supply it as the startkey_docid argument in order to page through everything correctly.
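Turning a response like the one above into a page of results plus a pointer to the next page could look like this. `splitPage` is a hypothetical name, and to keep things short it's shown with a page size of 2 on three rows taken from the response above: the extra row isn't displayed, and its key and id become the next request's `startkey` and `startkey_docid`.

```javascript
// Split a limit=pageSize+1 view response into the rows to display and a
// cursor for the next page (null when this is the last page).
function splitPage(response, pageSize) {
  const rows = response.rows;
  const visible = rows.slice(0, pageSize);
  const extra = rows[pageSize];
  const next = extra ? { key: extra.key, docid: extra.id } : null;
  return { visible: visible, next: next };
}

// Three rows from the response above, treated as a page size of 2 (limit=3).
const sample = {
  total_rows: 17,
  offset: 0,
  rows: [
    { id: '8177bf155b952652129836a5d354b30e', key: 'Ian Wootten', value: null },
    { id: 'bae2c490c70480aec7096d79e1e3bfc3', key: 'Isambard Kingdom Brunel', value: null },
    { id: '321aeb36e20d62660eb0d03c9fcd27b2', key: 'Joe Bloggs', value: null },
  ],
};

const page = splitPage(sample, 2);
console.log(page.visible.length); // 2
console.log(page.next.key);       // Joe Bloggs
```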

What I personally dislike about the suggested approach is that you can’t make simple requests for arbitrary pages, even low-numbered ones. We always need to follow a path of links from the first page in order to view particular results. CouchDB’s response is “Not even Google is doing that!”, which seems kind of weak to me. I want nice clean URLs à la myapp.com/page/2 or myapp.com?page=2.

In fact, the suggested approach only really allows us to have a single “more” style link for fetching results. Passing a startkey as part of the URL, e.g. /page/321aeb36e20d62660eb0d03c9fcd27b2, just sounds (and looks) plain nasty and isn’t great from a UX point of view for any users we may have.

At the moment, clean, tangible page URLs (the right way) are only possible using custom middleware, and I’ve yet to find anything suitable for node.js. For my current project I intend to investigate caching the start keys of low-numbered pages in a separate database, and I hope to write a later post detailing how I get on.
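The caching idea isn’t spelled out in the post, so the following is only a rough sketch of one way it could work. Assuming you can walk the view’s rows in key order once, you record the key and document id at the start of every page; a request for /page/3 then becomes a lookup plus a startkey/startkey_docid query instead of a skip. `buildPageIndex` and `cursorForPage` are hypothetical, the index lives in memory here rather than in a separate database, and it would need rebuilding whenever documents are added or removed.

```javascript
// Record where each page begins: starts[n - 1] is the cursor for page n.
function buildPageIndex(rows, pageSize) {
  const starts = [];
  for (let i = 0; i < rows.length; i += pageSize) {
    starts.push({ key: rows[i].key, docid: rows[i].id });
  }
  return starts;
}

// Map a clean page number (1-based) onto a startkey/startkey_docid cursor.
function cursorForPage(index, pageNo) {
  return pageNo >= 1 && pageNo <= index.length ? index[pageNo - 1] : null;
}

// Hypothetical view rows, already in key order.
const rows = ['Ada', 'Ben', 'Cy', 'Dee', 'Eve'].map((name, i) => ({ id: 'doc' + i, key: name, value: null }));
const index = buildPageIndex(rows, 2);

console.log(index.length);            // 3
console.log(cursorForPage(index, 2)); // { key: 'Cy', docid: 'doc2' }
console.log(cursorForPage(index, 9)); // null
```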

Tagged articles, couchdb, nodejs

Blog rolling with CouchDB, Express and Node.js

07 Feb 2011

Over the last little while, I’ve been doing a lot of playing with Node.js, mostly to run data collection scripts. Last week, I started following Ciaran Jessup’s tutorial on getting started with node.js, Express and mongoDB. Express is a web framework for node.js, allowing you to work in a familiar MVC format in a not-so-familiar server-side language. I hit a few problems along the way in the tutorial, so I thought I’d list a few of my findings here. I also wanted to use my preferred flavour of NoSQL, CouchDB, with Express, and porting the mongo model over to it proved extremely easy. I hope someone out there finds this useful, as I’ve yet to find a vast community using CouchDB with Express.

First things first, you’ll want to install Node and npm (the node package manager) in order to be able to easily install node packages. You’ll also probably find it handy to have the original tutorial open alongside this one. I’ve been using the latest versions of node (0.3.7) and npm (0.2.17) at the time of writing.

Once that’s done, grab copies of the packages that we’ll be using:

npm install express
npm install jade
npm install sass

If you want to use couchdb, make sure you have it installed (grab it from the CouchDB site), then install cradle, the package for talking to it from node:

npm install cradle

The first hurdle I found was that the way Express is set up has changed a little since the tutorial was written:

var express = require('express');

var app = express.createServer();

app.configure(function(){
  app.use(express.methodOverride());
  app.use(express.bodyDecoder());
  app.use(express.logger());
});

app.get('/', function(req, res){
    res.send('Hello World');
});

app.listen(3000);

Save this to a file called app.js and run it using:

node app.js

Once that’s running, you can visit 127.0.0.1:3000 in your browser to see a riveting message!

After creating a views folder alongside app.js, you can use the original article provider file and the updated app.js below to get an app with a few articles shown.

var express = require('express');

var ArticleProvider = require('./articleprovider-memory').ArticleProvider;

var app = express.createServer();

app.configure(function(){
  app.use(express.methodOverride());
  app.use(express.bodyDecoder());
  app.use(express.logger());
});

var articleProvider= new ArticleProvider();

app.get('/', function(req, res){
  articleProvider.findAll(function(error, docs){
    res.send(require('sys').inspect(docs));
  })
});

app.listen(3000);

Now when you re-run, you’ll see 3 separate articles. Oooh fancy!

Next I hit another hurdle. Express no longer uses the HAML template language and instead uses Jade by default, so the HAML templates need converting across to their Jade equivalents. Basically, this is as simple as dropping the ‘%’ from the beginning of each line (I also replaced braces with brackets in later templates).

The layout (views/layout.jade):

html
  head
    title= title
    link(rel: 'stylesheet', href: '/style.css' )
  body
    #wrapper
      != body

And the index view (views/blogs_index.jade):

h1= title
#articles
  - each article in articles
    div.article
      div.created_at= article.created_at
      div.title= article.title
      div.body= article.body

The app.js now becomes:

var express = require('express');

var app = express.createServer();

app.configure(function(){
  app.set('view engine', 'jade');
  app.set('views', __dirname + '/views');
  app.use(express.methodOverride());
  app.use(express.bodyDecoder());
  app.use(express.logger());
  app.use(app.router);
  app.use(express.compiler({ src: __dirname + '/views', enable: ['sass'] }));
});

var ArticleProvider = require('./articleprovider-memory').ArticleProvider;
var articleProvider = new ArticleProvider();

app.get('/', function(req,res){
  articleProvider.findAll(function(error, docs){
    res.render('blogs_index.jade', {
      locals: {
        title: 'Blog',
        articles: docs
      }
    });
  })
})

app.get('/*.css', function(req,res){
  res.render(req.params[0] + '.css.sass', { layout: false });
});

app.listen(3000);

You’ll notice that here we set Jade as the view engine and enable the Sass CSS compiler. If you download the original Sass stylesheet into your views folder, you can now restart the app and inspect the fruits of your labour. According to the creator of Express, the CSS shouldn’t actually sit in the views folder and should instead be compiled with the sass package itself; I’ve yet to discover the correct way of doing this. To pull the stylesheet into the page, the layout needs to link to it:

html
  head
    title= title
    link(rel: 'stylesheet', href: '/style.css' )
  body
    #wrapper
      != body

If you reload now and visit 127.0.0.1:3000 you should see your posts with a little more style.

Creating a form for new posts (views/blog_new.jade) looks like this:

h1= title
form( method= 'post' )
  div
    div
      span Title :
      input(type='text', name= 'title', id= 'editArticleTitle' )
    div
      span Body :
      textarea( name= 'body', rows= 20, id= 'editArticleBody' )
    div#editArticleSubmit
      input( type= 'submit', value= 'Send' )

And the new app.js routes are as follows:

app.get('/blog/new', function(req,res){
  res.render('blog_new', {
    locals: {
      title: 'New Post'
    }
  });
});

app.post('/blog/new', function(req,res){
  articleProvider.save({
    title: req.param('title'),
    body: req.param('body')
  }, function(error, docs) {
    res.redirect('/')
  });
});

To add persistence using mongodb, nothing changes in the original model file, so go ahead and use that. If you’d like to try it out, you’ll need to install the ‘mongodb’ package and update your instantiation of the ArticleProvider class, supplying the host and port on which mongo is listening.

Adding CouchDB Persistence

Here, I took my own angle on the tutorial and had a go at writing my own persistence model using couchdb. It proved extremely easy, given CouchDB’s JSON representation and the HTTP access already built in to it.

var cradle = require('cradle');

var ArticleProvider = function(host, port) {
  this.connection= new (cradle.Connection)(host, port, {
    cache: true,
    raw: false
  });
  this.db = this.connection.database('articles');
};

ArticleProvider.prototype.findAll = function(callback) {
    this.db.view('articles/all',function(error, result) {
      if( error ){
        callback(error)
      }else{
        var docs = [];
        result.forEach(function (row){
          docs.push(row);
        });
        callback(null, docs);
      }
    });
};

ArticleProvider.prototype.findById = function(id, callback) {
    this.db.get(id, function(error, result) {
        if( error ) callback(error)
        else callback(null, result)
      });
};

ArticleProvider.prototype.save = function(articles, callback) {
    if( typeof(articles.length)=="undefined")
      articles = [articles];

    for( var i =0;i< articles.length;i++ ) {
      var article = articles[i];
      article.created_at = new Date();
      if( article.comments === undefined ) article.comments = [];
      for(var j =0;j< article.comments.length; j++) {
        article.comments[j].created_at = new Date();
      }
    }

    this.db.save(articles, function(error, result) {
      if( error ) callback(error)
      else callback(null, articles);
    });
};

exports.ArticleProvider = ArticleProvider;
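The provider calls `this.db.view('articles/all', ...)`, which assumes the `articles` database already has a design document `_design/articles` containing a view called `all`. The post doesn’t show one, so the following is a guess at a suitable design document. View functions are stored as strings inside the design document’s JSON. The whole article is emitted as the value because, as far as I’m aware, iterating a cradle view result hands you each row’s value, and the templates need fields such as `article._id` and `article.title`. You could PUT the printed JSON to http://127.0.0.1:5984/articles/_design/articles.

```javascript
// Guessed design document for the provider's this.db.view('articles/all').
const designDoc = {
  _id: '_design/articles',
  views: {
    all: {
      // Key by created_at (stored as an ISO date string once serialised, so
      // rows come back in date order) and emit the whole article as the value.
      map: 'function (doc) { if (doc.title) { emit(doc.created_at, doc); } }',
    },
  },
};

// Body for: PUT http://127.0.0.1:5984/articles/_design/articles
console.log(JSON.stringify(designDoc, null, 2));
```

Remember too that this provider takes a host and port when you create it in app.js, e.g. `new ArticleProvider('localhost', 5984)`.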

I also added the following view (views/blog_view.jade) and route (in app.js) so that each article can be opened on its own page.

div.article
  h1= article.title
  div.created_at= article.created_at
  div.body= article.body

app.get('/blog/view/:id', function(req,res){
  articleProvider.findById(req.params.id, function(error, doc){
    res.render('blog_view', {
      locals: {
        title: doc.title,
        article: doc
      }
    });
  });
});

And in order to support it, the main view now becomes as follows:

h1= title
#articles
  - each article in articles
    div.article
      div.created_at= article.created_at
      - var articlelink = 'blog/view/' + article._id;
      a(href=articlelink)
        div.title= article.title
      div.body= article.body

  a(href='blog/new') Add new post

Anyway, I’ve not yet added comment support, but given the headway I made here, I’d imagine it would be extremely easy to integrate into my couchdb article model. I’ll update here if I ever get round to adding it!

Tagged couchdb, expressjs, mongodb, node, nodejs