Jesper Høy

Posted on Feb 20 • Edited on Nov 15 • Originally published at jesperhoy.dk

Why I am moving to CouchDB

#couchdb #sqlserver #mongodb #database

Starting today, I will be using CouchDB as my primary database server for new projects and will gradually be moving existing projects to it.

I have been using Microsoft SQL Server for many years and never really gave it much thought - it just works - and I know how to make it work. It is also pretty fast, given the right hardware and the right indexes.

Late to the No-SQL party?

I know this sounds like swimming against the current - or being late to the No-SQL party. But this is really about the overall feature set of CouchDB specifically - and not an SQL vs No-SQL thing.

I don't have big data...

Many articles on why you might use No-SQL databases (like CouchDB) point to "big data" as the main reason. But I don't have "big data". My largest SQL database is around 5GB with a few million records total. So that's not it.

I don't need master-to-master replication...

As for CouchDB specifically, many articles point to master-to-master replication, filtered synchronization with client devices using PouchDB, or offline-first scenarios. Those are great features - but that's not it either.

So why CouchDB?

It is simple!

Other database systems certainly have a lot more features.
But for 99% of what I do - CouchDB is plenty.
And I'd much rather use a simple system that I can get my head around, than some complex thing that I will never fully understand or utilize.

Lightweight

In my tests with 500K records, it used less than 100MB of memory and otherwise hums along at below 40MB.

This makes it feasible to run it on a small server / VPS alongside other stuff - like your web-server.

Live continuous replication/backup

Setting up live continuous replication/backup to another CouchDB instance is super easy.
Having an always up-to-date replica beats daily SQL Server backups by a long shot (virtually no risk of data loss vs. risk of losing up to 24 hours' worth of data).

You can even do filtered replication - providing different clients with different parts of the database - something that is difficult or impossible with many other database systems.
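
As a rough sketch of what "super easy" means here: a continuous replication is just a JSON document saved in CouchDB's `_replicator` database. The helper name, host names, database name, and the `customer` field below are made up for illustration, and credentials are left out.

```python
import json


def replication_doc(doc_id, source_url, target_url, selector=None):
    """Build a document for CouchDB's _replicator database.

    Saving it (for example with PUT /_replicator/<doc_id>) asks the
    server to start the replication it describes.
    """
    doc = {
        "_id": doc_id,
        "source": source_url,
        "target": target_url,
        "continuous": True,  # keep replicating as new changes arrive
    }
    if selector is not None:
        # Filtered replication: only documents matching this
        # Mango-style selector are copied to the target.
        doc["selector"] = selector
    return doc


# Hypothetical servers and database names.
backup = replication_doc(
    "invoices-to-backup",
    "http://localhost:5984/invoices",
    "http://backup.example.com:5984/invoices",
)

# Hypothetical filter: replicate only one customer's documents.
filtered = replication_doc(
    "invoices-to-acme",
    "http://localhost:5984/invoices",
    "http://client.example.com:5984/invoices",
    selector={"customer": "acme"},
)

print(json.dumps(backup, indent=2))
```

The same kind of document, pointed at a server somewhere else, is also how you would move a database to another host.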

Data is just JSON

JSON is THE universal data language.

We have to use it when doing any kind of web front-end stuff, so we might as well use it with the database too.

This also makes it possible to re-use a lot of serialization code.

HTTP REST interface

No client libraries needed.

Makes it universally and easily accessible - from code, command line, HTTP tools, etc.

For example, you can use .http files in Visual Studio or the REST Client extension in VS Code.
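
To illustrate the "no client libraries needed" point, here is a small sketch that saves a document using nothing but Python's standard library. The server address, database name, document ID, and fields are assumptions for the example; the request is only built, not sent, so it runs without a server.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5984"  # assumed local CouchDB instance


def put_document_request(db, doc_id, doc):
    """Build (but do not send) a PUT request that stores one document."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{db}/{urllib.parse.quote(doc_id, safe='')}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical database and document.
req = put_document_request(
    "invoices", "invoice-1001", {"customer": "acme", "total": 250}
)
print(req.get_method(), req.full_url)

# To actually send it you need a running server (and usually credentials):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The same request could just as well be written by hand in an .http file or sent with curl, which is really the whole point.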

Web-based admin panel

CouchDB comes with a built-in web-based admin panel called "Fauxton" for basic server administration.

It is easy to also install Photon (a much nicer admin panel alternative), which additionally lets you do ad hoc SQL queries, diff document revisions, perform backup/restore, and much more - all through a browser from anywhere.

Map/Reduce

In addition to traditional indexes (fixed fields, sort order, partial conditions), which CouchDB calls "Mango indexes", it also has a very powerful Map/Reduce feature.

Map/Reduce lets you define keys and conditions through JavaScript functions (the "map" part of map/reduce) giving you a lot more flexibility.

To sum (or do other calculations on) the values in a column for a set of rows, with SQL, the server typically needs to go through every one of those rows every time you execute the query. With CouchDB, the server can do most of this work ahead of time along with the indexing (the "reduce" part of map/reduce), making such queries much more efficient. And again, because this is defined through JavaScript functions, you get a lot more flexibility.

Map/Reduce is also what lets you put invoice-lines in the same document (record) as the rest of the invoice, while still being able to index and query those lines individually.
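
Here is a sketch of what such a view could look like for the invoice-lines case. The design document name, view name, and document fields (`type`, `lines`, `product`, `qty`, `price`) are all assumptions for illustration. The map function is JavaScript stored as a string, `_sum` is one of CouchDB's built-in reduce functions, and the Python function underneath is only a rough local emulation of the result, not how CouchDB computes it internally.

```python
from collections import defaultdict

# Design document with one view: total sales per product,
# computed from the lines embedded in each invoice document.
design_doc = {
    "_id": "_design/invoices",
    "views": {
        "sales_by_product": {
            "map": """function (doc) {
  if (doc.type === 'invoice' && Array.isArray(doc.lines)) {
    doc.lines.forEach(function (line) {
      emit(line.product, line.qty * line.price);
    });
  }
}""",
            "reduce": "_sum",
        }
    },
}


def emulate_sales_by_product(docs):
    """Local emulation: emit one (key, value) per line, then sum per key."""
    totals = defaultdict(int)
    for doc in docs:
        if doc.get("type") == "invoice":
            for line in doc.get("lines", []):
                totals[line["product"]] += line["qty"] * line["price"]
    return dict(sorted(totals.items()))


# Hypothetical sample data.
docs = [
    {"_id": "invoice-1001", "type": "invoice", "lines": [
        {"product": "widget", "qty": 2, "price": 10},
        {"product": "gadget", "qty": 1, "price": 25},
    ]},
    {"_id": "invoice-1002", "type": "invoice", "lines": [
        {"product": "widget", "qty": 3, "price": 10},
    ]},
]

print(emulate_sales_by_product(docs))  # {'gadget': 25, 'widget': 50}
```

Against a real server, querying the view with `group=true` should give one summed row per product, read from the pre-built index rather than recalculated from every document.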

No schema

With SQL, I have to constantly tweak the database tables and columns as I write my code.
And I have to "normalize" my data (= structure it in a weird way - like putting invoice-lines in a separate table).
The schema-less nature of CouchDB makes life so much easier in this regard.

My data is already validated and structured by the application (like server-side code on web-sites), so repeating this in the database is just extra work.
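
As a small illustration of the invoice example, here is what a single, self-contained invoice document might look like, with the validation living in application code. All field names and values are hypothetical.

```python
import json

# One invoice = one document. The lines live inside it,
# so there is no separate invoice-lines table to keep in sync.
invoice = {
    "_id": "invoice-1001",
    "type": "invoice",
    "customer": {"name": "Acme Corp"},
    "lines": [
        {"product": "widget", "qty": 2, "price": 10},
        {"product": "gadget", "qty": 1, "price": 25},
    ],
}


def validate_invoice(doc):
    """Application-side validation; returns a list of problems found."""
    problems = []
    if not doc.get("lines"):
        problems.append("invoice has no lines")
    for i, line in enumerate(doc.get("lines", [])):
        if line.get("qty", 0) <= 0:
            problems.append(f"line {i}: qty must be positive")
    return problems


def invoice_total(doc):
    """Sum of qty * price over all embedded lines."""
    return sum(line["qty"] * line["price"] for line in doc["lines"])


payload = json.dumps(invoice)  # this string is what gets sent to CouchDB
print(validate_invoice(invoice), invoice_total(invoice))  # [] 45
```

Adding a new field later is just a matter of adding it to the dictionary; there is no table to alter first.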

No cloud vendor lock-in

Because CouchDB is software that runs on your own hardware (or VM / container), you can easily move your data to another CouchDB instance running anywhere - on-premise or at any cloud provider - using the built-in replication feature.

You cannot do this with Amazon DynamoDB, Microsoft Azure Cosmos DB, or Google Firestore / Bigtable.

Free / open-source / Apache

Who doesn't like free? Being under the Apache Software Foundation means that it will most likely stick around for a while.

What about ad-hoc queries?

One advantage of SQL databases is that you can always write a quick SQL query to get your data out, filtered and sorted just the way you want - without regard to indexes (it may take a while but it will run).

With Photon (see above), you can actually do the same with CouchDB - query it using basic SQL statements.

CouchDB also has its own JSON-based query language called "Mango" (which is what Photon's SQL queries use behind the scenes).

You are NOT forced to create map/reduce views for all queries - like some internet posts would have you believe.
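
For a feel of what a Mango query looks like, here is a minimal sketch. The body is plain JSON that gets POSTed to the database's `_find` endpoint, and an optional index definition goes to `_index`. The helper name, field names, and index name are assumptions for the example.

```python
import json


def mango_query(selector, fields=None, limit=25):
    """Build the JSON body for POST /<db>/_find."""
    body = {"selector": selector, "limit": limit}
    if fields:
        body["fields"] = fields  # return only these fields
    return body


# Hypothetical ad-hoc query: invoices over 100 for one customer.
query = mango_query(
    {
        "type": "invoice",
        "customer.name": "Acme Corp",  # dotted path into a nested object
        "total": {"$gt": 100},
    },
    fields=["_id", "date", "total"],
)

# Optional: an index definition for POST /<db>/_index,
# so that queries on these fields do not have to scan every document.
index_def = {
    "index": {"fields": ["type", "total"]},
    "name": "invoices-by-total",  # hypothetical index name
    "type": "json",
}

print(json.dumps(query, indent=2))
```

Much like the SQL case, a selector like this should still run without a matching index, just more slowly.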

What about joins?

You won't need joins as much with CouchDB because of "de-normalization" (invoice-lines in the same document as the invoice itself).

But you can use linked documents, which are a simple way of doing first-level joins.
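
A sketch of the linked-documents trick, assuming invoices carry a hypothetical `customer_id` field pointing at a separate customer document: when a view emits a value containing an `_id` and the view is queried with `include_docs=true`, CouchDB returns the document that `_id` points to instead of the emitting document. The design document, view, and database names are made up.

```python
from urllib.parse import urlencode

# View whose emitted value links each invoice to its customer document.
design_doc = {
    "_id": "_design/links",
    "views": {
        "invoice_customer": {
            "map": """function (doc) {
  if (doc.type === 'invoice' && doc.customer_id) {
    emit(doc._id, { _id: doc.customer_id });
  }
}"""
        }
    },
}


def view_url(base_url, db, ddoc, view, **params):
    """Build the URL for querying a view with the given query parameters."""
    return f"{base_url}/{db}/_design/{ddoc}/_view/{view}?{urlencode(params)}"


# Hypothetical server and database.
url = view_url(
    "http://localhost:5984", "invoices", "links", "invoice_customer",
    include_docs="true",
)
print(url)
```

Each row in the response is keyed by invoice ID, with the linked customer document in its `doc` field.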

What about the horror stories of slow re-indexing when changing design documents?

I read them too. So I ran some tests.

I first created a CouchDB database with 500,000 documents (real-life data copied from a SQL database), then created a new design document / view, then did a query against it, then waited... for about 1 minute, and then it was all indexed and responding crazy fast. And this was with CouchDB running on my old laptop!

This does not scare me.

What about rumors that CouchDB is a disk hog?

I read them too. So I ran some tests.

Compared to SQL Server, it does use almost twice as much disk space for data (after compaction).
This is understandable given that "column names" are stored in every "row".

And it seems that you need to schedule compaction on a regular basis to prevent data files from "exploding".
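
For reference, triggering compaction by hand is a single HTTP call, so it is easy to put in a scheduled job. A minimal sketch, assuming a local server and a database called `invoices`; the request is only built here, and actually sending it requires admin credentials. Depending on your CouchDB version there may also be automatic compaction settings, so it is worth checking the documentation for your version before scheduling anything yourself.

```python
import urllib.request


def compact_request(base_url, db):
    """Build (but do not send) the request that starts database compaction."""
    return urllib.request.Request(
        url=f"{base_url}/{db}/_compact",
        data=b"",
        method="POST",
        # CouchDB expects a JSON content type on this endpoint.
        headers={"Content-Type": "application/json"},
    )


# Hypothetical server and database name.
req = compact_request("http://localhost:5984", "invoices")
print(req.get_method(), req.full_url)

# To run it for real (admin credentials required):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read())
```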

For me, this is an acceptable tradeoff.

One of the original selling points of SQL and "normalization" was preservation of disk space - because disks used to be slow and expensive. Neither is the case today.

What about MongoDB?

I did of course also look at MongoDB - the more popular choice amongst No-SQL databases and similar to CouchDB in concept.

From what I gather, MongoDB has better performance and tooling, but it is also more complex and not as lightweight (it uses much more memory).

MongoDB uses a proprietary client access protocol and thus needs proprietary client libraries/drivers (vs just using HTTP REST).

It uses BSON rather than JSON.

MongoDB is a 572MB download - CouchDB is 66MB.

From what I have seen, CouchDB is plenty fast, and I do prefer simple and lightweight over complex and heavy :-)

Conclusion

I have been "playing" with CouchDB on a few side projects, and the experience has just been fantastic so far.

I am now ready to go all-in :-)

Top comments (3)

Ko • Feb 20 • Edited

I have been using CouchDB for several years for my project ooko.pro. I was surprised that there is a database that allows you to use JavaScript for queries. This greatly helped simplify development. I only use views.
The downside is that there are no transactions, so you have to plan the order of changes. The "_id", "_rev", … fields are very annoying; I started replacing them on the fly with "id" and "rev". Another downside is the low popularity compared to other databases.

Jesper Høy • Feb 20

Thank you for your comment :-)
As you mention, CouchDB is not the most popular - so I am glad to hear that I am not the only one here using it :-)

As for no transactions - I am hoping that I can work around that mostly through "de-normalization" - like putting the invoice-lines into the same document as the invoice itself. This way I don't need a database transaction when saving the invoice - since it is only one document / record.
Am I oversimplifying this?

Ko • Feb 20

Yes, this approach helps. This also makes it possible to get everything in one request. The downside is that the size of the document increases (noticeable if you need to fetch a lot of documents at once).
