I recently read a post about yet another NoSQL solution. It reminded me of a project that was nearly killed (I left before it died), and I want to share the story with some analysis of how it happened, so that it may help others avoid such a silly trap :)
It was a real commercial project for a large company that publishes music. The project was a web application that helped the company's clients (musicians, songwriters, singers, or groups of them) arrange their deals with the company. So it kept accounts with various data and recorded sales, purchases, amounts of money to be paid, etc.
The front end used several modern JS frameworks and worked as a beautiful SPA, fetching data as JSON over REST from the backend.
The backend was written in Java and used a NoSQL database for storage. Or rather, not a single database but several solutions.
When I was interviewed for this project, I asked: "Why are you using Cassandra? Do you have some big data processing?" They answered: "No, but modern projects are often built with NoSQL".
Right. It was 2014, the peak of the NoSQL hype.
The database only needed to support typical operations:
- storing users
- storing their products (from songs to t-shirts)
- storing their transactions (what was sold or bought, etc.)
- building various reports by calculating, joining, aggregating, etc.
All of this could easily be done with an ordinary SQL database such as MySQL or Postgres. The main advantages of NoSQL are usually better handling of "multi-node" mode (SQL databases usually offer only sharding and replication) and sometimes an easier approach to describing data (e.g. "schema-less").
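To make this concrete, here is a minimal sketch of those four operations in plain SQL. It uses Python's built-in `sqlite3` module as a stand-in for MySQL or Postgres, and the table and column names are invented for illustration; they are not the project's real schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-table schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE products (
            id INTEGER PRIMARY KEY,
            owner_id INTEGER NOT NULL REFERENCES users(id),
            title TEXT NOT NULL
        );
        CREATE TABLE transactions (
            id INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES products(id),
            amount_cents INTEGER NOT NULL
        );
    """)
    # Invented sample data: two users, three products, four sales.
    db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
    db.executemany("INSERT INTO products VALUES (?, ?, ?)",
                   [(1, 1, "Song A"), (2, 1, "T-shirt"), (3, 2, "Song B")])
    db.executemany("INSERT INTO transactions VALUES (?, ?, ?)",
                   [(1, 1, 500), (2, 1, 500), (3, 2, 1500), (4, 3, 700)])
    return db


def revenue_per_user(db):
    """Report: total sales per user, joining all three tables."""
    return db.execute("""
        SELECT u.name, SUM(t.amount_cents)
        FROM users u
        JOIN products p ON p.owner_id = u.id
        JOIN transactions t ON t.product_id = p.id
        GROUP BY u.id, u.name
        ORDER BY u.name
    """).fetchall()


if __name__ == "__main__":
    print(revenue_per_user(build_demo_db()))
```

The whole report is one query, and the database does the joining and aggregating.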
Initially the team was using Cassandra. It is an interesting "wide-column" database (often loosely called "columnar"). When you first start using it, you won't see much difference from SQL: you can create tables and insert records.
However, you soon notice some problems:
- Cassandra is great for storing data, but not for extracting it: you can only fetch data by key or by indexed columns, and lookups through secondary indexes are not very efficient;
- searching with complex queries is almost impossible, and joins are not supported at all.
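Without joins, a report has to be assembled in application code. The sketch below shows roughly what that looks like. It uses plain Python dictionaries as a stand-in for tables that can only be read by key; it is not Cassandra driver code, and the data is invented.

```python
# Stand-ins for two tables that can only be read by primary key.
products_by_id = {
    1: {"owner": "alice", "title": "Song A"},
    2: {"owner": "bob", "title": "Song B"},
}
transactions_by_id = {
    10: {"product_id": 1, "amount_cents": 500},
    11: {"product_id": 2, "amount_cents": 700},
    12: {"product_id": 1, "amount_cents": 300},
}


def revenue_per_owner():
    """Emulate JOIN + GROUP BY by hand: scan one table, look up the other by key."""
    totals = {}
    for tx in transactions_by_id.values():          # full scan of transactions
        product = products_by_id[tx["product_id"]]  # one key lookup per row
        owner = product["owner"]
        totals[owner] = totals.get(owner, 0) + tx["amount_cents"]
    return totals
```

Every new kind of report needs another hand-written loop like this, and with a remote store each key lookup is a network round trip.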
To work around these limitations, the team added two more storage layers on top of the database:
- a dedicated cache (Hazelcast), which sped up retrieval of recently stored or viewed records;
- a search engine (Elasticsearch), which allowed extracting data in various ways.
The stupid thing is that each of these was a storage on its own. So the team was really using three databases to store the same data instead of one.
Of course, at some point people became very tired of this.
So our architects started thinking about another database engine. NoSQL again, of course, because of the hype. At that time one of the most promising was OrientDB.
It is a graph database that can also be used as a document-oriented store. It boasted a "multi-master" mode, almost unheard of in other databases at the time.
And in rough experiments it worked far better and was easier to use than the Cassandra-based solution. It even allowed some kinds of joins.
So 2-3 people spent about half a year rewriting or generalizing various programming interfaces so that the application could one day be switched from Cassandra to OrientDB.
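Generalizing interfaces in this sense usually means hiding the database behind a small abstraction, so that business logic never talks to a specific engine directly. A minimal sketch of the idea follows. `AccountStore`, `InMemoryAccountStore` and `rename_account` are hypothetical names; the project's real interfaces were written in Java and are not shown here.

```python
from abc import ABC, abstractmethod
from typing import Optional


class AccountStore(ABC):
    """Hypothetical storage interface that the application code depends on."""

    @abstractmethod
    def save(self, account_id: str, data: dict) -> None: ...

    @abstractmethod
    def load(self, account_id: str) -> Optional[dict]: ...


class InMemoryAccountStore(AccountStore):
    """Trivial backend used here in place of a real database adapter."""

    def __init__(self):
        self._rows = {}

    def save(self, account_id, data):
        self._rows[account_id] = dict(data)  # copy so callers can't mutate storage

    def load(self, account_id):
        row = self._rows.get(account_id)
        return dict(row) if row is not None else None


def rename_account(store: AccountStore, account_id: str, new_name: str) -> bool:
    """Business logic that only knows the interface, not the backend."""
    account = store.load(account_id)
    if account is None:
        return False
    account["name"] = new_name
    store.save(account_id, account)
    return True
```

Switching engines then means writing one more `AccountStore` implementation. In practice the hard part is that every query the application relies on has to fit through such an interface.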
Regrettably, this failed miserably. It turned out that OrientDB had several bugs at the time, one of them quite nasty. When we update records, we usually "lock" them so that other users of the DB see everything in a consistent state and can't partially update the same record at the same time.
The bug was that OrientDB sometimes didn't unlock some records after an operation. This only happened in "multi-master" mode. While we were developing in single-master mode, everything worked well.
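The effect of a lock that is never released can be illustrated with a small in-process sketch. This is not OrientDB code: `RecordLocks` and its methods are hypothetical names, and Python's `threading.Lock` stands in for the database's record lock.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class RecordLocks:
    """Hypothetical per-record lock table, one lock per record id."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the dictionary itself

    def _lock_for(self, record_id):
        with self._guard:
            return self._locks[record_id]

    @contextmanager
    def locked(self, record_id, timeout=1.0):
        """Hold the record lock for the duration of an update."""
        lock = self._lock_for(record_id)
        if not lock.acquire(timeout=timeout):
            raise TimeoutError(f"record {record_id!r} is still locked")
        try:
            yield
        finally:
            lock.release()  # always unlock, even if the update fails

    def leak(self, record_id):
        """Simulate the bug: take the lock and never release it."""
        self._lock_for(record_id).acquire()
```

After `leak("r2")`, every later `locked("r2")` times out: the record can no longer be updated by anyone until the lock is cleared by other means.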
The developers acknowledged the bug and said it would be fixed in OrientDB 2.0, but we were not ready to update the code once more, and anyway it was not fixed in the 2.0 preview available at the time.
- I won't say NoSQL is bad. But it is important to understand that SQL and NoSQL are different things, often meant for different goals. And we should remember that NoSQL databases all differ from each other and may require study and investigation.
- Of course we should sometimes try new databases and new solutions. Developers should keep expanding their knowledge.
- However, changing the main "business database" of an application from SQL to some NoSQL solution will most probably be very painful, with unpredictable results.
- A better strategy is to use SQL and NoSQL side by side. If you see that some tables would benefit from being moved to NoSQL (e.g. due to their size), let's try! It is always easier to migrate a single table back if something goes wrong. It's just like keeping static content on a different server for a website.
- The bicycle with square wheels, shown above, is really good for a certain kind of non-flat road. This is well known from geometry. But we shouldn't try to use it everywhere. It is the same with NoSQL. :)
Don't get hung up on hype! Be wise and cautious, and you'll never be made unhappy by NoSQL solutions!