How we killed a project with NoSQL

Rodion Gorkovenko ・4 min read

I recently read a post about yet another NoSQL solution. It reminded me of a project that was nearly killed by one (I left before it died). I want to share the story, with some analysis of how it happened, so that it may help others avoid such a silly trap :)

[Image: a bicycle with square wheels]

This was a real commercial project for a large company that publishes music. The project was a web application that helps the company's clients (musicians, songwriters, singers, or groups of them) arrange their deals with the company. So it had accounts with various data, recording sales, purchases, amounts of money to be paid, etc.

The front end used several modern JS frameworks and worked as a beautiful SPA, getting data as REST/JSON from the backend.

The backend was in Java and used a NoSQL database for storage. Or rather, not a single database but several solutions.

When I was interviewed for this project, I asked: "Why are you using Cassandra? Do you have some big data processing?" They answered: "No, but modern projects are often built with NoSQL".

Right. It was 2014 - the peak of the NoSQL hype.

The database needed just typical operations:

  • storing users
  • storing their products (from songs to t-shirts)
  • storing their transactions (what is sold or bought etc)
  • making various reports by calculating, joining, aggregating etc.

All this could easily be done with a normal SQL database such as MySQL or Postgres. The main advantages of NoSQL are usually better handling of "multi-node" mode (SQL databases usually offer only sharding and replication) and sometimes an easier approach to describing data (e.g. "schema-less").
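To give a feeling of how little is needed for these operations, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (users, products, transactions, amount) are invented for this illustration and are not taken from the real project.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical schema and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE products (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            title TEXT NOT NULL
        );
        CREATE TABLE transactions (
            id INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES products(id),
            amount REAL NOT NULL
        );
        INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO products VALUES
            (1, 1, 'Song A'), (2, 1, 'T-shirt'), (3, 2, 'Song B');
        INSERT INTO transactions VALUES
            (1, 1, 10.0), (2, 1, 5.0), (3, 2, 20.0), (4, 3, 7.5);
    """)
    return conn


def sales_report(conn):
    """Number of sales and total amount per user, in one join-and-aggregate query."""
    return conn.execute("""
        SELECT u.name, COUNT(t.id), SUM(t.amount)
        FROM users u
        JOIN products p ON p.user_id = u.id
        JOIN transactions t ON t.product_id = p.id
        GROUP BY u.id, u.name
        ORDER BY u.name
    """).fetchall()
```

Calling `sales_report(build_demo_db())` gives one row per user with the count and the sum of their sales. The report is a single query; nothing has to be joined or aggregated in application code.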

Cassandra solution

Initially folks were using Cassandra. It is an interesting "wide-column" database. When you just start with it, you won't see much difference at first. You can create tables and insert records into them.

However, soon you notice some problems:

  • Cassandra is great for storing data, but not for extracting it - you can only fetch data by indexed keys, and even this is not very efficient;
  • searching with complex queries, using joins etc. is almost impossible.

To work around these limitations, the guys used two more storages on top of the database:

  • a dedicated cache (Hazelcast), which sped up retrieval of recently stored or viewed records;
  • a search engine (Elasticsearch), which allowed extracting data in various ways.

The stupid thing is that each of these was a storage on its own. So really the guys were using 3 databases to store the same data instead of one.
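One reason this gets tiring can be shown with a small sketch. It assumes every write has to be repeated in each storage; `InMemoryStore` and `save_everywhere` are hypothetical stand-ins written for this example, not the real Cassandra, Hazelcast or Elasticsearch client APIs.

```python
class InMemoryStore:
    """Toy stand-in for a storage client (hypothetical, not a real API)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.available = True

    def put(self, key, value):
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self.data[key] = value


def save_everywhere(stores, key, value):
    """Write one record to every store; return the names of stores that failed."""
    failed = []
    for store in stores:
        try:
            store.put(key, value)
        except ConnectionError:
            failed.append(store.name)
    return failed


database = InMemoryStore("database")
cache = InMemoryStore("cache")
search = InMemoryStore("search")

search.available = False  # simulate one of the three storages being down
failed = save_everywhere([database, cache, search], "user:1", {"name": "Alice"})
```

After this run the record exists in the database and the cache but not in the search index, so the three copies disagree until somebody repairs them. With a single database this class of problem does not appear.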

Of course, at some point people became very tired of this.

OrientDB attempt

So our architects started thinking of another database engine. Of course, also NoSQL. Because of the hype. By then one of the most promising was OrientDB.

It is a graph DB which can also work as a document-oriented storage. It boasted a "multi-master" mode, almost unheard of in other databases by then.

And in rough experiments it worked far better and was easier than the solution with Cassandra. It even allowed some kinds of joins etc.

So we spent the efforts of 2-3 people for about half a year rewriting or generalizing various programming interfaces, so that the application could be switched from Cassandra to Orient one day.

Regretfully, this failed miserably. It turned out that by then OrientDB had several bugs, one of them quite nasty. When we update records, we usually "lock" them so that other users of the DB see everything in a consistent way and can't partially update the same record at the same time.

And the bug was that sometimes OrientDB didn't unlock some records after the operation. This only happened in "multi-master" mode. While we were developing in single-master mode, everything worked well.
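To show why a lock that is never released is so painful, here is a toy model in Python. It is only an illustration of the general effect; `RecordLocks` and `update` are invented for this sketch and say nothing about how OrientDB implements locking internally.

```python
import threading


class RecordLocks:
    """Per-record locks kept in memory (a toy model, not OrientDB's mechanism)."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, record_id):
        with self._guard:
            return self._locks.setdefault(record_id, threading.Lock())

    def acquire(self, record_id, timeout=0.1):
        """Try to take the record lock; give up after `timeout` seconds."""
        return self._lock_for(record_id).acquire(timeout=timeout)

    def release(self, record_id):
        self._lock_for(record_id).release()


def update(locks, store, record_id, value, forget_unlock=False):
    """Update a record under its lock; return False if the lock cannot be taken."""
    if not locks.acquire(record_id):
        return False
    try:
        store[record_id] = value
    finally:
        if not forget_unlock:  # forget_unlock=True simulates the leaked lock
            locks.release(record_id)
    return True
```

Once a single call runs with `forget_unlock=True`, every later update of that record times out and returns `False`, while other records keep working. This matches the kind of symptom described above: the system looks healthy until somebody touches the stuck record.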

The bug was acknowledged by the developers, and we were told it was going to be fixed in Orient version 2.0. But we were not ready to update the code once more - and anyway, it was not fixed in the 2.0 preview available by then.

Conclusion

  1. I won't say NoSQL is bad. But it is important to understand that SQL and NoSQL are different things, often meant for different goals. And we should remember that all NoSQL databases are different from each other, and each may require study and investigation.
  2. Of course we should try new databases and new solutions sometimes. Developers should progress in their knowledge.
  3. However, changing the main "business database" of an application from SQL to some NoSQL solution will most probably be very painful, with unpredictable results.
  4. A better strategy is to use SQL and NoSQL alongside each other. If you see that some tables would be better moved to NoSQL (e.g. due to their size) - let's try! It is always easier to migrate a single table back if something goes wrong. It's just like keeping static content on a different server for a website.
  5. The bicycle with square wheels, shown above, is really good for a certain kind of non-flat road. This is well known from geometry. But we shouldn't try using it everywhere. It is the same with NoSQL. :)

Don't get hung up on hype! Be wise and cautious - and you'll never be unhappy because of NoSQL solutions!

Discussion (19)

akashic seer

Most developers have zero clue what they are talking about. Many hear trendy things and want to jump on them instead of using the right tool for the job. I found Designing Data-Intensive Applications by Martin Kleppmann to be one of the most valuable books I have read and reread.

It covers everything you could ever want to know about databases and storing data and will help engineers understand which solution works best for each problem.

Cassandra is being used in microservices as a way to store events before pushing them to an event broker and in other simple ways. But it is not good to use it as you would an SQL database like MySQL. The book I mentioned above majorly helped me understand what is best for what.

I am building a very complex, unique social networking platform right now based on microservices, so I have to figure out what pieces of the puzzle I need for what. Microservices are very hard to do right.

Here is a list of microservice resources I have put together, if anyone is interested.

Rodion Gorkovenko Author

Thanks a lot for the link to the book and for your list of resources!

As a side note to this:

"Microservices are very hard to do right"

Microservice architecture ceased to be a novelty some years ago. Nowadays I rarely find a project which doesn't use this approach. However, you are very right, it seems to me, and not only because of microservices themselves. A lot of trouble comes from the fact that projects get new requirements and features over time. Monolithic architecture suffers from such extensions. Microservices also suffer from them. With microservices it is just somewhat easier to do housekeeping, partial updates etc. But there is still no way to do things "very right - and from the beginning", because at the beginning we don't know what exactly will be "right" some time later. (my meek personal opinion)

akashic seer

Also I am building an app that creates contests and accepts cryptocurrency. I am going to use it to launch the social platform by having it run for months to gain future users. The app will give crypto tokens to those who get the most invites to sign up, those who donate the most bitcoin/ethereum, etc.

That app is going to be a monolith to save money on hosting, plus it is a short term app that will not be used after launch.

akashic seer

That is one of the most valuable books I have found. I had no idea it was going to be about databases. The guy really, really digs into the heart and soul of databases. He takes them apart and shows you how they work underneath.

Yeah, starting with a microservice architecture is not right for everyone. But I have already created a monolithic-style social platform and I will never in my life make such a horrible mistake again. Any tiny change becomes a complete nightmare. Change the login and registration fails. Fix registration and something goes wrong in another place. Not only that, but adding features is hard because you have to retest the entire code base each time.

Microservices work well if you know what you are building, like I do. You have to know the domain first or have a working app to go from. The costs associated with microservices are falling, depending on how you structure them. I've done 2 years of research on architecture alone after I came up with the app design.

One thing I like about microservices is the separation and autonomy. I like the idea of launching an app when it is like 75% complete and then updating it along the way. Maybe you can start with profiles and then add support for pages or groups later etc. Also, if at some point I decide I want to switch languages, the integration process will be much easier. I plan on using Scala for most of my microservices, however some, such as image/video processing, may be better with Python or something else. My front end has some special-use-case JavaScript that the back end must also process, so I may use Node for that. The biggest draw is separation and autonomy. I created a big turd of a monolith in the past and it left me with major anal pains. With a monolith you are locked in for life and death.

Many apps may not benefit from Microservices, but many can. Tools like Docker, Kubernetes, GitLab and Openshift make them easier. After many years of Monolith I wanted something different.

Rodion Gorkovenko Author

"Also I plan on using Scala for most of my Microservices"

Honestly, I'm not sure there is a sensible benefit in this. Scala definitely failed to become "the language of the future". It will surely work, but considering the memory consumption and the language being overcomplicated, with unclear ideas of how to meet Scala 3, I personally wouldn't be glad to undermine the project from the beginning.

Perhaps I should add a post "how we killed the project with Scala" :) but it was a recommender, which is somewhat different.

"One thing I like about microservices is the separation and autonomy"

Which, however, can sometimes end up in a great entanglement of dependencies between microservices, technically losing both separation and autonomy. But still, microservice organization allows keeping order longer. :)

"After many years of Monolith I wanted something different."

I usually can't draw a hard line between monolith and microservices in real projects. Former monoliths often get some satellite services and become more like a set of microservices. On the other hand, in a microservices structure often one or a few services become larger over the years and resemble a monolithic microservice :)

akashic seer

Yeah, the microliths. The thing is, the app I am building is very heavy in media processing. This is one of the other main reasons I wanted to go with microservices over a monolith. I don't want to have to have super duper giant servers to host my app 100 times behind a load balancer when the media processing is taking 85% of the resources.

It takes a lot more planning and understanding to get microservices correct. That is why I have been studying them for 2 years. There is no one right way. I like the ability to add and replace microservices, with each one having its own DevOps pipeline. The decision was not a simple one by far, but the alternative was to do something I have already had fail. I think most fail at microservices because they don't understand them correctly.

I have considered starting with a monolith built in a modular fashion, however there is so much media processing. Each post for example will need to process text, hashtags, mentions, emoticons, likes/dislikes, images or video and some special code unique to the platform. Compared to the other things the app does that is about the most resource dependent. I am not choosing microservices because they sounded cool.

The choice adds a lot to the cost of launching and running it. Each Microservice is a container. All orchestrated with Openshift on AWS running x amount of services per instance with multiple instances. The alternative is to run a monolith on larger and larger instances to scale it if the app takes off.

It has been really hard to wrap my head around the proper concepts mostly because of all of the old and bad advice. Like using RPC between all services for communication.

What you say about Scala and the JVM is true, but both have better frameworks and support for concurrent programming than other languages I could find. Akka and its actor system got my attention, along with Spark and Kafka. And the changes coming to Scala could be a pain, but pretty much every language does this. Perl did it. Java did it. With microservices I can update 1 service at a time to Scala 3.

The other languages I have considered were Golang, Rust and PHP. PHP just compiles down to C and is pretty decent on resources. I think Golang had some support for concurrent programming built in. I don't think PHP does yet. And I didn't investigate Rust too far.

One thing I like about Scala is the type system and testing tools. The amount of code you have to write with Scala is so little too.

I have thought of building the entire social platform as a modular monolith so that scaling in the future would be easier by breaking out the microservices. This lessens the cost of hosting and makes launching easier. Plus it doesn't hurt that I understand all of the concepts ahead of time and plan for change in the future. A monolith should handle a few million users from my research.

All JVM languages are memory hogs because of the auto garbage collection FEATURE<- so we can be lazy and eat all the memory like cookie monster nom nom's LOL

Also, if this venture fails I need some new skills to fall back on. There are way too many PHP, JavaScript and Java programmers. It is too hard to find work in those fields, especially in the freelance world, which is why I have moved away from PHP and JavaScript.

Thanks for your responses. I am open to all input because I like to learn from other people's experiences.

Luke Feeney

Good article and sage advice. I'm interested in the experience with Orient (as I am currently working with an open source graph db project) - it sounds very difficult. I'm of the view that a scalable strong schema graph db offers all the advantages of RDBMS and some extra, but it's still early and there is significant hardening needed, so proceed with caution (especially on the transaction workload).

Rodion Gorkovenko Author

"that a scalable strong schema graph db offers all the advantages of RDBMS"

I'm afraid this is not quite correct. Graph DBs give a specific advantage - "graph" queries. These are hard or impossible to do with normal SQL (a popular interview question about some Oracle feature, I think).

But there are other types of queries (this is mostly about certain joins) which are really hard to simulate on a graph database. We made simple substitutions for some of them and did a "manual join" (extracting the data and joining it in code) for others.
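For readers who have not met the idea, a "manual join" can look roughly like the Python sketch below. The record shapes (`id`, `user_id`, `name`, `amount`) are made up for the example; the real project's data looked different.

```python
from collections import defaultdict


def manual_join(users, transactions):
    """Join two lists of records on the user id in application code (a hash join)."""
    # Index one side by the join key first, so the join is not quadratic.
    by_user = defaultdict(list)
    for tx in transactions:
        by_user[tx["user_id"]].append(tx)

    joined = []
    for user in users:
        for tx in by_user.get(user["id"], []):
            joined.append({"name": user["name"], "amount": tx["amount"]})
    return joined
```

Both sides have to be fetched from the database in full (or at least in large pieces) before the join can run, which is exactly the work a relational engine would otherwise do close to the data.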

Regretfully, I'm not really a sage about Orient - as you probably noticed, this situation was over 5 years ago. Orient has had turbulent times since then, but recently things have somewhat improved.

As for "schema-less" - note that nowadays an RDBMS can store JSON data and operate on its fields. Which really makes them good even without a schema :)
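A small sketch of the JSON point, using SQLite through Python's sqlite3 module. It assumes the SQLite build includes the JSON functions (they are built in by default since SQLite 3.38 and were commonly compiled in before that); the `products` table and its `attrs` column are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, attrs TEXT NOT NULL)")

# Each row carries a free-form JSON document; the two documents have different fields.
rows = [
    (1, json.dumps({"kind": "song", "title": "Song A", "length_sec": 215})),
    (2, json.dumps({"kind": "t-shirt", "title": "Tour shirt", "size": "L"})),
]
conn.executemany("INSERT INTO products VALUES (?, ?)", rows)


def titles_of_kind(conn, kind):
    """Filter and project on fields inside the JSON column."""
    cursor = conn.execute(
        "SELECT json_extract(attrs, '$.title') FROM products "
        "WHERE json_extract(attrs, '$.kind') = ? ORDER BY id",
        (kind,),
    )
    return [row[0] for row in cursor]
```

The documents do not share a fixed set of fields, yet they can still be filtered with ordinary SQL. PostgreSQL and MySQL offer their own JSON types and operators for the same purpose, with different syntax.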

akashic seer

Yes, I plan on using a graph DB to store certain information about things users like, dislike, interact with etc., to run queries against for things like suggested content, suggested followers etc.

Many years ago I did the same with MySQL. Joins got to be such a pain across like 6 to 10+ tables. I ended up writing a special class just to create the join queries. Well, then the problem is that many joins get slow and eat resources. Changes to table structures become harder too.

I am considering Neo4j but need to do more research into graph DBs. Got any good information or links about any?

Luke Feeney

Yes - I know a very cool in-memory open source one called TerminusDB. Like Neo, but more performant (though I haven't really checked out the new Neo 4.0 release).

Luke Feeney

Interesting that you found certain types of queries difficult to run on the graph model. I've not really been exposed to anything specific like that.

Rodion Gorkovenko Author

I think I'd better try to come up with a good example instead of bewildering people with vague statements :) As this may take time, I'll probably do this in a separate post and drop a link here when I have it.

Thanks for this point!

Rodion Gorkovenko Author • Edited

Luke, here is my first attempt!

dev.to/rodiongork/rewrite-this-sql...

BTW great hairstyle. I feel envious!

Luke Feeney

Will take a look.

It is the vulcan blood that allows the hair to grow so thick and green!

miniscruff

We use Cassandra at work and one of my chat bots is missing half its features because I have no clue how to do them in NoSQL, though they would have taken 5 minutes in SQL. My boss insists there is a solution but hasn't found one yet. Using only NoSQL for everything has been a struggle, for me at least.

Rodion Gorkovenko Author

Cassandra is great for certain tasks, like pushing tons of data into it very fast, performing bulk processing etc. It is used as low-level storage in some other special databases (e.g. time-series databases)...

Regretfully, exactly as you said, some operations are painful or impossible with it...

miniscruff

Exactly, I cannot stress enough how hard it is to use Cassandra, and by extension probably any non-relational database, for data that is very relational...

akashic seer

The solution is to write a migration script and pull that crap out of Cassandra and move it to another SQL system like MySQL.

Alexey Zimarev

I personally don't really like the term NoSQL. What does it mean? "Everything else but SQL"? What is SQL then, anyway? Can we say that KSQL, Elasticsearch SQL or Hive are SQL?

Instead, we can clearly classify database engines by their breed. Like, Cassandra is a key-value store, as well as Redis and DynamoDB. MongoDB is a document database. Elasticsearch is a document-oriented search database. MariaDB or PostgreSQL are RDBMSes.

After we have done this simple and way more precise classification, we can find out if the tool suits the job. Do you need queries? You'll have a hard time querying key-value stores. It's very easy to query MongoDB, but you have to pay attention to your document schema. If you used ORMs before, you probably would be quite happy using MongoDB, unless you overuse relations between tables. Do you need to index and search a massive set of documents with a somewhat loose schema? Then you can try Elasticsearch, or maybe even be happy with MongoDB full-text search, but no key-value store will ever provide you with such a capability.

You can still combine the high transaction throughput of Cassandra with a queryable model by using change-feed processing: projecting data from Cassandra to another database that has better support for queries. In that case, you will have to deal with some eventual consistency, but it might be an acceptable tradeoff.
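The projection idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the change feed is modeled as an in-memory list of `(sequence, operation, key, value)` tuples delivered in sequence order, and the queryable side is an SQLite table. The names `make_read_model`, `project` and `sales` are invented here; a real Cassandra change feed has its own format and client.

```python
import sqlite3


def make_read_model():
    """Create the query-friendly side: an ordinary SQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id TEXT PRIMARY KEY, amount REAL)")
    return conn


def project(events, conn, last_seen=0):
    """Apply change events newer than last_seen; return the new position."""
    for seq, op, key, value in events:
        if seq <= last_seen:
            continue  # already applied, so re-delivered events are harmless
        if op == "upsert":
            conn.execute(
                "INSERT OR REPLACE INTO sales (id, amount) VALUES (?, ?)",
                (key, value),
            )
        elif op == "delete":
            conn.execute("DELETE FROM sales WHERE id = ?", (key,))
        last_seen = seq
    conn.commit()
    return last_seen


# Hypothetical change feed coming from the write-optimized store.
events = [
    (1, "upsert", "a", 10.0),
    (2, "upsert", "b", 5.0),
    (3, "upsert", "a", 12.0),
    (4, "delete", "b", None),
]
```

Between two calls of `project` the SQL table lags behind the source, which is the eventual consistency mentioned above. Remembering the last applied sequence number lets the projector resume after a restart without applying the same change twice.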

My point here is: please don't fall into the trap of the "SQL vs NoSQL" discussion. These terms are vague and bias-prone. Sometimes saying "I could do it with SQL in 5 minutes" means "I know how to do it with PostgreSQL in five minutes, maybe using Oracle will take me an hour, and with MongoDB it will take 30 minutes but it will be easier to maintain. And there's no way I can use Redis". Such reasoning has much more value.