Saying the above out loud is almost like publicly admitting you've got lepra today, because all the hype seems to be geared towards NoSQL and docum...
The discussion has been locked. New comments can't be added.
Because the posting started attracting people only wanting to enflame the debate, and the debate was no longer productive
For further actions, you may consider blocking this person and/or reporting abuse
Rick Houlihan (then at AWS):
"One of the things I hear a lot: ‘Use NoSQL because it’s very flexible’. I’ve done a thousand NoSQL applications, I can tell you nothing could be further from the truth.
NoSQL is not a flexible database, it’s an efficient database. The data model is very much not flexible [because it's tightly coupled to the access patterns of the application]."
AWS re:Invent 2018: Amazon DynamoDB Deep Dive: Advanced Design Patterns for DynamoDB (DAT401)
The funniest I've heard is "we don't need a schema", which of course ends up having to enforce a schema in the application code instead ... :/
Believing you can just "dump" your documents into a NoSQL database is a very rude awakening once you realise what you've done I suspect ... :/
Honestly, there is a right tool for the job for every problem/solution set. You do bring up some valid points here and there which would be factored in on deciding which to choose given the problem at hand. However, a few I disagree on one that stood out "Besides, storing everything in one document implies zero normalisation, making it almost impossible to update stuff, where multiple documents are having the same value, and that value needs to be changed over time." You are living in a Object oriented world, have you ever had a project where requirements or the sow changes? Reorganizing and restructing a sql table imo is far more involved typically (especially legacy ones.) Wheras just popping in a new property in a document requires no overhead. Not to mention, with nosql a team can build an entire frontend application safely knowing all their objects will eventually 'fit' into your db. Vs sql decoupled development becomes challenging imo. Finally, the thing sql had that nosql doesn't is 'technology' lockin today you may be using mongodb tomorrow firestore, the next day? A git based file system ..it really doesn't matter you can hold a json style object anywhere and everywhere.
Not to mention, and don't want to generalize because it's not 'always' true just normally true. Deploying multi Tennant or isolated data architecture is a significantly more involved process when dealing with your traditional sql db vs nosql.
You’re preaching for the choir mate. We’ve got a multi location K8s cluster, basically the definition of multi tenant, with hundreds of separate clients, running in the same cluster, oblivious of each other’s existence … 😉
Great points, and I agree with most of them. However the idea that you can “dump stuff into your db”, is fundamentally wrong, because you’ve got code. You’re just ending up with an “implicit schema” instead of an “explicit schema”. As to me living in OOP land, read older articles in this series 😉
Disclaimer: my point of view will only concern MongoDB. I'm not aware of all of the specificities of other solutions.
I worked exclusively with MongoDB for the last two years and it's a real pain. As you said 99.99% of the time you don't need MongoDB unless you need a big performance boost for specific microservices.
SQL has been proved to be well designed for several decades with a ton of good features and it will always surpass NoSQL in terms of flexibility.
Yes, MongoDB is simple and fast, but it is fast not because it is "well written" but because it's not baked with ton of features like schema validation, transactions, foreign keys, etc.. (but it seems that last versions of MongoDB begin to integrate those concepts.. which is really weird to me). Since those features aren't in MongoDB you NEED to write a whole bunch of defensive code and you need to write it WELL. The task is not easy.
Moreover, as pointed out in this post, there are plenty of queries you can't do in one call. Even with aggregation. And often you'll need to write JS to sort, filter or transform your data in the way you need it.
Unless you don't need schemas and data coherence (like stocking logs or really simple data), my best advice is : don't use it.
I only have one comment for you; Faith in humanity restored 😉
I worked with CosmosDB for 6 months. If one of my team members as much as mumbled “let’s use Cosmos” today, I’d throw the person out the window!
yes u can count records in nosql and u can do lots of other things too but it is much more of a pain in the butt than just using sql
first fetch a json file (which is already a nosql format)
then do this
var len = Object.keys(myJsonFile).length;
console.log(len);
Word!
Awesome, then we agree 100% - My problem is that I see a lot of people choosing NoSQL because it's "cool" ...
You cannot do this, not even in theory, with a db that doesn't have a strongly typed schema. You could of course just return "whatever you've got" as documents in your db to the caller, but that would potentially risk returning stuff such as credit card information, and other sensitive data to unauthorised callers ...
As to CRUD an NoSQL; Yet again, you cannot do this, not even in theory. Generating a CRUD API by the very definition of the term, implies having access to meta data, unless you want a CRUD API that says; "Create, Read, Update and Delete 'garbage'".
Meta data in these regards translates to "a schema". NoSQL databases do not have a schema, and in fact this is a large part of their unique selling point.
To understand the dilemma, check this app, and ask yourselves; "How can I implement this CRUD app when I have no schema?"
The database schema is the formal specification that the above app was generated around. No schema, no meta data - No meta data, no specification - No specification, no app ... :/
Of course, if you've got a NoSQL database with a schema, the argument becomes different ...
However, the last time I checked, neither Cosmos, Mongo, CouchBase or Dynamo had schemas ...
I think that SQL or NOSQL are both good, it depends the project and the scope of the solution, for example if you have a changing schema (I worked with a lot of these use cases) NOSQL databases are the best option because it gives you the flexibility you need. But if you have transactional processes, referencial and structured data use a SQL database. With a good design both can be very powerful tools :)
Very good post bro! I really enjoyed reading it
Thank you :)
The right tool for the job comes to mind ... ;)
JSON column type with indexes and ->> function in mySQL look like end of discussion what is better
Hehe, well, it's an argument ^_^
A good argument too ...
Two things I didn't see you bring up that are key to the conversation:
Scaling and cost.
I would love to see the comparisons when including these two items. Scaling is done with either sql or nosql. What's the difference in complexity and cost for equivalent systems?
What happens if you need to scale that system up 100x for data and traffic?
Which one has overall lower total cost of ownership and which one is will require fewer total hours of developer, administrator, and data scientist time?
I kind of did indirectly bring it up, in that I said if you're about to "implement Twitter" you should probably choose NoSQL. However, if you've got data here I would love for you to put it on the table. Honestly, I don't know the answer to these questions, besides of that I know that if you're doing anything vaguely more complex than "dumping documents into your NoSQL database", then (obviously) the complexity of your middle layer will become much larger, making it more difficult to maintain, with higher costs in regards to human resources. Something echoed by one of the other commenters here where he explained some of his pains related to Mongo ...
Yet again, if you need document database systems, you need document database systems, and you're better choosing such systems. The article was written to convince others about that most of the time you do not need document database systems ...
Document based database systems was invented to solve a very narrow problem domain. The hype surrounding them today though ensures that they're used for everything, also domains where they're extremely inferior to RDBMS ...
Those are all very good points. What I run into is slowing down of sql queries due to the amount of data. Efficiency is the key there. I look at it as if you aren't expecting a lot of data, or don't think the amount of data will grow, sql is the best bet. If you don't know the upper limit to the data you're going to have then nosql is a better choice to me.
Oh, and if you're data structures and access patterns are going to change a lot then sql is probably a better choice.
Am example from me is using RDS vs DynamoDB on AWS. The same data and access is around $1k a month for RDS but only about $35 a month for DynamoDB. And DynamoDB seems to be faster at it as we can massively parallelize the data queries. It just works better for our use case. Also, if we do need the power of sql we can just export to athena and get the information we want for only a couple dollars.
I've seen systems that show to a crawl when using SQL because they get too much data for them to handle with how they have set up the system.
Yes you can easily scale sql if you know what you're doing, but I think it's rarely inexpensive.
I do think it's a right tool for the right job sort of thing.
You can scale RDB systems, but yes it's obviously much harder than to scale a HA replicated database system. I have seen RDB databases with billions of records, but I don't recommend it. Yes, data size is the key axiom here that defines if you need consistency or availability. However, very few databases needs data sizes that justifies NoSQL and availability, at least in my space. My article was just an attempt at pinpointing that fact. As to ...
Word! That was kind of like my punchline here ...
I think you can kinda do the join statement with $lookup in MongoDB right? Or is there some limitation to $lookup I don't know about:
mongodb.com/docs/manual/reference/...
That might be true, I wouldn’t know. However, the whole idea (and unique selling point) about NoSQL is that there’s no consistency (CAP theorem). Without consistency your “joined record(s)” might not even yet having been persisted and available for the cluster as a whole.
NoSQL advocates will object and say; “NoSQL has EVENTUAL consistency”, which is true. However, eventual consistency is equivalent to having your database in a never ending non stop state of “some of my stuff is garbage”
If you’ve got lots of updates, inserts and deletes, you can never trust your database without immediate consistency, ref ACID 😉
Try create a user in RavenDB, then login with that user immediately on your next line of code to see the problem (Hint; Your user doesn’t exist in the database at that point)
Sacrificing consistency for availability is like implementing 100+ threads. Never, ever, ever, ever do this. Unless you absolutely need it is my advice …
"Try create a user in RavenDB, then login with that user immediately on your next line of code to see the problem (Hint; Your user doesn’t exist in the database at that point)" the same issue doesn't occur in a sql db why? If you need to create a user in any style db, there's steps that must be taken to fill out the appropriate data no matter what.
No, you’re wrong. Google the CAP theorem. It explains the mechanics of the “Raven DB” problem. I could explain it to you, but it would be copy/paste of consistency versus availability concepts from CAP …
Have you heard of yugabyteDB? It makes this debate completely nonsense in 2022.
It's an "interesting" database yes. I haven't studied it, but it's a nice piece of tech ...
Distributed ACID transactions, NoSQL and SQL APIs, extreme performance, and it was born in Meta / Facebook... these are some reasons to try it seriously.