Any NoSQL true believers out there?

This troll job by MongoDB made me wonder if anyone is really hardcore in favor of NoSQL for general use-cases.

Sometimes NoSQL is really important... but I really believe most apps should use a relational model for the main stuff.

It's not black and white; it always depends.

Example from real life:

github.com/coretabs-academy/websit...

In our academy system, we have a track that consists of many workshops, each workshop has many modules, and each module has many lessons.

Okay, so:
Track => Workshop => Module => Lesson

Sounds like a document model right?

So our academy library is begging for NoSQL, but we use Django... and Django hates Mongo :(
And here we improvised and used a relational model; guess what we ended up with?

We got 4 joins between the four tables... so each time the user opens the academy to see the lessons, we need to perform 3 subqueries (not to mention the ugly, long query for calculating the shown-lesson percentage).
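
Roughly, the relational side looks something like this (a simplified sketch with guessed model and field names, not our actual code):

# Hypothetical Django models mirroring Track => Workshop => Module => Lesson.
from django.db import models

class Track(models.Model):
    title = models.CharField(max_length=200)

class Workshop(models.Model):
    track = models.ForeignKey(Track, related_name="workshops", on_delete=models.CASCADE)

class Module(models.Model):
    workshop = models.ForeignKey(Workshop, related_name="modules", on_delete=models.CASCADE)

class Lesson(models.Model):
    module = models.ForeignKey(Module, related_name="lessons", on_delete=models.CASCADE)

# One query for the track plus one prefetch query per nested level (four in total):
track = Track.objects.prefetch_related("workshops__modules__lessons").get(pk=1)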

Hmm, okay... what would the document DB solution look like? It's really simple: one query (get the track document)!
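
To illustrate (a minimal sketch, assuming a tracks collection with workshops, modules and lessons embedded as nested arrays; the field names are made up):

# One round trip brings back the whole track with everything embedded.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["academy"]

track = db.tracks.find_one({"_id": "python-basics"})
for workshop in track["workshops"]:
    for module in workshop["modules"]:
        print(module["title"], len(module["lessons"]))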

Yeah, we will rewrite it with DynamoDB and Lambda soon; our server could come under heavy load at any time.

Surely you wouldn't consider this alone to be a justification for using a document database? Are lessons ever shared between modules? Modules shared between workshops? Is a user able to take more than one Track/Workshop/Module/Lesson?
If the answer to any of the above is 'Yes', then it sounds like your data structure is actually relational. ( Like 90% of applications )
Modelling relational data without the referential integrity constraints inherent to SQL is just asking for trouble. Even if you take issue with SQL's query and join syntax, that's not the reason for its enduring market dominance; it's because of the stability and security it offers.
You mention the need to perform 3 subqueries to populate the Workshop/Module/Lesson/Whatever, but are you familiar with how onerous and haphazard it is to query/manipulate subdocuments in Mongo? They've made some minor headway toward resolving this in the most recent versions, but it's still a flaming wreckage in every version below 4.

The aggregation pipeline in MongoDB and $lookup mean that you can do meaningful queries with it now. There does appear to be a memory limit, however, as it merges the datasets in memory.
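
For anyone curious, here's roughly what a $lookup join can look like from Python (collection and field names invented; each blocking aggregation stage is limited to 100 MB of RAM unless you allow spilling to disk):

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["academy"]

# Join each module with its lessons, roughly a LEFT OUTER JOIN.
pipeline = [
    {"$lookup": {
        "from": "lessons",            # collection to join against
        "localField": "_id",          # key on the modules side
        "foreignField": "module_id",  # key on the lessons side
        "as": "lessons",              # name of the output array
    }},
]
modules = list(db.modules.aggregate(pipeline, allowDiskUse=True))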

A wise developer once said "horses for courses", meaning you use the right tool for the job. For more than 20 years I worked with SQL of various flavours. However, a key discovery for me has been a different way of thinking about development, primarily the separation of the domain from the code and database schema.

Schemaless databases allow me to define data structures at runtime with ease. There is still a schema, but it is defined in data, not code. This means a huge degree of flexibility. If you are writing standard web applications that are bound to the domain model, as you have been taught, a SQL database will work just fine unless it is huge. The reason I adopted MongoDB wasn't about size, it was about flexibility.
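
A toy sketch (in Python, just for illustration) of what "a schema defined in data" can mean; the collection and field names here are entirely made up:

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["app"]

# The "schema" is itself a document that can be edited at runtime.
entity_def = {
    "name": "customer",
    "fields": [
        {"name": "email", "type": "string", "required": True},
        {"name": "age", "type": "int", "required": False},
    ],
}
db.entity_definitions.replace_one({"name": "customer"}, entity_def, upsert=True)

def validate(record, definition):
    # Only checks that required fields are present; a real system does far more.
    missing = [f["name"] for f in definition["fields"]
               if f["required"] and f["name"] not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")

validate({"email": "a@b.c"}, entity_def)
db.customers.insert_one({"email": "a@b.c"})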

"There is still a schema, but it is defined in data, not code"
Surely you have this the wrong way around? You're right that there is still a schema, but it is defined implicitly by how the domain model utilises the data. By going down this route you're forgoing the data consistency guarantees granted by referential integrity constraints. What if we want object deletion to cascade? What if we want to be sure our data relations are still intact? All of these basic responsibilities have been moved from the database to the application. I've seen the amount of application logic necessary to ensure simple referential integrity in a large-scale application; it's not pretty. Feel like null-checking everything? Me neither.
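
(For reference, this is the kind of thing the database gives you for free; a tiny sqlite3 sketch with made-up module/lesson tables:)

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.execute("CREATE TABLE module (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE lesson (
    id INTEGER PRIMARY KEY,
    module_id INTEGER NOT NULL REFERENCES module(id) ON DELETE CASCADE)""")
conn.execute("INSERT INTO module (id) VALUES (1)")
conn.execute("INSERT INTO lesson (id, module_id) VALUES (10, 1)")
conn.execute("DELETE FROM module WHERE id = 1")                 # lesson 10 goes with it
print(conn.execute("SELECT COUNT(*) FROM lesson").fetchone())   # (0,)
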
Even Mongo's de-facto standard... O*D*M?... 'Mongoose' implements referential integrity disastrously.
Sure, you're absolutely correct about defining data structures at run-time with ease. This is a huge boon to some applications, but I feel that even modestly complex applications will outgrow Mongo very quickly.
Also, have you ever had to use the aggregation framework for anything even moderately complex? It's a dumpster fire. It'll take you hundreds of lines to accomplish even the most basic aggregate queries that SQL is capable of.

My applications are more like spreadsheets in that the user defines the data structures and relationships. They do this at runtime and the data structures are stored as data, but used when data is submitted. We have introduced referential links between entities and it is possible to create views which traverse the references. We have implemented GraphQL to be able to get data, which is also able to traverse between documents using references.

In relation to maintaining referential integrity: because there is no coupling to the domain, there really is only one area of the code that needs to worry about this. We reap other benefits from this approach, including an elegant security model which means we have fine-grained access controls over which fields and documents are visible to users, based on an access control policy.

Trying to author your own aggregations is folly. In our application we have been able to do complex data transformations easily by having easy to configure transforms which generate the aggregations. Doing it by hand would be a living nightmare.

Is MongoDB the best solution for everything? Nah. For highly structured data like telco call records, SQL is the way. For apps that are tightly coupled to the domain, which is typically how things have been done, it's fine. But... and this is a big but... the way we tightly couple applications to the data model is making our applications less flexible than they need to be.

Schemaless systems are opening the door. Ten years ago I was where you are now; SQL was the light and the truth. Today my view is broader and I have been given good reason to question the accepted orthodoxy. That said we can't be blind to the downsides.

My applications are more like spreadsheets in that the user defines the data structures and relationships.

This is an excellent argument for you to use NoSQL stores. But I really don't think the scenario posted in this article needs one. What do you think? :-)

Do you implement GraphQL on the controller layer, or on top of HTTP REST APIs?

Controller. Used the standard Java API for GraphQL, but the schema is dynamically generated when the entities are changed. The schema is not fixed, rather it is defined in data.

Thanks Peter... I will come back with a couple of questions when we start implementing the system :D

"In relation to maintaining referential integrity because there is no coupling to the domain there really is only one area of the code that needs to worry about this"
What can this passage possibly mean? If there's no coupling to the domain then what is the data doing there in the first place? The problem domain will enforce some kind of invariants on your data, which the schema will need to enforce either explicitly (through database-level constraints such as PRIMARY KEY, NOT NULL, etc.) or implicitly through application logic.
If you're trying to say that there's no relationship between different collections then your application is a much better candidate for NoSQL, but in my experience such cases are actually exceedingly rare.

"We reap other benefits from this approach, including a elegant security model which means we have fine grained access controls over what fields and documents are visible to users based on an access control policy."
The same thing can be implemented at the database level through views and roles in most SQL implementations, which tend to be much more robust than application logic in my experience. That's just my two cents on the matter, however. Security and access control in Mongo has always been pretty much abysmal.
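
A rough sketch of that idea, assuming PostgreSQL and psycopg2 (the table, view, and role names are invented):

import psycopg2

conn = psycopg2.connect("dbname=academy")
cur = conn.cursor()

# Expose only non-sensitive columns through a view...
cur.execute("CREATE VIEW public_profiles AS SELECT id, display_name FROM profiles")
# ...and grant a restricted role access to the view, not the base table.
cur.execute("CREATE ROLE readonly_app NOLOGIN")
cur.execute("GRANT SELECT ON public_profiles TO readonly_app")
conn.commit()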

"Trying to author your own aggregations is folly. In our application we have been able to do complex data transformations easily by having easy to configure transforms which generate the aggregations. Doing it by hand would be a living nightmare."
Why would I choose to use a database solution where writing aggregate queries by hand is 'folly', when I can easily pick ones where it isn't?
For a challenge, see in how few lines you can write a MongoDB query that finds all documents where an arbitrary date falls within the range defined by two date fields.
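
(For reference, a pymongo sketch of the kind of query being asked about, assuming hypothetical start_date and end_date fields on an events collection:)

from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["academy"]
when = datetime(2019, 1, 15, tzinfo=timezone.utc)

# All documents whose [start_date, end_date] range contains `when`.
docs = list(db.events.find({
    "start_date": {"$lte": when},
    "end_date": {"$gte": when},
}))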

"Ten years ago I was where you are now; SQL was the light and the truth. Today my view is broader and I have been given good reason to question the accepted orthodoxy"
Ignoring the obvious passive-aggression here, I have worked with MongoDB for years. I'm not some stuffy SQL shill who will never budge. I have worked with both for years, both writing new applications and maintaining legacy ones. I have already "questioned the accepted orthodoxy", and come to my own conclusions.

The answer to EACH one of your questions is YES!

But, what's the problem with copying one module from a workshop into another since it happens rarely?

Modelling relational data without the referential integrity constraints inherent to SQL is just asking for trouble

But NO, the ACID philosophy turned out not to be nearly as scalable as BASE (again, no black/white case, it always depends).

You mention the need to perform 3 subqueries

As far as I see, we will replace them with just one query: get document (track) by id.

For filtering and shaping the data, I like the way GraphQL works; we might add that layer on top of the normal query / REST endpoint.

but are you familiar with how onerous and haphazard it is to query/manipulate subdocuments in Mongo?

I know $lookup in Mongo is a hell of an operator. But let's not lose the forest for the trees... after all, Mongo is to NoSQL what Bitcoin is to blockchain.

If Mongo makes using NoSQL hard, there's always DynamoDB or Firebase ;)

OK, you have 3 joins. What's bad about this? I guess the response is still 20ms. Did you research how to create a local development environment with DynamoDB? Last time I checked, AWS wasn't friendly for that case.

It is pretty simple to create a local environment with DynamoDB. I found a Docker image for it a while ago, where you can use the JavaScript shell playground to learn and test some queries. It also has a JAR from AWS, if I'm not wrong.

I found the image that I have used (dwmkerr/dynamodb:latest); this was my docker-compose.yml:

dynamodb:
  image: dwmkerr/dynamodb:latest
  ports:
    - "8000:8000"
  command: -sharedDb
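
If it helps, pointing boto3 at that local instance is just a matter of overriding the endpoint (a small sketch; local DynamoDB accepts any dummy credentials):

import boto3

dynamodb = boto3.resource(
    "dynamodb",
    endpoint_url="http://localhost:8000",  # the container from the compose file
    region_name="us-east-1",
    aws_access_key_id="local",
    aws_secret_access_key="local",
)
print(list(dynamodb.tables.all()))  # empty on a fresh container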

Thanks... this will help us a lot :)

I haven't gotten my hands dirty with Dynamo, but I'm pretty sure the response time won't stay at 20ms in the relational model, cuz from a scalability point of view we do this fat query:

github.com/coretabs-academy/websit...

This is the with_is_shown function:

github.com/coretabs-academy/websit...

You see here that we store the is_shown values of all users in one table, and this will get slow over time: once the user base reaches 100,000 users, with each user having watched 100 lessons, that's 10,000,000 records to go through just to get the shown lessons!

I really think the models are shouting: "Please bring me the DOCUMENT model !" :D

You might mention sharding, but you see the problem isn't with the data growing bigger, the problem is within the model itself.

It's a shame I'm not that good with Django. If it were ActiveRecord it would be much easier for me to understand what's behind it. I will try to read it, but no guarantees.

Can you get the output of EXPLAIN from the production DB for those queries?

It would take me some time to get that done right now, cuz I need to:

  1. Spin up the staging env
  2. Copy the production db into the staging env
  3. Turn on debugging mode (to run the debug toolbar)
  4. Get the generated query from there
  5. Run the explain query in DBeaver in the production db with the generated query

I will do it once we finish the first 3 steps in the coming days.

I hope you will post a blog about how the transition to the new DB went and what the decision process was. Without seeing the actual DB (and hardly being able to read Django) it is hard to judge; maybe you really do have a good case for DocumentDB.

In our academy system, we have a track that consists of many workshops, each workshop has many modules, and each module has many lessons.

That sounds like a classic RDBMS case!

We got 4 joins between the four tables... so each time the user opens the academy to see the lessons, we need to perform 3 subqueries (not to mention the ugly, long query for calculating the shown-lesson percentage).

And what do you see wrong with that? 😯

Consider the other scenarios as well: what if you have to look for a particular lesson? You'll end up having to scan all of them!

I really think your use case gains nothing by using a NoSQL store. In fact, this loose data model may only present problems in the long run. If you're concerned about speed and number of queries (but do you have data to prove that it's actually affecting user experience?) go huge on caching (set up a Redis cluster, maybe?).
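
As a sketch of what I mean by caching, something like this with redis-py (the key format and TTL are arbitrary):

import json
import redis

r = redis.Redis(host="localhost", port=6379)

def get_track_payload(track_id, build_payload, ttl=300):
    # Serve the expensive joined query from cache whenever possible.
    key = f"track:{track_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    payload = build_payload(track_id)       # the heavy multi-join query
    r.setex(key, ttl, json.dumps(payload))
    return payload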

I'm not against NoSQL, please note, but convenience is short-lived while data models last forever, so I'm really, really skeptical of throwing away a relational model.

Here is where things go wrong:

dev.to/0xrumple/comment/5e8l

Consider the other scenarios as well: what if you have to look for a particular lesson? You'll end up having to scan all of them!

Looking them up how? By title?

That's not our job, that's algolia's job ;)

I really think your use case gains nothing by using a NoSQL store

The part I feel we will get right is the logical nesting of documents instead of ugly m2m relationships that have no actual benefit.

go huge on caching (set up a Redis cluster, maybe?).

We do caching as explained here:

dev.to/0xrumple/comment/5ebp

so I'm really, really skeptical of throwing away a relational model.

I'm posting this here to make sure we are making the right decision :)

We do memcaching... but what's the point of caching is_shown for the lessons?

The user will have a bad experience and say, "I watched this lesson, why isn't it shown yet?"

Thanks for your reply.

  1. I wasn't asking specifically about the is_shown part, but rather about the performance issues you've talked about. You said "so each time the user opens the academy to see the lessons, we need to perform 3 subqueries"; why can't you cache that?

  2. Even on the is_shown part: why can't you expire the cache when you need to?

  1. We do caching with Memcached... but caching is only an optimization, and the caching layer only kicks in after doing the cruel query. Here is the caching mechanism:

github.com/coretabs-academy/websit...

  2. As you see, we preferred to give the user live numbers, cuz the user can watch a lesson in one minute and then wants to see is_shown as True right away, to get the feeling of achievement and not feel irritated

(We really do get lots of complaints like "I watched the lesson, why isn't it there?", and that's just because of the frontend caching layer... cuz everyone wants the completion certificate :D). That's why we accept the cruel query for this part.

Aside from all that, do you think optimizing with caching is really enough given all that mess... especially with the ugly m2m relationships? :(

Use neo4j! A NoSQL graph database. πŸ˜‰πŸ˜‰πŸ˜‰

Technically, NoSQL refers mostly to non-relational databases, and a graph DB is all about relations, so I would say a graph is more "SQL" than a standard RDBMS is :))

Also, Neo4j doesn't scale (the main advantage of NoSQL), though some newer graph databases do, like Dgraph and Neptune.

Neo4j and Amazon Neptune are slightly different breeds. They're technically triple store databases. But yeah. Other than that I agree with you.

Is this really a graph database use case, though? I thought you would need a graph DB when you need to traverse a graph, like "give me all friends of all friends of A" (where in a relational DB you would join table on table N times, so eventually you run out of RAM), but a graph DB literally traverses the graph, so there is no such memory penalty.

That's true. It really depends on what kind of queries someone wants to run. Even in the current example, you could end up joining the same table multiple times to get a desired result, and a graph would do better than a relational database.

Actually the document model fits better, cuz we don't actually need to traverse but to compose everything into one UI.

As in the pic, we show all the track workshops on the right side, and we calculate the percentage of the shown lessons of each workshop, so we need to get everything of each workshop at once.

(screenshot: workshops sidebar)

But for the profile we have a similar case: each profile has dozens of tasks, quizzes, and projects... and we will traverse them on demand (lazy-loaded).

Hmmm, I read about the graph DBs... but how does it solve our problem?

I see the problem as an aggregate root of Track (de-normalized all in one model) which is what the document model solves.

How would the graph model look like?

Neo4j allows you to have entities, quite similar to what a row in a table is. The key difference, subjectively, is the flexibility to declare relationships between these entities more easily than in a relational database. Aggregates can be easily created using its query language, Cypher, which isn't too hard or too different from SQL.
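
A small sketch of what that could look like from Python with the official neo4j driver (the labels, relationship types, and properties are invented):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

# Pull a track and everything under it in one Cypher query.
query = """
MATCH (t:Track {slug: $slug})-[:HAS_WORKSHOP]->(w:Workshop)
      -[:HAS_MODULE]->(m:Module)-[:HAS_LESSON]->(l:Lesson)
RETURN w.title AS workshop, m.title AS module, collect(l.title) AS lessons
"""
with driver.session() as session:
    for record in session.run(query, slug="python-basics"):
        print(record["workshop"], record["module"], record["lessons"])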

Yet again, if read speeds are critical and you can live without immediate consistency, then a key value or a document database would do the job perfectly.

Thanks for the elaboration, very appreciated !

Surely we will discuss that with the team to see how things go... I guess we are probably gonna use Neo4j (or another suitable graph DB) for the profile model as well.

When all you have is a hammer ...

This is funny inasmuch as it keeps repeating in all aspects of our industry. "My tool is the best there ever has been!"

Boring.

Use the correct tool for the job. Sometimes that means an RDBMS (please show me how you'd build a sophisticated transactional system like accounting records or banking actions with NoSQL), sometimes that means NoSQL (Solr / ES for full-text search: RDBMSs simply can't do full-text search as efficiently as these, hands down).

But the right tool is more than your database choice. Be open to different languages, frameworks, libraries, methodologies, etc. To pigeon-hole yourself into only solving things with C#/Angular/React/Oracle/Python/OOP is to limit your ability to provide actual solutions; but kudos to you for ticking off another box to say "Yep, I 'fixed' the issue with my standard kit!".

One example where SQL loses is graph databases (but MongoDB sucks here too ¯\_(ツ)_/¯). Otherwise PostgreSQL rules; with current hardware you can easily fit your whole database in RAM. If PostgreSQL isn't enough, there are also CockroachDB and Spanner.

I may be missing some other use cases for other DBs, like BigTable and DynamoDB, etc.

PostgreSQL supports document storage (XML, JSON). PostgreSQL supports key/value stores.

PostgreSQL, it's Not Only SQL :p
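
For instance, a quick psycopg2 sketch of treating Postgres as a document store (the table is invented):

import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=academy")
cur = conn.cursor()

cur.execute("CREATE TABLE IF NOT EXISTS tracks (id serial PRIMARY KEY, doc jsonb)")
cur.execute("INSERT INTO tracks (doc) VALUES (%s)",
            [Json({"title": "Python", "workshops": [{"title": "Basics"}]})])

# Query inside the document with the JSONB operators -> and ->>.
cur.execute("SELECT doc->'workshops'->0->>'title' FROM tracks WHERE doc->>'title' = %s",
            ["Python"])
print(cur.fetchone()[0])  # Basics
conn.commit()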

This pisses me off so much. Relational theory is the foundation of the majority of software in existence today that persists anything. Relational databases are rock solid, and offer SO many things directly out-of-the-box for FREE. NoSQL has its place in real-time and schema-less data, but even relational databases can be coerced to perform there.

Just because something is popular doesn't mean it's a good thing; we should evolve and learn from our mistakes.
And nothing is free: SQL has many downsides, but we were taught to live with them and we think that's "natural".

Yes, me, but I will rest my case by doing the reverse, saying that SQL should have fewer believers :D

As I study databases more and more, the reasons to use an RDBMS/standard SQL are getting fewer and fewer. From SQL being the only hammer I knew how to use (a few years ago), I have basically become anti-SQL.

A few examples:

If you have a fast-iteration prototype/startup/small project, there is no reason to waste precious time maintaining a schema and applying migrations every day, so NoSQL is the smart choice.

If you have a huge project you need a scalable DB, and if you shard you will lose the benefits of relationships. You can insist on using SQL at this scale, but you will have to write/use something like Vitess.
*I don't include Spanner/CosmosDB/Cassandra in this topic; I'm referring to "regular" SQL databases like MySQL, Oracle, SQL Server, Postgres.

If you have (too) many relationships (2+ degrees of connections) you would want to move them into a graph database, and what's left could probably fit into a NoSQL store.

If you have a financial product, you would be crazy not to use an event-sourcing architecture (which can later be aggregated into SQL, that's true).

If you want to store text for search, you would use something like ElasticSearch.

Let's not forget about time-series data (logs, analytics) and geo data, neither of which is best served by SQL engines.

Like I said, the *SQL products have fewer and fewer use cases nowadays.

I'm using CouchDB for a project and, frankly, I like it. I've used MongoDB in the past too. I'm a long-time SQL user, but since I started working with NoSQL, whenever I have the choice I've ended up choosing NoSQL.

My default database of choice is Postgres, unless I can see that my data model or scalability requirements don't fit, but that rarely happens.
SQL databases are more common than NoSQL, which means a bigger community, public knowledge is all around, and cloud providers even offer the open-source SQL databases as managed services, which makes them easy to use in production. When it comes to NoSQL, the cloud providers usually offer their own proprietary solution.

It's a great model for document-like structures. This includes actual documents, but also things like user records. If your data has a lot of single-entity structure to it, then I think the NoSQL-type databases are a better fit.

They also make development easier as they aren't as rigid. Playing with the "schema" is easier.

For mass record-like data I'd still use a relational DB.

Ideally, a good DB engine would just provide both types of data model and stop pretending one is "better" than the other. It's like trying to argue that functional is better than imperative when a combination of both is preferable.

That's absolutely preposterous.

Good, attention-getting marketing, though. Can't blame MongoDB.

But if you have a boss actually influenced by an ad like this, make sure they and you have both read Dan McCreary and Ann Kelly's Making Sense of NoSQL.

Each major flavor of non-relational database has something it's good at (and plain-old relational databases in turn have things they're good at). This book will explain why.

They basically break databases down into 5 major types in use today:

  1. "Relational" (This is your classic FileMakerPro / Microsoft Acces / Oracle / SQL Server / MySQL / PostgreSQL database. The one you think of as a database. From a beginner's perspective, you can think of it as a bunch of Excel spreadsheets that cross-reference each other, and each record in a "spreadsheet" has a defined set of values you're allowed to fill in (think of the column headers.))
  2. "Key-Value"
  3. "Columnar" (A specialized form of key-value database. Be sure to learn how, plus why it's different enough to get its own name.)
  4. "Document" (A specialized form of key-value database. Be sure to learn how, plus why it's different enough to get its own name.)
  5. "Graph" (Some versions are a specialized form of key-value database. Be sure to learn how, plus why it's still different enough to get its own name.)

Interestingly, from what I hear at Salesforce talks, they present a user interface to their customers that, for all practical purposes, gives those customers a traditional "relational database," but on the back end, Salesforce has been migrating what actually goes into those "databases" away from a relational database of their own and into a ... I think it was a columnar database, but I wouldn't swear to it.

I'm an Oracle Database user here, I just enjoyed reading the vitriol on Twitter this morning.

Comparing examples of what a MongoDB query looks like against one of my Oracle ones, the MongoDB one looked messy as hell.

hardcore in favor of NoSQL for general use-cases.

[Emphasis is mine.]

If anyone is really hardcore in favor of anything for general use-cases, they are biased (read: plain wrong.)

General use-cases have general answers that are clear, simple, and wrong. (credits to H. L. Mencken)

If we're talking non-relational vs relational, there is an obvious need for non-relational databases and endless use cases: document databases, graph databases, key-value stores. The key-value store is invaluable :P

NoSQL definitely has its place if you need "web scale"; I'm not sure MongoDB is the right tool for that job, though. Most people use Redis/Memcached for a quick lookup cache, but the data is almost always in a relational store as the source of truth.

Real web-scale use cases will usually use Cassandra or other big-boy tools :)

I'm not too familiar with MongoDB as I haven't used it in a long time, but it seems like MongoDB tries to replace MySQL/PostgreSQL.

The question is, do you start a project using NoSQL as your main datastore, or do you go with something more traditional that has known scale-out options?

As this tweet has put it: twitter.com/martinstraus/status/10...

Using a NoSQL DB just for the sake of it is just a dumb decision, no matter the project.

That goes for every other piece of technology.

The fact that relational technology fits most use cases only emphasizes how such a tweet is just plain wrong.

Like any tool, MongoDB may have its use cases, but apart from data caching (and taking into account the data loss, lack of serious guards, and missing integrity checks), I don't see any appeal in MongoDB.

I fail to see any advantage in NoSQL databases compared to PostgreSQL. Could anyone enlighten me?

I started my exploration in web dev using Mongo, but consistently found myself doing relational things with it, even though my data seemed non-relational at first glance. Eventually, I took the hint and kicked Mongo to the curb, and have happily been using MySQL and Postgres since.

NoSQL certainly has its place in specific situations, but I'm a strong believer that it should never be the first choice. You can do a lot with SQL, and some databases even support direct json queries, so there's no reason to pick NoSQL over SQL in the vast majority of situations.

I don't think they themselves believe such a thing; they are just resorting to the absurd in their advertisement to sound funny and get people talking about it, and this post and the discussions I saw on Twitter are proof that it is working!
Werner Vogels, Amazon.com's CTO, explains very well that purpose-built databases are the way to go.

MongoDB is an all-around terrible choice for NoSQL. Also the company and their community are often ***holes about it.

Relational DB people on the other hand have jobs and employers that often also deploy traditionally boring "NoSQL" data stores. And again, nobody talks about it, because it is boring.

Firestore by Firebase is a pretty nice DB to use. I know it's still in beta but I've been using it recently as a data store and it's quite nice. Also setup is next to nothing for it. I'm not saying it's a replacement for any of these other solutions but it's damn good.

While AWS tries to solve some SQL issues with "Aurora Serverless" I have the impression most serverless projects are using NoSQL DBs like DynamoDB.

I used RethinkDB in a production system at one point and it was fantastic. I'm sad the commercial company behind it died.

NoSQL is a tool; you can fall in love with a hammer, but when you need a screwdriver the hammer can't help you.

I'm using both MySQL and Amazon's DynamoDB for work.. No, I don't worship NoSQL like that hahaha

Sign up (for free)