A continuation of this series. The discussions have been great so far!
I think these two represent two factions in the database world that make for a good compare and contrast. But feel free to broaden the discussion to other databases.
Top comments (71)
Long-time user of SQL of various varieties, and PostgreSQL specifically. There was a time when the idea of using anything but a SQL database was heretical. And to be frank, some of the comments below indicate that using non-SQL data storage is still considered heterodox. The orthodox thinking is on clear display when Noah says "one of the most important parts of designing a system is fleshing out your data model".
In this statement lies the core weakness of orthodox software development, namely that the foundation stone of our software is the model. The model is usually established by the SQL and object model, with a mapping using Hibernate or whatever other object-relational mapper. When you bind the data structure to the binary artifacts, you cripple flexibility. You will build business rules into the domain objects. Changing the data structure becomes almost impossible without managing both binary deployments and schema changes. I spent decades dealing with this kind of problem, from even before SQL. I was a dBase programmer back in the early nineties.
PostgreSQL is my favorite SQL database, but it may not suit everyone's needs. A telco I worked for used MySQL. There are workloads where the data structures are mature and unchanging, where flexibility and adaptability aren't critically important, and where tight coupling between code and schema is acceptable.
In 2013 I began working on an Automation Engine which used a flexible data storage mechanism. It actually used PostgreSQL under the covers, but it allowed huge flexibility that decoupled the code from the schema. Later this technology adopted MongoDB under the covers. However, it became clear to me that using MongoDB directly would give me all the same features with far better performance.
If you simply treat MongoDB as any other relational database and build applications tightly coupled to a schema, it is like complaining that your trail bike doesn't go as fast as your Ferrari or carry as much as your pickup truck. The whole point of MongoDB is the flexibility. If you are simply going to build applications like you did with SQL and expect it to be feature-for-feature identical, you are missing the benefits.
The door MongoDB opens is the ability to write applications that are not tightly coupled to the binary. You can store arbitrary data easily and allow users to define what data they want to store at run time rather than design time. It delivers a kind of flexibility and adaptability that allows us to do things which are essentially impossible to do with SQL databases. I've seen an application similar to the one I'm working on, and its pipeline of SQL schema scripts and messing about is scary. Dynamic changes to a SQL schema can be time-consuming and potentially dangerous. Imagine trying to change a schema on the fly while there are users online. That kind of thing is just a non-issue with MongoDB.
If you are going to embrace MongoDB, you should also be embracing its strengths. There is no point using it as a drop-in replacement for SQL, because you can't beat SQL for being an SQL database.
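To make "users define what data they want to store at run time" concrete, here is a minimal sketch in plain Python, assuming documents are held as dicts (the way a document store keeps them) and that field definitions live in data rather than in compiled classes. `FieldRegistry`, `define_field` and `validate` are hypothetical names for illustration, not part of MongoDB or any driver.

```python
from dataclasses import dataclass, field
from typing import Any

# Type names a user may choose at run time, mapped to Python types (an assumed set).
ALLOWED_TYPES = {"string": str, "number": (int, float), "boolean": bool}


@dataclass
class FieldRegistry:
    """Hypothetical registry: the data design lives in data, not in compiled classes."""

    fields: dict[str, str] = field(default_factory=dict)

    def define_field(self, name: str, type_name: str) -> None:
        """Add or change a field while the application is running."""
        if type_name not in ALLOWED_TYPES:
            raise ValueError(f"unknown type: {type_name}")
        self.fields[name] = type_name

    def validate(self, doc: dict[str, Any]) -> list[str]:
        """Return a list of problems with a document; an empty list means it is valid."""
        problems = []
        for key, value in doc.items():
            if key not in self.fields:
                problems.append(f"undefined field: {key}")
                continue
            type_name = self.fields[key]
            # bool is a subclass of int in Python, so only accept it for boolean fields.
            wrong_bool = isinstance(value, bool) and type_name != "boolean"
            if wrong_bool or not isinstance(value, ALLOWED_TYPES[type_name]):
                problems.append(f"{key}: expected {type_name}")
        return problems


registry = FieldRegistry()
registry.define_field("name", "string")
# Later, a user adds a field through the UI: no migration script, no redeploy.
registry.define_field("warranty_months", "number")
print(registry.validate({"name": "Router", "warranty_months": 24}))  # []
print(registry.validate({"name": 5, "color": "red"}))
# ['name: expected string', 'undefined field: color']
```

The design choice here is that validation is driven entirely by the registry contents, so changing the data design is a data update rather than a code change.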
But this is all backwards. I would rather say that you should challenge the orthodoxy of tight coupling between schema and binary. By smashing that dependency and embracing an approach which is more universal and flexible you are free to write applications which are more general purpose. If you accept this philosophy you will find MongoDB is a useful tool to help you on the path.
It isn't the only option by any means, but it is a pretty decent start. I made a video about all of this in 2017.
youtube.com/watch?v=uwZj4XF6zic
I can't help but think of my experiences with a mature GraphQL implementation. Schema first, and the binary (de)composed of many services.
There is data, the way you gather it, and the way you put it forth for consumption. I suspect they will all remain work to be done.
One of the requirements I was given was to introduce GraphQL endpoints to our system. The Java GraphQL implementation assumes a fixed schema burnt into the binary; same orthodox philosophy. It took me a while to be able to generate the schema dynamically in order to expose GraphQL endpoints that would dynamically change as users modified the data structures.
If it was an orthodox app this would be quite easy: you just write the classes required to get data from the model. No worries. But if you are able to add or modify the schema at runtime, what do you do? It was a little fiddly, in that I had to have a trigger mechanism, fired when the data design changes, to programmatically rebuild the GraphQL schema. Obviously this is far from ideal. Dynamic modification of a schema would be better than rebuilding.
I also built a bunch of functions to do various queries which are not really part of GraphQL syntax, but can be supported. However, to be blunt, the GraphQL library implementation I used forced me into doing hundreds of queries to fulfill requests. A better way would be to convert the GraphQL query into a single aggregation which can be run and returned.
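The "rebuild on change" approach can be sketched as a function that regenerates GraphQL schema text (SDL) from field definitions held as data, called by a trigger whenever the data design changes. This is an illustration under assumptions, in plain Python: `build_sdl`, the type-name mapping and `SchemaHolder.on_design_change` are hypothetical, and a real server would hand the resulting SDL to whatever GraphQL library it uses.

```python
# Run-time type names mapped to GraphQL scalar types (an assumed mapping).
GRAPHQL_SCALARS = {"id": "ID", "string": "String", "number": "Float", "boolean": "Boolean"}


def build_sdl(types: dict[str, dict[str, str]]) -> str:
    """Render GraphQL SDL for user-defined types plus a root Query listing each type."""
    blocks = []
    for type_name, fields in sorted(types.items()):
        lines = [f"  {name}: {GRAPHQL_SCALARS[kind]}" for name, kind in fields.items()]
        blocks.append(f"type {type_name} {{\n" + "\n".join(lines) + "\n}")
    # One list field per type on Query, e.g. Product -> products: [Product]
    query_lines = [f"  {t[0].lower()}{t[1:]}s: [{t}]" for t in sorted(types)]
    blocks.append("type Query {\n" + "\n".join(query_lines) + "\n}")
    return "\n\n".join(blocks)


class SchemaHolder:
    """Hypothetical holder that the change trigger calls to swap in a fresh schema."""

    def __init__(self) -> None:
        self.sdl = ""

    def on_design_change(self, types: dict[str, dict[str, str]]) -> None:
        # Full rebuild on every change; incremental updates would be nicer but harder.
        self.sdl = build_sdl(types)


holder = SchemaHolder()
holder.on_design_change({"Product": {"id": "id", "name": "string", "price": "number"}})
print(holder.sdl)
```

Regenerating the whole schema text keeps the trigger simple, at the cost of redoing work for every small change, which matches the trade-off described above.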
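The "single aggregation" idea could look roughly like this: instead of resolving each nested field with its own query, translate the requested selection into one MongoDB aggregation pipeline made of `$match`, `$lookup` and `$project` stages. This minimal Python sketch only builds the pipeline as plain data; the selection format, the collection names and the `relations` convention are assumptions, and a real translator would also need to handle arguments, pagination and deeper nesting. The resulting list could then be passed to a driver's aggregate call (for example PyMongo's `Collection.aggregate`).

```python
from typing import Any


def selection_to_pipeline(
    root_filter: dict[str, Any],
    fields: list[str],
    relations: dict[str, dict[str, Any]],
) -> list[dict[str, Any]]:
    """Build one pipeline for a root selection plus one level of related documents.

    relations maps an output field name to
    {"from": collection, "local": root_field, "foreign": related_field, "fields": [...]}.
    """
    pipeline: list[dict[str, Any]] = [{"$match": root_filter}]
    projection: dict[str, Any] = {name: 1 for name in fields}
    for out_name, rel in relations.items():
        # One $lookup stage per relation instead of one query per parent document.
        pipeline.append({
            "$lookup": {
                "from": rel["from"],
                "localField": rel["local"],
                "foreignField": rel["foreign"],
                "as": out_name,
            }
        })
        # Keep only the requested sub-fields of each joined document.
        for sub in rel["fields"]:
            projection[f"{out_name}.{sub}"] = 1
    pipeline.append({"$project": projection})
    return pipeline


# Hypothetical request: user 10's name plus the text of their comments.
pipeline = selection_to_pipeline(
    {"_id": 10},
    ["name"],
    {"comments": {"from": "comments", "local": "_id", "foreign": "user_id", "fields": ["text"]}},
)
print(pipeline)
```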
GraphQL is another example of a technology which while helpful can drive developers into domain binding.
The problem isn't that building applications bound to a data structure is always wrong, rather that it has become so orthodox that developers don't even question it. I was one of them. I thought that using anything but SQL was insanity.
I can't disagree with any of that! I just got your point :)
Thank you very much for your opinion!
I totally agree with you. Most developers who complain about MongoDB are just treating it as if it were another relational database. I saw a lot of articles complaining about $lookup, just because they had normalized their data .-.
MongoDB is a good choice for any small or mid-sized start-up, because they need to make a lot of changes in a really short amount of time.
Oh my god! This this SO MUCH THIS! This comment mirrors exactly WORD FOR WORD what I think about NoSQL vs SQL. Really, when flexibility is a requirement, you can't use SQL. That's just how it goes.
100% this.
So many devs (including myself) got burned when MongoDB came out because it was touted as a silver-bullet data store, when in reality it was created for an entirely different problem than what most projects needed. By the time devs realized MongoDB was causing pain like major performance bottlenecks and difficulties accessing the data they needed, they were already in too deep to fix the issue.
Real-world data is relational, pretty much always. While it's pretty common for single records to have hierarchical data (which is what makes non-relational DBs sound appealing), it's very uncommon for entire datasets to be limited to a bunch of singular records of purely hierarchical data. You will end up making multiple collections and trying to reference them just like relational models, except that you'll have zero help from the database in querying them efficiently.
I'm a strong advocate that you should never start your project with a non-relational data store. It's much easier to add one later on (which will probably never happen) than it is to move from non-relational to relational.
Not to forget that PostgreSQL supports JSON and XML as data types. So where suitable, you can just store JSON or XML in a table.
You can even create indexes directly on the JSON data. (Or you can use expression indexes, like you can with XML data.)
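In PostgreSQL this is typically a `jsonb` column with a GIN index (`CREATE INDEX ON products USING GIN (attrs)`) or an expression index on one key (`CREATE INDEX ON products ((attrs->>'color'))`). So that the example runs with only Python's standard library, this sketch swaps in SQLite as a stand-in: its `json_extract` function plays the role of PostgreSQL's `->>` operator, and an index on that expression lets lookups on the JSON key use the index. Table and column names are made up, and SQLite's JSON functions are assumed to be available (they are in most recent builds).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# attrs holds JSON text; in PostgreSQL it would be a jsonb column.
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, attrs TEXT)")
conn.executemany(
    "INSERT INTO products (attrs) VALUES (?)",
    [('{"color": "red", "size": 42}',), ('{"color": "blue"}',), ('{"size": 38}',)],
)
# Expression index on one JSON key (the analogue of an index on attrs->>'color').
conn.execute("CREATE INDEX idx_products_color ON products (json_extract(attrs, '$.color'))")

rows = conn.execute(
    "SELECT id FROM products WHERE json_extract(attrs, '$.color') = ?", ("red",)
).fetchall()
print(rows)  # [(1,)]

# The planner can use the index because the WHERE expression matches the indexed one.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM products WHERE json_extract(attrs, '$.color') = ?",
    ("red",),
).fetchall()
print(plan)
```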
Now this one is a bit different from previous discussions in my opinion, because one is a relational and the other a non-relational database, so it's more dependent on the actual needs of your project.
This may be an unpopular opinion, but I think that MongoDB (and non-relational databases generally) are quite hyped up. So far, whenever I considered using a non-relational database, I realized that using a relational database would actually be more appropriate (which is understandable, considering most of our job as web developers is dealing with data which has a clearly defined structure).
I haven't used PostgreSQL that much (I'm more of a MySQL guy), but we can look at it as the Oracle database of the open source world. It has many amazing features and is much more advanced than MySQL (for example, PostgreSQL can index JSON columns directly, while MySQL can only index JSON values indirectly, through generated columns).
On the other hand, if you're dealing with large amounts of unstructured data, MongoDB is great. I had to do this in one project where we had various products that could have different types of attributes, and it's much easier to deal with than implementing EAV in a relational database (I'm looking at you, Magento), which quickly becomes a mess and breaks data integrity, removing the point of using a relational database in the first place.
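To see why EAV gets messy, compare two shapes of the same product data. In this minimal Python sketch (attribute names and values are made up), EAV stores every attribute as its own row, so rebuilding one product means grouping rows back together, and values tend to end up as text the database cannot type-check. A document keeps a product's attributes together, each with its own type.

```python
from collections import defaultdict
from typing import Any

# EAV layout: (entity_id, attribute, value) rows; values are typically stored as text,
# so nothing stops "ram_gb" from holding "sixteen".
eav_rows = [
    (1, "name", "T-shirt"), (1, "size", "M"), (1, "color", "red"),
    (2, "name", "Laptop"), (2, "ram_gb", "16"), (2, "screen_in", "14.0"),
]


def eav_to_documents(rows: list[tuple[int, str, str]]) -> dict[int, dict[str, Any]]:
    """Pivot EAV rows into one dict per entity (the shape a document store keeps natively)."""
    docs: dict[int, dict[str, Any]] = defaultdict(dict)
    for entity_id, attribute, value in rows:
        docs[entity_id][attribute] = value
    return dict(docs)


# The same products as documents, each attribute keeping a native type.
documents = [
    {"_id": 1, "name": "T-shirt", "size": "M", "color": "red"},
    {"_id": 2, "name": "Laptop", "ram_gb": 16, "screen_in": 14.0},
]

print(eav_to_documents(eav_rows)[2])
# {'name': 'Laptop', 'ram_gb': '16', 'screen_in': '14.0'}
```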
Great point. Most people who dislike NoSQL do so because they went all in on using it to replace "relational" DBs. NoSQL DBs should never be treated as a replacement or an alternative, but as a complement. This means you don't need just one type of DB. The limit should be a resource one, not a technological one: for example, you don't have enough people to maintain more than one DB system, or it's too expensive to do so. I have a project depending on 3 different DBs for now, and it might get bigger. The graph DB in that system, more specifically Neo4j, cannot be replaced by SQL in terms of ease of use and the actual logic of the data structure. However, users, tickets, clients... are stored in PostgreSQL, while geo data is in Mongo. There is no single reason to move everything to "relational", not even performance, as it's quite fast this way. The geo service and positioning are independent of the graph data. If one goes down, the other is usable, and if both are up, traffic from one doesn't put pressure on the other. They all depend on users, which are accessed once per auth, so SQL is not that active. I know PostgreSQL has support for JSON and geo stuff, but it took me 10 minutes to set up Mongo and it works, so why would I bother with the others?
Maybe now people will realise it's not the DB type's fault; it's whoever picks it for the wrong purposes.
You have PostgreSQL and don't use PostGIS to store geo data, isn't that a crime ? ^^
Anyway, you are right. I work with both, and there are things I would never do with Mongo (like complex geo-calculations on hundreds of gigabytes of geo-data; I understand that if your needs are only selecting points in an area, you don't really need PostGIS), and others I would never do with Postgres (typically storing large amounts of ad-hoc data with an ad-hoc structure for specific client displays, e.g. lots of geo-data for thousands of users associated with unpredictable data that will be added later, or specific usage in general).
Also, you can always use a DB for things it's not really meant for: you can make a table unlogged in Postgres to improve performance (but you lose the D of ACID), and recent improvements in Mongo add transactions (to add the A). But in the end, you should never try to find the unique or perfect DB. They are meant for a purpose, and without falling into the opposite extreme (like you said, too many DBs to maintain), you shouldn't try to fit a round peg in a square hole.
In the end, to me, it's not "MongoDB VS PostgreSQL" but "MongoDB WITH PostgreSQL", because I think they are the perfect combination for most complete projects today.
If just some attributes of your data are unstructured, in PostgreSQL you can put them into a jsonb column. It can be indexed and relatively easily queried. I use that, for example, when one of my models receives a webhook with some more or less random stuff inside.
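For the simpler case mentioned above, selecting points within some distance of a location, a plain great-circle filter is often enough. This minimal Python sketch uses the haversine formula with a mean Earth radius of 6371 km, a spherical approximation; PostGIS and MongoDB's geospatial operators do this with proper spatial indexes and more precise models, while this sketch just scans every point. The city coordinates are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; treats the Earth as a sphere


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def points_within(center: tuple[float, float], radius_km: float,
                  points: dict[str, tuple[float, float]]) -> list[str]:
    """Names of points no farther than radius_km from center (full scan, no index)."""
    lat0, lon0 = center
    return sorted(name for name, (lat, lon) in points.items()
                  if haversine_km(lat0, lon0, lat, lon) <= radius_km)


cities = {"Paris": (48.8566, 2.3522), "London": (51.5074, -0.1278), "Lyon": (45.7640, 4.8357)}
print(points_within(cities["Paris"], 360, cities))  # ['London', 'Paris']
```

A full scan like this is fine for modest data; once the point set grows large, a spatial index (GiST in PostGIS, `2dsphere` in MongoDB) is what makes the query cheap.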
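The pattern described (ordinary columns for the fields you rely on, plus a JSON column for the unpredictable part of a webhook) can be sketched like this. SQLite stands in for PostgreSQL so the example runs with only Python's standard library; in PostgreSQL the `payload` column would be `jsonb`, and the lookup would use `payload->'order'->>'total'` or the `@>` containment operator. The table layout and payload fields are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE webhook_events (
           id INTEGER PRIMARY KEY,
           source TEXT NOT NULL,        -- structured: always present
           received_at TEXT NOT NULL,   -- structured: always present
           payload TEXT NOT NULL CHECK (json_valid(payload))  -- free-form part
       )"""
)


def store_event(source: str, received_at: str, payload: dict) -> None:
    """Keep the known fields as columns and the rest of the webhook body as JSON."""
    conn.execute(
        "INSERT INTO webhook_events (source, received_at, payload) VALUES (?, ?, ?)",
        (source, received_at, json.dumps(payload)),
    )


store_event("shop", "2024-01-01T10:00:00Z", {"order": {"id": "A1", "total": 19.9}})
store_event("shop", "2024-01-01T10:05:00Z", {"refund": {"order_id": "A1"}})

# Query a nested key that was never declared as a column.
totals = conn.execute(
    "SELECT json_extract(payload, '$.order.total') FROM webhook_events "
    "WHERE source = ? AND json_extract(payload, '$.order.id') IS NOT NULL",
    ("shop",),
).fetchall()
print(totals)  # [(19.9,)]
```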
IMHO, Mongo was overhyped when it came out - but that doesn't mean it's without merit. The problem (as with soooo many tech solutions) is that the fanboys started touting it as a complete replacement for RDBMS's. And the "old guard" wrinkled their noses cuz they still wanted to see RDBMS's as always superior to flat file structures.
Anytime I hear someone touting either solution as The One True Answer, my only reply is:
If you can't outline at least some of the strengths/weaknesses of either approach, then I really don't want to hear any of your recommendations for the DB solution that should be used in this app.
If some guy wants to build my house, and the only tool in his tool belt is a hammer, I will NOT be employing him. I don't care how long he's used the hammer. I don't care if he can tell me 100 reasons why his hammer is The Best Tool EVER! If he can't explain to me when it might be appropriate to use a screwdriver, I don't want him anywhere near my foundation.
Hi Adam! That's an understandable approach. The only fallacy is that you're also implying that any dev can argue about any other tool from self experience (and not just by having read articles or discussed it with other people). Also, I'm quite sure I can find a carpenter who doesn't know all the tools available out there but can still build a long-lasting house.
In our case: why not a graph DB then? Or a time series DB? Or an object DB? Or a flat file?
Any dev that has read a bit of the main differences of either approach could give you a reasonable answer, but is that sufficient for your evaluation?
In one case logic is enough ("I need to relate exams with students, therefore I'm using a DB that has relations"); in the other case experience will never be enough, because only a small subset of developers have enough experience with all types of DBs to be able to evaluate each of them in each case a priori.
Does this make sense :D ?
That... wasn't what I was implying at all. Nor was it stated as such in my original reply.
Nevertheless, I don't understand how what you've described is somehow a "fallacy". Why shouldn't "any dev" be able to "argue about any other tool from self experience"??? Arguing in favor of a tool/tech/approach/etc !== forcing the implementation of that approach. When the team is in a research/decision phase, there's nothing at all wrong with someone proposing a solution based upon their own experience. It doesn't mean their proposal will be adopted, but neither does it mean that there's anything wrong with making the proposal.
You just completely twisted my words. Please look at my original reply and tell me where I said that the carpenter must "know all the tools available out there"?? I'm talking about the fact that farrrrr too many people in tech glom onto a single approach to the exclusion of all others.
You don't have to know "all the tools". None of us do. But if the only tool that you continually, blindly, stubbornly force into every project is a hammer, then you're no carpenter. You're a hammer salesman.
Indeed. Why not?? If you're throwing these labels out there as fringe examples that simply don't warrant consideration, then you are highlighting the problem that I was originally alluding to.
No. Most of them cannot (or, more accurately, will not). Your statement sounds very reasonable - if you've never actually had to deal with developers before. There are far too many devs who learned a given tech/tool/approach/whatever - probably many years ago - and now they refuse to properly assess any alternatives. Maybe they can give you a cursory, 30-second explanation of Tech A vs Tech B, but even in that brief synopsis, they make it clear that they love Tech A and they've never seriously considered Tech B.
No. This shouldn't be a case of "logic vs. experience". We should never throw out logic due to past experiences. Nor should we throw out experience due to some academic maxim of logic.
Sorry, it definitely wasn't my intention to attribute you words and meaning, it's what I understood by reading your comment. That said:
That's not what I meant as well. I'm trying to say that most people are not reasonably knowledgeable enough and also don't have enough time to test all potential options for every single decisions to make a perfectly informed choice. That's why we rely on the collective experience. I can't test all databases every time I need to choose one, but I can have a generic idea of which type I might need based on personal or collective experience.
That was my point.
Sure! But choosing a DB is not the same thing, right? I often think engineering or carpentry analogies fall short when compared to software (the prefix "soft" in the word is very apt).
Let's see it like this: building a house is a collective effort which requires all sort of tools in all cases, one could prefer a type of hammer or another, but they still require one. They also require construction materials, welding, scaffolding and other stuff. A carpenter going around the construction site saying "please weld with my hammer" wouldn't make sense anyway and they probably wouldn't have a job :D
Choosing a database only implies knowing which data model better fits the application one wants to build, and a bit of forward looking. Teams can still choose wrongly and it can cost the company a lot, but changing a database is not the same as redoing a house because nobody used bricks in the first place.
That's why I think civil engineering analogies often fall short when directly compared to software development. Same with the evergreen tendency of comparing building a bridge to software architecture -_-
Because of time constraints. The same argument would apply to programming languages, libraries and everything else. You'd be stuck in a constant comparison cycle and never get anything done.
Sure and I agree with you here but that's a character trait, they can still be right sometimes? At least once ;-) (after all the hammer is still the right tool in some cases).
No, as I implied, maybe incorrectly, you need both at the same time. That's why the camp "always use a DBMS forever and ever" is wrong, but "spend 3 months evaluating all possible options for every tool" is also not always practical due to time or cost constraints.
Somehow, almost nobody remembers that when it came out, MongoDB's main premise was that MySQL wasn't scalable, and MongoDB was. It was a big fat lie. (It was just like Svelte's creator complaining about React all the time.)
Yes, there are cases where storing/consuming unstructured data is a plus. Except, like in all dualities, both approaches have pros and cons.
It's the same with structured vs unstructured, server-side rendered vs client-side rendered, interpreted vs compiled, distributed vs centralized, object-oriented vs functional, and this vs that vs the other. There is no silver bullet, whichever approach you choose.
All SQL databases and noSQL databases have their strengths and weaknesses.
I have one wish though: Being able to fetch the main records/documents and related records or documents with a single query. That'll be the day the database industry changes. (Already done maybe? Since I am an old guy that can't be bothered to follow every new shiny thing anymore :)
I recently found out about the "PostgREST" project, which provides a RESTful API for a Postgres database.
It seems that they do actually pull off the "fetch resources with related resources using a single request", reminded me of your wish!
postgrest.org/en/v7.0.0/api.html#r...
Isn't this a SQL JOIN you're describing? If you're talking about multi data store you can do that with foreign data wrappers and have something foreign to PostgreSQL appear local, or if you're talking multi system you can do it with an API proxy like Kong or with any GraphQL server.
So yeah, it's possible :-)
Yes, SQL JOIN; except, it returns a single row for every parent record.
If by a "multi data store", you mean being able to merge data from multiple sources (something like one part from MySQL and some other part from PostgreSQL), that's not what I mean either.
As for a GraphQL server, if I'm not mistaken, a graphql server fetches data from the source then formats it to the required specification. What I'm looking for is something like a GraphQL server embedded in the database / data store.
Let me try to write a pseudo query and a sample output for what I mean:
select students[id, firstname, lastname, student_no] as root, current_courses[course_name, course_code]
FROM students
MERGE students_courses on (students_courses.student_id = students.id)
MERGE courses as current_courses on (students_courses.course_id=courses.id)
WHERE student_no=1234
And the result
{"id": 1, "firstname": "Necmettin", "lastname": "Begiter", "student_no": 199601010725, "courses": [{"course_name": "Comparative Linguistics", "course_code": "ECC101"}, {"course_name": "Psychology of Learning", "course_code": "ESL101"}, {"course_name": "Something something", "course_code": "ESS101"}]}
A few things to note: the fields are id, firstname, lastname, student_no, and courses, with the courses field being an array of objects.
Long story short, yes, what I'm describing IS a JOIN operation at heart, but much more than that.
I'm not sure we're talking about different things though. I think it's a matter of perspective. An API that queries multiple data sources and then returns them to you as a result of a single query is to you, the caller, a single query.
PostgreSQL and other DBs sometimes split queries into multiple parallel fetches; they physically make multiple reads at the same time, but does it matter to you? No, because you issue one SQL query to get all the data you need.
Moving on to your example, you can already do it:
User 10 has 3 comments, I selected them all with one query and aggregated them in an array of JSON rows.
I'm sure the query can be simplified with a CTE or other clever tricks but it's a decent start.
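The query itself did not survive in this thread. A comparable technique, applied to the students/courses pseudo query above, is to aggregate each parent's related rows into a JSON array inside a single query. This sketch uses SQLite as a stand-in so it runs with Python's standard library: `json_group_array` and `json_object` fold the joined courses into one row per student; PostgreSQL's counterparts are `json_agg` and `json_build_object`. The schema and data are invented for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, student_no TEXT);
    CREATE TABLE courses (id INTEGER PRIMARY KEY, course_name TEXT, course_code TEXT);
    CREATE TABLE students_courses (student_id INTEGER, course_id INTEGER);
    INSERT INTO students VALUES (1, 'Necmettin', 'Begiter', '1234');
    INSERT INTO courses VALUES (1, 'Comparative Linguistics', 'ECC101'),
                               (2, 'Psychology of Learning', 'ESL101');
    INSERT INTO students_courses VALUES (1, 1), (1, 2);
    """
)

# One row per student; the related courses become a JSON array of objects.
row = conn.execute(
    """
    SELECT json_object(
             'id', s.id, 'firstname', s.firstname, 'lastname', s.lastname,
             'student_no', s.student_no,
             'courses', json_group_array(
                 json_object('course_name', c.course_name, 'course_code', c.course_code)))
    FROM students s
    JOIN students_courses sc ON sc.student_id = s.id
    JOIN courses c ON c.id = sc.course_id
    WHERE s.student_no = ?
    GROUP BY s.id
    """,
    ("1234",),
).fetchone()

student = json.loads(row[0])
print(json.dumps(student, indent=2))
```

Note that the order of elements inside the aggregated array is not guaranteed unless the query imposes one.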
I can understand why it's a little bit more complicated to do with noSQL DBs. They usually have more complicated ways of putting data in relation to other data.
Yes, multiple queries resulting in a single resultset is, in the end, a single call for the client, but on the server side, it means multiple calls. You are right of course, but I'm a little unforgiving I guess :)
Yeah, PostgreSQL almost got it, except JSON values in Postgres are strings. Almost there :)
What can I say, I'm a grumpy old programmer, it's not easy to satisfy me. ;)
Cheers.
Not really though: JSON in PostgreSQL is just JSON:
Integers are integers, strings are strings and booleans are booleans
When PostgreSQL came up with the JSON features, I remember reading examples and always seeing 'some-json-formatted-data'::JSON, so I assumed JSON data is given to / received from PostgreSQL in a string. Even the comments field in your example a few comments back (the one with the 3 comments of user 10) has the JSON data in strings.
If that isn't the case, my bad, I didn't know PostgreSQL had progressed this far, I thought it was still using strings to encapsulate/represent JSON data contained in cells.
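On the `'...'::JSON` point: in SQL a JSON value is written as a text literal cast to the json type, so the quotes are only literal syntax; once parsed, the values inside keep their JSON types. This sketch shows the idea with SQLite standing in for PostgreSQL so it runs with only Python's standard library: `json_type` reports the type found at each path, and `json_extract` hands an integer back as a Python `int`. In PostgreSQL the comparable function is `jsonb_typeof`. The sample document is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
doc = '{"id": 10, "name": "Necmettin", "active": true, "score": 9.5, "tags": ["a"]}'

# json_type reports the JSON type at each path (booleans show up as 'true'/'false').
for path in ("$.id", "$.name", "$.active", "$.score", "$.tags"):
    (kind,) = conn.execute("SELECT json_type(?, ?)", (doc, path)).fetchone()
    print(path, kind)
# $.id integer
# $.name text
# $.active true
# $.score real
# $.tags array

# Extracted numbers come back as numbers, not strings.
(value,) = conn.execute("SELECT json_extract(?, '$.id')", (doc,)).fetchone()
print(type(value).__name__, value)  # int 10
```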
Good News!
Your dream database is already here and it's taking off as we speak. It's called FaunaDB. Join FaunaDB's Slack and see what it's all about. You'll thank me later.
I will probably have to spend some more time in the documentation, but I was unable to find an example of what I've been describing. How would you write the pseudo query I provided (a very quick example with multiple assumptions is more than enough)?
If you are corporate, SQL is indeed a more solid solution, but not sure about dev experience.
PostgreSQL and MySQL both support JSON fields with operators.
I know, but I'm not sure about the drivers' output, for example Node.js's pg.
Also, and probably not used much, there is MongoDB's capacity for cleaning data in JSON. It is harder in SQLite's JSON1.
From what I have seen, Postgres has a different JSON querying syntax, but I haven't tried it yet.
True, but PostgreSQL 12 added the standard SQL/JSON Path language.
So that can be portable as well :-)
My favorite place to check the status of things is Modern SQL
I have seen this feeling a lot when talking about NoSQL vs Relational, that SQL is hard.
I don't quite understand why.
Of course, a bunch of denormalized JSON data will always be easier, but I don't think basic SQL can be that hard for beginners.
When you start doing nested subqueries, JOINs across many tables or even aggregations, I can understand it. Even someone with experience can sometimes get confused, but for most applications you won't need that complexity.
Even the language itself is more or less natural: "Select these fields from this table".
It's like when you start doing aggregates and other more advanced stuff in Mongo: the complexity increases.
Example please?
severalnines.com/database-blog/sec...
BTW, there is also GraphQL injection, both for SQL and NoSQL e.g. petecorey.com/blog/2017/06/12/grap...
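On the NoSQL side, one well-known injection pattern is operator injection: if a request body is parsed as JSON and passed straight into a MongoDB filter, a client can send `{"$ne": null}` instead of a password string and turn an equality check into a match-anything condition. This minimal Python sketch shows one common mitigation, rejecting keys that start with `$` (or contain `.`) before user input reaches the driver. The function name is hypothetical, and this is not a complete defence; checking that each value has the expected type is still advisable.

```python
from typing import Any


def reject_operators(value: Any, path: str = "") -> None:
    """Raise ValueError if any dict key in user input looks like a MongoDB operator."""
    if isinstance(value, dict):
        for key, inner in value.items():
            if not isinstance(key, str) or key.startswith("$") or "." in key:
                raise ValueError(f"disallowed key at {path or '<root>'}: {key!r}")
            reject_operators(inner, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for i, item in enumerate(value):
            reject_operators(item, f"{path}[{i}]")


# A login body as an attacker might send it, after JSON parsing.
malicious = {"username": "admin", "password": {"$ne": None}}
try:
    reject_operators(malicious)
except ValueError as exc:
    print(exc)  # disallowed key at password: '$ne'

reject_operators({"username": "admin", "password": "s3cret"})  # passes silently
```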
Quite literally my very short experience with NoSQL. It was awesome at first, honestly, but quickly turned out to be a huge mess for the 2 (two) projects I got involved.
I'm sure I could have been involved in better NoSQL situations but so far I just didn't have that luck.
I wish I had time to write up a complete post for this, but I just want to say that I worked somewhere where we used MongoDB as the only database, for all our microservices. I don't recall any major issues on account of that decision... at least not related to MongoDB itself, but we did have to revisit our code for handling database results a few times to improve performance there.
Mongo was easier to work with in Node.js than PostgreSQL was (at the time anyway, maybe there are better libs for it now). Also, since it was microservices, there weren't many places where we needed to do database joins anyway. Mongoose made it pretty easy and fun to work with. As far as performance, on launch day we had over 100k visitors with no issues. We regularly had large traffic spikes and handled them well.
Yeah, Mongo really latched on to Node and ensured really good tooling. I think that was the biggest factor in its success.
Great! Isolated small services where your data layer is mediated by the API are definitely a good use case for document DBs.
Splunk logs - logs and events are inherently non relational and perform much faster.
This does not generally apply to software, and in general, I would agree that non relational is almost never a good long term option for an app with any reasonable domain it is trying to model.
A big part of logs is relational. A log event has a timestamp, host, application, log level, a message, and commonly a "category". The other data is less well structured across the various log event producers; let's call this "meta".
This meta data you could store in a less explicitly structured relation. It's usually a simple key->value structure anyway. For that you could use PostgreSQL's hstore.
But... I do not know if I would use PostgreSQL to store log events. Although PostgreSQL can be partitioned and sharded these days, setting up a distributed PostgreSQL farm is way more complicated. Log events are a good candidate for eventual consistency; even never being consistent is acceptable. It is mostly appending entries, pruning old records, and occasionally running a query. ACID is not a strong requirement either.
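The split described above (a fixed set of column-like fields plus a loose key-value "meta" part, mostly appended to and periodically pruned) can be sketched in plain Python. The field names follow the list in the comment; the in-memory `LogStore` with `append`, `prune_older_than` and `query` is a hypothetical stand-in for whatever storage is actually used (a PostgreSQL table with an `hstore` or `jsonb` column, a document store, or a time series database).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class LogEvent:
    # Regular, column-like fields that (almost) every producer emits.
    timestamp: datetime
    host: str
    application: str
    level: str
    message: str
    category: Optional[str] = None
    # Irregular part: free-form key -> value pairs, similar to an hstore column.
    meta: dict[str, str] = field(default_factory=dict)


class LogStore:
    """Hypothetical append-mostly store with time-based retention."""

    def __init__(self) -> None:
        self.events: list[LogEvent] = []

    def append(self, event: LogEvent) -> None:
        self.events.append(event)

    def prune_older_than(self, max_age: timedelta, now: datetime) -> int:
        """Drop events older than now - max_age and return how many were removed."""
        cutoff = now - max_age
        kept = [e for e in self.events if e.timestamp >= cutoff]
        removed = len(self.events) - len(kept)
        self.events = kept
        return removed

    def query(self, **meta_filter: str) -> list[LogEvent]:
        """Occasional lookup: events whose meta matches every given key/value."""
        return [e for e in self.events
                if all(e.meta.get(k) == v for k, v in meta_filter.items())]


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
store = LogStore()
store.append(LogEvent(now - timedelta(days=3), "web-1", "api", "INFO", "started"))
store.append(LogEvent(now - timedelta(hours=1), "web-1", "api", "ERROR", "timeout",
                      meta={"request_id": "r-42"}))
print(store.prune_older_than(timedelta(days=1), now))  # 1
print([e.message for e in store.query(request_id="r-42")])  # ['timeout']
```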
Nah, none of those fields are relational imo. Relational fields are like UserId, EventId, ParentId, etc. Pointers to other things.
If you want to build a relational logging platform, be my guest, but I suspect there's a good reason why most of the big players in logging use NoSQL or InfluxDB.
I would argue that a time series DB like InfluxDB is a better storage for timestamped logs and events than a document DB is.