
PostgreSQL vs MongoDB

Ben Halpern

A continuation of this series. The discussions have been great so far!

I think these two represent two factions in the database world that make for a good compare and contrast. But feel free to broaden the discussion to other databases.

Discussion

 

I've worked with both MongoDB and PostgreSQL quite extensively (though it's been a while since I've used MongoDB seriously). There are almost no cases where you should use a non-relational datastore over a relational one. You will regret it later if you choose the former. It's an underappreciated fact that one of the most important parts of designing a system is fleshing out your data model. That doesn't mean knowing every detail, but thinking through your data enough to define a well-structured model. Software 'joins' are also significantly worse than a SQL JOIN, in both performance and brittleness. I'd really like to hear if anyone can legitimately think up a case where a non-relational datastore is better in some way than a relational one, because I've thought about it on multiple occasions and have never come up with anything.
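The gap between application-side joins and a database JOIN is easy to demonstrate. A minimal sketch, using Python's built-in sqlite3 and hypothetical `users`/`orders` tables standing in for any relational store: the N+1 loop issues one query per parent row and stitches the results together in app code, while the JOIN does the same work in a single statement the query planner can optimise.

```python
import sqlite3

# Hypothetical schema: users and their orders, joined in the database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id), total REAL);
    INSERT INTO users  VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 10.0), (12, 2, 99.0);
""")

# The "software join": one query per user (N+1), stitched together in app code.
users = con.execute("SELECT id, name FROM users").fetchall()
app_joined = []
for uid, name in users:
    for (total,) in con.execute(
            "SELECT total FROM orders WHERE user_id = ? ORDER BY id", (uid,)):
        app_joined.append((name, total))

# The database JOIN: one query, one round trip.
db_joined = con.execute("""
    SELECT u.name, o.total
    FROM users u JOIN orders o ON o.user_id = u.id
    ORDER BY o.id
""").fetchall()

print(db_joined)  # [('Ada', 25.0), ('Ada', 10.0), ('Grace', 99.0)]
```

Both produce the same rows, but the N+1 version costs one round trip per parent and reimplements in brittle app code what the database already does well.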

 

100% this.

So many devs (including myself) got burned when MongoDB came out because it was touted as a silver-bullet data store, when in reality it was created for an entirely different problem than what most projects needed. By the time devs realized MongoDB was causing pain like major performance bottlenecks and difficulties accessing the data they needed, they were already in too deep to fix the issue.

Real-world data is relational, pretty much always. While it's pretty common for single records to have hierarchical data (which is what makes non-relational DBs sound appealing), it's very uncommon for entire datasets to be limited to a bunch of singular records of purely hierarchical data. You will end up making multiple collections and trying to reference them just like relational models, except that you'll have zero help from the database in querying them efficiently.

I'm a strong advocate that you should never start your project with a non-relational data store. It's much easier to add one later on (which will probably never happen) than it is to move from non-relational to relational.

 

Not to forget that PostgreSQL supports JSON and XML as datatypes. So where suitable you can just store JSON or XML in a table.
You can even create indexes directly on the JSON data (or use expression indexes, as you can with XML data).
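As a self-contained sketch of the idea, here is the same pattern using Python's bundled sqlite3 and its JSON functions (assumed available in the build), since the PostgreSQL version needs a running server; in Postgres you would use a `jsonb` column with a GIN or expression index instead. Table and field names are invented.

```python
import sqlite3, json

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")  # JSON stored as text
docs = [{"type": "click", "x": 10}, {"type": "signup", "plan": "pro"}]
con.executemany("INSERT INTO events (payload) VALUES (?)",
                [(json.dumps(d),) for d in docs])

# Expression index on a path inside the JSON document
# (Postgres analogue: CREATE INDEX ... ON events ((payload->>'type')) or a GIN index on jsonb).
con.execute("CREATE INDEX idx_events_type ON events (json_extract(payload, '$.type'))")

rows = con.execute(
    "SELECT payload FROM events WHERE json_extract(payload, '$.type') = ?",
    ("signup",)).fetchall()
print(rows[0][0])
```

The point is that the documents stay schemaless while lookups on a known path can still use an index.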

 

Splunk logs: logs and events are inherently non-relational and perform much faster in a non-relational store.

This does not generally apply to software, though, and in general I would agree that non-relational is almost never a good long-term option for an app with any reasonable domain it is trying to model.

 

Logs are indeed a case where I think this applies. MongoDB is absolutely the wrong tool for that, though; there are much better data stores out there that properly handle timeseries data.

 

A big part of a log event is relational: it has a timestamp, host, application, log level, a message, and commonly a "category". The other data is less well structured between the various log event producers; let's call this "meta".
This meta data you could store in a less explicitly structured relation. It's usually a simple key->value structure anyway, for which you could use PostgreSQL's hstore.

But... I do not know if I would use PostgreSQL to store log events. Although PostgreSQL has native sharding these days, setting up a distributed PostgreSQL farm is far more complicated. Log events are a good candidate for eventual consistency; even never-consistent is acceptable. It is mostly appending entries, pruning old records, and occasionally running a query. ACID is not a strong requirement either.
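The hybrid layout described above (fixed relational columns plus a free-form "meta" bag) might look like this. A sketch using Python's sqlite3, with meta stored as JSON rather than hstore, and invented column and field names:

```python
import sqlite3, json

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE log_events (
        ts      TEXT,   -- timestamp
        host    TEXT,
        app     TEXT,
        level   TEXT,
        message TEXT,
        meta    TEXT    -- producer-specific key/value pairs, stored as JSON
    )
""")
con.execute(
    "INSERT INTO log_events VALUES (?, ?, ?, ?, ?, ?)",
    ("2021-01-01T12:00:00Z", "web-1", "api", "ERROR", "timeout",
     json.dumps({"request_id": "abc123", "upstream": "billing"})))

# The structured columns filter cheaply; the ad-hoc meta is still queryable.
row = con.execute(
    "SELECT host, json_extract(meta, '$.request_id') "
    "FROM log_events WHERE level = 'ERROR'").fetchone()
print(row)  # ('web-1', 'abc123')
```

The well-known fields get real columns (and indexes), while each producer's oddball extras go into the meta column without schema changes.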

Nah, none of those fields are relational imo. Relational fields are like UserId, EventId, ParentId, etc. Pointers to other things.

If you want to build a relational logging platform, be my guest, but I suspect there's a good reason why most of the big players in logging use NoSQL or InfluxDB.

 

I would argue that a time series DB like InfluxDB is a better storage for timestamped logs and events than a document DB is.

 

You will regret it later if you choose the former.

Quite literally my very short experience with NoSQL. It was awesome at first, honestly, but quickly turned out to be a huge mess in the two projects I got involved in.

I'm sure I could have been involved in better NoSQL situations but so far I just didn't have that luck.

 

I agree that data structure is important. However, I disagree with "almost no case". My data is structured as relationships, but as actual relationships in Neo4j, where searching and doing the related work is much faster. SQL databases are called relational but are not so much that. Most bottlenecks I faced were due to many-to-many relations combined with other joins, because "relational" DBs didn't offer arrays or embedded objects and so on. Now, that might be just 10% of a project's structure, but it's an unavoidable 10% that is used all the time. The rest of the structure made far more sense in a relational database, but at the cost of showing loading icons in many places once a user reaches the most useful features of the project, like "I click here and it updates 15 different data types".

 

I would consider graph databases their own thing, separate from traditional document (non-relational) and relational stores. But only very specific types of data fit properly into a graph database (GIS data, for example). If you're choosing between a document store and a relational store, there's a clear winner, which is the point I'm trying to make.

Also, you are incorrect in saying that relational DBs don't support embedded objects. PostgreSQL does, whether as key/value pairs, JSON (natively supported), or BLOBs. At least in the case of JSON, you can index on the objects too.

You didn't read properly. They didn't support it then; they do now, and it makes no sense to store data that might vary in structure in one just because they added some support for it. Why use C for a web API? Or even Rust? I don't care if I can; it's much easier with others. The same goes for DBs. Each DB has a purpose, and forcing one type doesn't cut it for me. There's no clear winner, just improper usage. You want to store site clicks and search request bodies? Why the hell would I bother with structure? I just need to push data so someone can plug a tool into it and do their casual analytics, which they probably delete on a monthly basis or so.

 

Hi Haris, PostgreSQL has had arrays like, forever. Some 20 years ago I built a genealogical (tree-like) index based on arrays and the int_array contrib module (which has also been there since the 2000s), and that's besides ltree already being in the core distro.
In my first emails to the pgsql-sql list, I was almost advised to stay away from arrays since they were designed for internal relations and structures, but I didn't listen to them... still happily running the app based on GIN indexes.

That's the difference between you and me. I only use what's actually considered standard and what's considered OK or good to use. I honestly don't think people get how easy it is to set up MongoDB with all of that working and no customisation required. When I picked MongoDB for geo stuff: first, the most recent benchmarks I found (about a year old; the others were several years old) showed faster MongoDB geo search on non-indexed data compared to PostgreSQL. Second, no extra config was required to use the geo tools, while PostgreSQL demanded enabling some plugin or whatever. I couldn't care less whether it's doable out of the box elsewhere, because I automated Mongo in 5 minutes and it gets the job done. It works with 20k+ polygons quite fast. Storing sensor values which could also contain objects or arrays is straightforward. Why would I bother with a schema? I get no benefit from it, nor do I care whether this DB also supports this or that. MS SQL and Oracle, if I'm not mistaken, have graph-like support. Probably PostgreSQL does too, or will. Why would I use it instead of Neo4j? Because some enterprise dudes told me so? I store measurements in there which have different fields/properties, and I can then aggregate them regardless of the differing structure. Why would I use Postgres' JSON? Mongo is JSON-like by default; why do others feel the need to enforce relational stuff?

I'm not fully against using some stuff outside the standard, but SQL is great for some things and I would like to use others that are great for those other things. Basically, MongoDB has been serving me quite well, and it's widely hated just like Java. And frankly, people still use it.

OK, arrays might not have been advertised in some emails I got in 2001, but they were standard, documented, and fully supported by that same year, so they were pretty legitimate to use. Maybe the guys who answered were relational purists, I cannot know. Which year are you referring to in your comment about "relational" DBs not having arrays?
What's actually considered "standard" for most people is running Oracle or MS SQL Server. You want a second definition of "standard"? It is standard to spend SUBSTANTIAL time in advance on design and planning; every time my programmers talk to me about "5 minutes" I get terrified about what happens in a year or so. Don't get me wrong, bad code and bad practices are also entirely possible with (postgre|my|ms|DB2)sql and so forth; heck, even Hibernate terrifies me when I see those awful queries in the pgbadger reports. But at least with a good design (and LOTS of DB constraints/triggers/automations/integrity checks) you are doing your best for the durability, reliability, and credibility of your system, and the happiness of your bosses. I am familiar with neither MongoDB geospatial nor PostGIS, but I'd say that nothing in life is free: if Mongo makes things appear so appealing compared to a system that has been so heavily looked after and developed, I'd place my bets that there will be surprises somewhere down the road.
PS
Does Mongo give you some way, e.g., to disallow overlapping polygons? Like, it is forbidden that two different non-affiliated owners have pieces of land that overlap one another?

I don't know, nor did I require such a feature. I only look up which polygon the given coordinates fall in. But a good thing to investigate; thanks for the question.

5 minutes for dev purposes in the cloud. Developers were not born yesterday, and neither was I. No need to point out the obvious, that things can go wrong and nothing is free. It's as simple as: it works now, and there's no proof of it failing later. I know some people would love it to fail to prove their point, but I'm not like that; I'm not a toxic person from Stack Overflow. I actually use stuff if it works and don't care about the negativity, because at the end of the day everything breaks and people make things work with a lot of weird hacks, so the only way to prove something is bad is when no one has made it work.

 

Long-time user of SQL in various varieties, and of PostgreSQL specifically. There was a time when the idea of using anything but a SQL database was heretical. And to be frank, some of the comments below indicate that using non-SQL data storage is still considered heterodox. The orthodox thinking is on clear display when Noah says "one of the most important parts of designing a system is fleshing out your data model".

In this statement lies the core weakness of orthodox software development, namely that the foundation stone of our software is the model. The model is usually established by the SQL and object model, with a mapping using Hibernate or whatever other object-relational mapper. When you bind the data structure to the binary artifacts, you cripple flexibility. You will build business rules into the domain objects. Changing the data structure becomes almost impossible without managing both binary deployment and schema changes. I spent decades dealing with this kind of problem, from even before SQL; I was a dBase programmer back in the early nineties.

PostgreSQL is my favorite SQL database, but it may not suit everyone's needs. A telco I worked for used MySQL. There are workloads where the data structures are mature and unchanging, where flexibility and adaptability aren't critically important, and where tight coupling between code and schema is acceptable.

In 2013 I began working on an automation engine which used a flexible data storage mechanism. It actually used PostgreSQL under the covers, but it allowed huge flexibility that decoupled the code from the schema. Later this technology adopted MongoDB under the covers. However, it became clear to me that using MongoDB directly would give me all the same features with far better performance.

If you simply treat MongoDB as any other relational database and build applications tightly coupled to a schema, it is like complaining that your trail bike doesn't go as fast as your Ferrari or carry as much as your pickup truck. The whole point of MongoDB is the flexibility. If you are simply going to build applications like you did with SQL and expect it to be feature-for-feature identical, you are missing the benefits.

The door MongoDB opens is the ability to write applications that are not tightly coupled to the binary. You can store arbitrary data easily and allow users to define what data they want to store at run time rather than at design time. It delivers a kind of flexibility and adaptability that allows us to do things which are essentially impossible with SQL databases. I've seen an application similar to the one I'm working on, and its pipeline of SQL schema scripts and messing about is scary. Dynamic changes to SQL can be time consuming and potentially dangerous; imagine trying to change a schema on the fly while users are online. That kind of thing is just a non-issue with MongoDB.
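The run-time-defined-data idea can be sketched in a few lines: documents whose fields were never declared in any schema, filtered on a field name chosen at run time. This is a minimal illustration using Python's sqlite3 with a JSON body column standing in for a document store; all names and sample data are invented.

```python
import sqlite3, json

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")

# Two "record types" whose fields exist nowhere in the schema.
for doc in ({"kind": "sensor", "unit": "C", "value": 21.5},
            {"kind": "note", "text": "runtime-defined fields"}):
    con.execute("INSERT INTO docs (body) VALUES (?)", (json.dumps(doc),))

def find(field, value):
    """Filter on a field the application only learns about at run time."""
    return [json.loads(b) for (b,) in con.execute(
        "SELECT body FROM docs WHERE json_extract(body, ?) = ?",
        (f"$.{field}", value))]

print(find("kind", "sensor"))  # [{'kind': 'sensor', 'unit': 'C', 'value': 21.5}]
```

No ALTER TABLE, no redeploy: adding a new record type or field is just inserting a different document, which is the decoupling being described.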

If you are going to embrace MongoDB you should also embrace its strengths. There is no point using it as a drop-in replacement for SQL, because you can't beat a SQL database at being a SQL database.

But this is all backwards. I would rather say that you should challenge the orthodoxy of tight coupling between schema and binary. By smashing that dependency and embracing an approach which is more universal and flexible you are free to write applications which are more general purpose. If you accept this philosophy you will find MongoDB is a useful tool to help you on the path.

It isn't the only option by any means, but it is a pretty decent start. I made a video about all of this in 2017.

youtube.com/watch?v=uwZj4XF6zic

 

I can't help but think of my experiences with a mature GraphQL implementation. Schema first, and the binary (de)composed of many services.

There is data, the way you gather it, and the way you put it forth for consumption. I suspect they will all remain work to be done.

 

One of the requirements I was given was to introduce GraphQL endpoints to our system. The Java GraphQL implementation assumes a fixed schema burnt into the binary: the same orthodox philosophy. It took me a while to be able to generate the schema dynamically in order to expose GraphQL endpoints that would change as users modified the data structures.

If it were an orthodox app this would be quite easy: you just write the classes required to get data from the model, no worries. But if you are able to add or modify the schema at runtime, what do you do? It was a little fiddly, in that I had to have a trigger mechanism for when the data design changes, in order to programmatically rebuild the GraphQL schema. Obviously this is far from ideal; dynamic modification of a schema would be better than rebuilding.

I also built a bunch of functions to do various queries which are not really part of GraphQL syntax but can be supported. However, to be blunt, the GraphQL library I used forced me into doing hundreds of queries to fulfill requests. A better way would be to convert the GraphQL into a single aggregation which can be run and returned.

GraphQL is another example of a technology which while helpful can drive developers into domain binding.

The problem isn't that building applications bound to a data structure is always wrong, rather that it has become so orthodox that developers don't even question it. I was one of them; I thought that using anything but SQL was insanity.

I can't disagree with any of that! I just got your point :)

 

Oh my god! This this SO MUCH THIS! This comment mirrors exactly WORD FOR WORD what I think about NoSQL vs SQL. Really, when flexibility is a requirement, you can't use SQL. That's just how it goes.

 

Now this one is a bit different from the previous discussions, in my opinion, because one is a relational and the other a non-relational database, so it's more dependent on the actual needs of your project.

This may be an unpopular opinion, but I think that MongoDB (and non-relational databases generally) is quite hyped up. So far, whenever I considered using a non-relational database, I realized that a relational database would actually be more appropriate (which is understandable, considering most of our job as web developers is dealing with data that has a clearly defined structure).

I haven't used PostgreSQL that much (I'm more of a MySQL guy), but we can look at it as the Oracle database of the open source world. It has many amazing features and is much more advanced than MySQL (for example, PostgreSQL can index JSON directly, while MySQL can only do so indirectly through generated columns).

On the other hand, if you're dealing with a large amount of unstructured data, MongoDB is great. I had to do this in one project where we had various products that could have different types of attributes, and it's much easier to deal with than implementing EAV in a relational database (I'm looking at you, Magento), which quickly becomes a mess and breaks data integrity, removing the point of using a relational database in the first place.

 

Great point. Most people who dislike NoSQL do so because they went all in on replacing "relational" DBs. NoSQL should never be treated as a replacement or alternative but as complementary; you don't need just one type of DB. That should be a resource limitation, not a technological one: not having enough people to maintain more than one DB system, or it being too expensive to do so. I have a project depending on 3 different DBs for now, and it might get bigger. The graph DB, or more specifically Neo4j, in that system cannot be replaced by SQL in terms of ease of use and the actual logic of the data structure. However, users, tickets, clients... are stored in PostgreSQL, while geo data is in Mongo. There is no single reason to move everything to "relational", not even performance, as it's quite fast this way. The geo service and positioning are independent of the graph data: if one goes down, the other is usable, and if both are up, traffic to one doesn't pressure the other. They all depend on users, which are accessed once per auth, so the SQL database is not that busy. I know PostgreSQL has support for JSON and geo stuff, but it took me 10 minutes to set up Mongo and it works; why would I bother with the others?

Maybe now people will realise it's not the DB type's fault; it's you who picked it for the wrong purposes.

 

You have PostgreSQL and don't use PostGIS to store geo data, isn't that a crime? ^^
Anyway, you are right. I work with both; there are things I would never do with Mongo (like complex geo-calculations with hundreds of gigabytes of geo-data; I understand that if your only need is selecting points in an area, you don't really need PostGIS), and others I would never do with Postgres (typically storing large amounts of ad-hoc data with ad-hoc structure for specific client displays (lots of geo-data for thousands of users, associated for example with unpredictable data that will be added later), or specific usages in general).
Also, you can always use a DB for things it's not really meant for. You can make a table unlogged in Postgres to improve performance (but you lose the D of ACID), and there are recent improvements in Mongo that add transactions (to add the A). But in the end, you should never try to find the unique or perfect DB; each is meant for a purpose, and without falling into the opposite extreme (like you said, too many DBs to maintain) you shouldn't try to fit a round peg in a square hole.
In the end, to me, it's not "MongoDB VS PostgreSQL" but "MongoDB WITH PostgreSQL", because I think they are the perfect combination for most complete projects today.

 

If just some attributes of your data are unstructured, in PostgreSQL you can put those into a jsonb column. It can be indexed and relatively easily queried. I use that, for example, when one of my models receives a webhook with some more or less random stuff inside.

 

IMHO, Mongo was overhyped when it came out - but that doesn't mean it's without merit. The problem (as with soooo many tech solutions) is that the fanboys started touting it as a complete replacement for RDBMS's. And the "old guard" wrinkled their noses cuz they still wanted to see RDBMS's as always superior to flat file structures.

Anytime I hear someone touting either solution as The One True Answer, my only reply is:

What are the strengths (and weaknesses) of an RDBMS? What are the strengths (and weaknesses) of the entire family of NoSQL products? And, most importantly, how do the needs of your specific app map to these strengths and weaknesses?

If you can't outline at least some of the strengths/weaknesses of either approach, then I really don't want to hear any of your recommendations for the DB solution that should be used in this app.

If some guy wants to build my house, and the only tool in his tool belt is a hammer, I will NOT be employing him. I don't care how long he's used the hammer. I don't care if he can tell me 100 reasons why his hammer is The Best Tool EVER! If he can't explain to me when it might be appropriate to use a screwdriver, I don't want him anywhere near my foundation.

 

Hi Adam! That's an understandable approach. The only fallacy is that you're also implying that any dev can argue about any other tool from self experience (and not just by having read articles or discussed it with other people). Also, I'm quite sure I can find a carpenter who doesn't know all the tools available out there but can still build a house that lasts a lifetime.

In our case: why not a graph DB then? Or a time series DB? Or an object DB? Or a flat file?

Any dev that has read a bit of the main differences of either approach could give you a reasonable answer, but is that sufficient for your evaluation?

In one case logic is enough ("I need to relate exams with students, therefore I'm using a DB that has relations"); in the other case experience will never be enough, because only a small subset of developers have enough experience with all types of DBs to be able to evaluate each of them in each case a priori.

Does this make sense :D ?

 

The only fallacy is that you're also implying that any dev can argue about any other tool from self experience (and not just by having read articles or discussed it with other people).

That... wasn't what I was implying at all. Nor was it stated as such in my original reply.

Nevertheless, I don't understand how what you've described is somehow a "fallacy". Why shouldn't "any dev" be able to "argue about any other tool from self experience"??? Arguing in favor of a tool/tech/approach/etc !== forcing the implementation of that approach. When the team is in a research/decision phase, there's nothing at all wrong with someone proposing a solution based upon their own experience. It doesn't mean their proposal will be adopted - but neither does it mean that there's anything wrong with making the proposal.

Also, I'm quite sure I can find a carpenter who doesn't know all the tools available out there but can still build a life lasting house.

You just completely twisted my words. Please look at my original reply and tell me where I said that the carpenter must "know all the tools available out there"?? I'm talking about the fact that farrrrr too many people in tech glom onto a single approach at the exclusion of all others.

You don't have to know "all the tools". None of us do. But if the only tool that you continually, blindly, stubbornly force into every project is a hammer, then you're no carpenter. You're a hammer salesman.

In our case: why not a graph DB then? Or a time series DB? Or an object DB? Or a flat file?

Indeed. Why not?? If you're throwing these labels out there as fringe examples that simply don't warrant consideration, then you are highlighting the problem that I was originally alluding to.

Any dev that has read a bit of the main differences of either approach could give you a reasonable answer

No. Most of them cannot (or, more accurately, will not). Your statement sounds very reasonable - if you've never actually had to deal with developers before. There are far too many devs who learned a given tech/tool/approach/whatever - probably many years ago - and now they refuse to properly assess any alternatives. Maybe they can give you a cursory, 30-second explanation of Tech A vs Tech B, but even in that brief synopsis, they make it clear that they love Tech A and they've never seriously considered Tech B.

In one case logic is enough ("I need to relate exams with students, therefore I'm using a DB that has relations"), in the other case experience will be never enough because only a small subset of developers have enough experience on all types of DBs to be able to evaluate each of them in each case a priori. Does this make sense :D ?

No. This shouldn't be a case of "logic vs. experience". We should never throw out logic due to past experiences. Nor should we throw out experience due to some academic maxim of logic.

Sorry, it definitely wasn't my intention to put words and meaning in your mouth; it's just what I understood from reading your comment. That said:

When the team is in a research/decision phase, there's nothing at all wrong with someone proposing a solution based upon their own experience. It doesn't mean their proposal will be adopted - but neither does it mean that there's anything wrong with making the proposal.

That's not what I meant either. I'm trying to say that most people are not knowledgeable enough, and also don't have enough time, to test all potential options for every single decision and make a perfectly informed choice. That's why we rely on collective experience. I can't test all databases every time I need to choose one, but I can have a general idea of which type I might need based on personal or collective experience.

That was my point.

But if the only tool that you continually, blindly, stubbornly force into every project is a hammer, then you're no carpenter. You're a hammer salesman.

Sure! But choosing a DB is not the same thing, right? I often think engineering or carpentry analogies fall short when compared to software (the prefix "soft" in the word is very apt).

Let's see it like this: building a house is a collective effort which requires all sort of tools in all cases, one could prefer a type of hammer or another, but they still require one. They also require construction materials, welding, scaffolding and other stuff. A carpenter going around the construction site saying "please weld with my hammer" wouldn't make sense anyway and they probably wouldn't have a job :D

Choosing a database only implies knowing which data model better fits the application one wants to build, plus a bit of forward-looking. Teams can still choose wrongly, and it can cost the company a lot, but changing a database is not the same as redoing a house because nobody used bricks in the first place.

That's why I think civil engineering analogies often fall short when directly compared to software development. The same goes for the evergreen tendency to compare software architecture to building a bridge -_-

Indeed. Why not?? If you're throwing these labels out there as fringe examples that simply don't warrant consideration, then you are highlighting the problem that I was originally alluding to.

Because of time constraints. The same argument would apply to programming languages, libraries, and everything else. You'd be in a constant comparison cycle forever and never get anything done.

No. Most of them cannot (or, more accurately, will not). Your statement sounds very reasonable - if you've never actually had to deal with developers before. There are far too many devs who learned a given tech/tool/approach/whatever - probably many years ago - and now they refuse to properly assess any alternatives.

Sure, and I agree with you here, but that's a character trait; they can still be right sometimes? At least once ;-) (after all, the hammer is still the right tool in some cases).

No. This shouldn't be a case of "logic vs. experience". We should never throw out logic due to past experiences. Nor should we throw out experience due to some academic maxim of logic.

No; as I implied, maybe incorrectly, you need both at the same time. That's why the camp "always use an RDBMS forever and ever" is wrong, but "spend 3 months evaluating all possible options for every tool" is also not always practical due to time or cost constraints.

 

I wish I had time to write up a complete post on this, but I just want to say that I worked somewhere where we used MongoDB as the only database for all our microservices. I don't recall any major issues on account of that decision... at least not related to MongoDB itself, though we did have to revisit our code for handling database results a few times to improve performance there.

Mongo was easier to work with in Node.js than PostgreSQL was (at the time, anyway; maybe there are better libs for it now). Also, since it was microservices, there weren't many places where we needed database joins anyway. Mongoose made it pretty easy and fun to work with. As for performance, on launch day we had over 100k visitors with no issues. We regularly had large traffic spikes and handled them well.

 

Yeah, Mongo really latched on to Node and ensured really good tooling. I think that was the biggest factor in its success.

 

Great! Isolated small services where your data layer is mediated by the API is definitely a good use case for document DBs 🔥

 

Somehow, almost nobody remembers that when it came out, MongoDB's main premise was that MySQL wasn't scalable, and MongoDB was. It was a big fat lie. (It was just like Svelte's creator complaining about React all the time.)

Yes, there are cases where storing/consuming unstructured data is a plus. Except, like in all dualities, both approaches have pros and cons.

It's the same with structured vs unstructured, server-side rendered vs client-side rendered, interpreted vs compiled, distributed vs centralized, object-oriented vs functional, and this vs that vs the other. There is no silver bullet, whichever approach you choose.

All SQL and NoSQL databases have their strengths and weaknesses.

I have one wish though: Being able to fetch the main records/documents and related records or documents with a single query. That'll be the day the database industry changes. (Already done maybe? Since I am an old guy that can't be bothered to follow every new shiny thing anymore :)

 

I recently found out about the "PostgREST" project, which provides a RESTful API on top of a Postgres database.

It seems that they do actually pull off the "fetch resources with related resources using a single request", reminded me of your wish!

postgrest.org/en/v7.0.0/api.html#r...

 

I have one wish though: Being able to fetch the main records/documents and related records or documents with a single query. That'll be the day the database industry changes. (Already done maybe? Since I am an old guy that can't be bothered to follow every new shiny thing anymore :)

Isn't this a SQL JOIN you're describing? If you're talking about multi data store you can do that with foreign data wrappers and have something foreign to PostgreSQL appear local, or if you're talking multi system you can do it with an API proxy like Kong or with any GraphQL server.

So yeah, it's possible :-)

 

Yes, a SQL JOIN at heart; except I want it to return a single row for every parent record, with the children nested.

If by a "multi data store", you mean being able to merge data from multiple sources (something like one part from MySQL and some other part from PostgreSQL), that's not what I mean either.

As for a GraphQL server, if I'm not mistaken, a GraphQL server fetches data from the source and then formats it to the required specification. What I'm looking for is something like a GraphQL server embedded in the database / data store.

Let me try to write a pseudo query and a sample output for what I mean:
SELECT students[id, firstname, lastname, student_no] AS root, current_courses[course_name, course_code]
FROM students
MERGE students_courses ON (students_courses.student_id = students.id)
MERGE courses AS current_courses ON (students_courses.course_id = courses.id)
WHERE student_no = 1234

And the result
{id:1, firstname:Necmettin, lastname:Begiter, student_no:199601010725, courses:[{course_name:"Comparative Linguistics", course_code:"ECC101"}, {course_name:"Psychology of Learning", course_code:"ESL101"}, {course_name:"Something something", course_code:"ESS101"}]}

A few things to note:

  1. This is a single query merging multiple records from different tables.
  2. The result is a single record, with fields id, firstname, lastname, student_no, and courses, with the courses field being an array of objects.
  3. Students, courses, and students_courses are separate tables/indexes/schemas/whatever.
  4. MongoDB has something similar, called DBRefs, but the client application must perform a separate query to fetch the related records.
  5. We all know that nested data fetching is a very complex issue. Even Firestore notes these limitations (firebase.google.com/docs/firestore...) regarding nested data fetching.
  6. With our current database solutions (including Firestore, MongoDB, ElasticSearch), if you include the students' courses in the students' table, you have two possible paths:
  • You can store the courses' IDs as an array field in students data, in which case you have to perform a separate query to fetch those courses after you fetch the students you want.
  • You can store the entire current_courses info in students table, instead of only IDs; in which case, if you ever need to update a course, you have to perform a separate query to update current_courses data that are inside the students table/database/schema/whatever.

Long story short, yes, what I'm describing IS a JOIN operation at heart, but much more than that.

This is a single query merging multiple records from different tables.

I'm not sure we're talking about different things though. I think it's a matter of perspective. An API that queries multiple data sources and then returns them to you as the result of a single query is, to you the caller, a single query.

PostgreSQL and other DBs sometimes split queries into multiple parallel fetches; they physically make multiple reads at the same time. But does it matter to you? No, because you issue one SQL query to get all the data you need.

Moving on to your example, you can already do it:

PracticalDeveloper_development> select count(*) from comments where user_id = 10;
-[ RECORD 1 ]-------------------------
count | 3
SELECT 1
Time: 0.008s
PracticalDeveloper_development> select users.id, users.username, array_agg(row_to_json(row)) as comments from (select c.* from comments c where c.user_id = 10) row, users group by users.id having users.id = 10;

-[ RECORD 1 ]-------------------------
id       | 10
username | lockmanjerry
comments | ['{"id":17,"ancestry":null,"body_html":null,"body_markdown":"You probably haven\'t heard of them chartreuse direct trade. Humblebrag photo booth marfa. Gluten-free goth flannel 90\'s 8-bit vice.","commentable_id":24,"commentable_type":"Article","created_at":"2020-06-22T11:18:56.028987","deleted":false,"edited":false,"edited_at":null,"hidden_by_commentable_user":false,"id_code":"h","markdown_character_count":131,"positive_reactions_count":0,"processed_html":"<p>You probably haven\'t heard of them chartreuse direct trade. Humblebrag photo booth marfa. Gluten-free goth flannel 90\'s 8-bit vice.</p>\\n\\n","public_reactions_count":1,"reactions_count":0,"receive_notifications":true,"score":1,"spaminess_rating":0,"updated_at":"2020-06-22T11:34:28.673251","user_id":10}', '{"id":19,"ancestry":null,"body_html":null,"body_markdown":"Sriracha shoreditch pitchfork offal selfies. Knausgaard godard pabst fixie microdosing mixtape meggings.","commentable_id":21,"commentable_type":"Article","created_at":"2020-06-22T11:18:56.218676","deleted":false,"edited":false,"edited_at":null,"hidden_by_commentable_user":false,"id_code":"j","markdown_character_count":104,"positive_reactions_count":0,"processed_html":"<p>Sriracha shoreditch pitchfork offal selfies. Knausgaard godard pabst fixie microdosing mixtape meggings.</p>\\n\\n","public_reactions_count":1,"reactions_count":0,"receive_notifications":true,"score":1,"spaminess_rating":0,"updated_at":"2020-06-22T11:34:28.883918","user_id":10}', '{"id":15,"ancestry":null,"body_html":null,"body_markdown":"Humblebrag austin wayfarers wes anderson cardigan celiac organic pork belly. Irony butcher tacos pork belly normcore retro whatever you probably haven\'t heard of them. 
Small batch meh etsy pork belly williamsburg sartorial schlitz yolo.","commentable_id":3,"commentable_type":"Article","created_at":"2020-06-22T11:18:55.850519","deleted":false,"edited":false,"edited_at":null,"hidden_by_commentable_user":false,"id_code":"f","markdown_character_count":236,"positive_reactions_count":0,"processed_html":"<p>Humblebrag austin wayfarers wes anderson cardigan celiac organic pork belly. Irony butcher tacos pork belly normcore retro whatever you probably haven\'t heard of them. Small batch meh etsy pork belly williamsburg sartorial schlitz yolo.</p>\\n\\n","public_reactions_count":1,"reactions_count":0,"receive_notifications":true,"score":3,"spaminess_rating":0,"updated_at":"2020-06-22T11:34:28.444569","user_id":10}']
SELECT 1
Time: 0.008s

User 10 has 3 comments, I selected them all with one query and aggregated them in an array of JSON rows.

I'm sure the query can be simplified with a CTE or other clever tricks but it's a decent start.
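For what it's worth, here is a tidier shape of the same idea using JSON aggregation, sketched with SQLite's built-in JSON1 functions so it runs self-contained. In PostgreSQL the equivalents would be `json_agg` and `json_build_object`; the table and column names are made up for the example:

```python
import sqlite3
import json

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, student_no TEXT);
CREATE TABLE courses (id INTEGER PRIMARY KEY, course_name TEXT, course_code TEXT);
CREATE TABLE students_courses (student_id INTEGER, course_id INTEGER);
INSERT INTO students VALUES (1, 'Necmettin', 'Begiter', '1234');
INSERT INTO courses VALUES (1, 'Comparative Linguistics', 'ECC101'),
                           (2, 'Psychology of Learning', 'ESL101');
INSERT INTO students_courses VALUES (1, 1), (1, 2);
""")

# One query: the parent columns plus the children aggregated into a JSON array.
row = conn.execute("""
SELECT s.id, s.firstname, s.lastname, s.student_no,
       json_group_array(json_object('course_name', c.course_name,
                                    'course_code', c.course_code)) AS courses
FROM students s
JOIN students_courses sc ON sc.student_id = s.id
JOIN courses c ON c.id = sc.course_id
WHERE s.student_no = '1234'
GROUP BY s.id
""").fetchone()

result = {"id": row[0], "firstname": row[1], "lastname": row[2],
          "student_no": row[3], "courses": json.loads(row[4])}
print(result)
```

The database does the nesting, so the client gets one record per parent with the children already embedded as an array.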

I can understand why it's a little bit more complicated to do with noSQL DBs. They usually have more complicated ways of putting data in relation to other data.

Yes, multiple queries resulting in a single resultset is, in the end, a single call for the client, but on the server side, it means multiple calls. You are right of course, but I'm a little unforgiving I guess :)

Yeah, PostgreSQL almost got it, except JSON in Postgres are strings. Almost there :)

What can I say, I'm a grumpy old programmer, it's not easy to satisfy me. ;)

Cheers.

Yeah, PostgreSQL almost got it, except JSON in Postgres are strings. Almost there :)

Not really though: JSON in PostgreSQL is just JSON:

PracticalDeveloper_development> select id, username, email_public from users limit 1;
+------+-----------------+----------------+
| id   | username        | email_public   |
|------+-----------------+----------------|
| 6    | rosenbaumladawn | False          |
+------+-----------------+----------------+
SELECT 1
Time: 0.016s
PracticalDeveloper_development> select row_to_json(t) from (select id, username, email_public from users limit 1) t;
+------------------------------------------------------------+
| row_to_json                                                |
|------------------------------------------------------------|
| {"id":6,"username":"rosenbaumladawn","email_public":false} |
+------------------------------------------------------------+
SELECT 1
Time: 0.015s

Integers are integers, strings are strings and booleans are booleans

When PostgreSQL came out with its JSON features, I remember reading examples and always seeing 'some-json-formatted-data'::JSON, so I assumed JSON data is given to / received from PostgreSQL as a string. Even the comments field in your example a few comments back (the one with the 3 comments of user 10) has the JSON data in strings.

If that isn't the case, my bad, I didn't know PostgreSQL had progressed this far, I thought it was still using strings to encapsulate/represent JSON data contained in cells.

 

Good News!
Your dream database is already here and it's taking off as we speak. It's called FaunaDB. Join FaunaDB's Slack and see what it's all about. You'll thank me later 🙂

 

I will probably have to spend some more time in the documentation, but I was unable to find an example of what I've been describing. How would you write the pseudo query I provided (a very quick example with multiple assumptions is more than enough)?

 
  • The free tier with hosting for MongoDB is generous with Atlas. Postgres? You have to host it yourself. There is Heroku Postgres free, but it is much more limited (in size) than Atlas.
  • MongoDB joins can be damn slow
  • MongoDB can be much easier to write programmatically, due to JSON-like syntax
  • Transactions are something new in MongoDB (recently added); and there are no savepoints.
  • SQL is better for constraints, honestly.
  • SQL needs an ORM if you have a JSON field. Not the case for NoSQL.
  • Raw SQL has a steeper learning curve without an ORM/ODM, like JDBC alone. MongoDB is still quite easy even without an ODM.
  • Foreign key hooks are still problematic.

If you are corporate, SQL is indeed a more solid solution, but not sure about dev experience.

 

SQL needs an ORM if you have a JSON field. Not the case for NoSQL.

PostgreSQL and MySQL both support JSON fields with operators.
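As a small illustration of querying into a JSON column from plain SQL, shown with SQLite's `json_extract` so it's runnable anywhere; in PostgreSQL the same filter would typically use the `->>` operator (e.g. `payload->>'action' = 'login'`), and the table here is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("""INSERT INTO events (payload) VALUES
    ('{"user": "ada", "action": "login"}'),
    ('{"user": "bob", "action": "logout"}')""")

# json_extract pulls a value out of the JSON column inside the WHERE clause,
# so the filtering happens in the database, not in application code.
rows = conn.execute(
    "SELECT json_extract(payload, '$.user') FROM events "
    "WHERE json_extract(payload, '$.action') = 'login'"
).fetchall()
print(rows)  # [('ada',)]
```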

 

I know, but not sure about drivers' output, for example Node.js's pg.

Also, probably not used much, is MongoDB's capacity for cleaning data in JSON. It is harder in SQLite's JSON1.

From what I have seen, Postgres has different JSON querying syntax, but I haven't tried it yet.

From what I have seen, Postgres has different JSON querying syntax, but I haven't tried it yet.

True, but PostgreSQL 12 added the standard SQL/JSON Path language.

So that can be portable as well :-)

My favorite place to check the status of things is Modern SQL

 

Raw SQL has a steeper learning curve,

I have seen this sentiment a lot when talking about NoSQL vs relational: that SQL is hard.

I don't quite understand why.

Of course, a bunch of denormalized JSON data will always be easier, but I don't think basic SQL can be that hard for beginners.

When you start doing nested subqueries, JOINs across many tables, or even aggregations, I can understand. Even someone with experience can sometimes get confused, but for most applications you won't need that complexity.

Even the language itself is more or less natural: "Select these fields from this table".

It's like starting to do aggregates and other more advanced stuff in Mongo. The complexity increases.

 

I recently learned that there is NoSQL injection as well

Example please?
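The classic shape of a NoSQL injection is an attacker sending an operator object where the server expects a scalar. A minimal sketch, with plain Python standing in for Mongo-style matching (no real driver involved; the operator semantics are simplified for illustration):

```python
# Simulates Mongo-style matching in plain Python, to show why passing
# client-supplied parsed JSON straight into a query filter is dangerous.

def matches(doc_value, condition):
    # If the condition is a dict, treat its keys like Mongo operators.
    if isinstance(condition, dict):
        if "$ne" in condition:
            return doc_value != condition["$ne"]
        if "$gt" in condition:
            return doc_value > condition["$gt"]
        return False
    # Otherwise it's a plain equality check, as the server intended.
    return doc_value == condition

users = [{"name": "admin", "password": "s3cret"}]

# Intended usage: the client sends {"password": "guess"} -> equality check.
safe = [u for u in users if matches(u["password"], "guess")]

# Injection: the client sends {"password": {"$ne": ""}} as its JSON body.
# The operator object now matches ANY non-empty password.
injected = [u for u in users if matches(u["password"], {"$ne": ""})]

print(len(safe), len(injected))  # 0 1
```

The fix is the same idea as parameterized SQL: validate that client-supplied values are scalars (or whitelist fields and operators) before they reach the query.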

 
 

I recently switched from Postgres to Mongo (which I am using for the first time) to store data that I am feeding into a few Python ML analyses. I switched to Mongo because I felt like the data requirements from stakeholders kept evolving, and it was a nightmare to re-organize the tables in Postgres to adapt.

If I wanted more long-term storage for others to access, I think Postgres would be the way to go because it is less accommodating of one-off changes. I might prefer Postgres if I had more time, but I am bouncing between stakeholders and doing the data engineering and the analysis on my own, so I need something that just works without me thinking about it.

 

I switched to Mongo because I felt like the data requirements from stakeholders kept evolving, and it was a nightmare to re-organize the tables in Postgres to adapt.

This is the perfect use case! It happened to me as well. When stakeholders have no idea what they actually want, MongoDB or similar alternatives are a perfect solution. :-D

 

Postgres! I love Postgres because it has nice support for a rich JSON data type and so could be used like a JSON store (which IMO is a bad idea for most domains you try to model), and then you can evolve into a true relational data schema as your app grows and the domain reveals itself. Going full Mongo would require a very expensive migration once the domain model reveals itself to the designer/architect/engineers.

 

SQL is well known and battle tested for many things. However, if you have a specific project which requires a more dynamic approach to data, or if you are prototyping, MongoDB offers more, at least for now. In many cases you need embedded objects or arrays inside your data, not relationships. My current case for using it is GeoJSON data. PostgreSQL has a lot to offer, but Mongo gave me a fire-and-forget mode which just works. And I do queries where I position users inside buildings represented as GeoJSON. Gotta say it's fast. I still prefer PostgreSQL for users and such.

 

I think that since the addition of JSONB to PostgreSQL (and the even more recent addition of full-text search for JSON fields) there are fewer reasons to choose MongoDB on day one for most web apps. Now that one can combine structured data with unstructured data, the flexibility of PostgreSQL has increased quite a lot. The syntax isn't great, but you can use functions to improve it a bit.

That doesn't mean that MongoDB doesn't have its place, but I've read (and heard one or two firsthand) so many bad or even horror stories about it that I'm starting to think there is some truth to them. It's also true that many of these stories tend to be about older versions of MongoDB, so they might no longer be relevant.

Still, having to lose data because of the DB and not because of a mistake made by the app developers is quite bad.

PostgreSQL isn't perfect, but 25 years as an open source DBMS count for something. It's rock solid. Joining data is highly optimized and can be done in parallel, you can use foreign data wrappers to import all sorts of data, and its basic pub/sub technology can spit out data automatically. The industry around it is quite big, and recently Microsoft acquired Citus, a well known sharding solution that overcomes PostgreSQL's limits on scaling out.

The few times I had to use MongoDB were for internal prototypes, in situations where nobody wanted to think about the schema of the data before starting to iterate, so we chose something that flexible. It's probably a bit of a common strategy if you work in an agency which churns out apps one after the other. In none of those cases did the final app end up on MongoDB though; they were all converted to either MySQL or PostgreSQL, depending on the client.

 

I think MongoDB became so popular because most web development/full-stack courses on Udemy/Eduonix and so on use it to attract more beginners to enroll... That's how it starts confusing most beginner developers.

 

Honestly I've never had to do much with MongoDB, so I can't comment too much on perf, programmability, etc.

I do know that I've made SQL databases do outrageous things under massively parallel write loads: billion-row tables before cloud was a thing, etc. Tweaking inserts so they'd take place on different pages, for example. It was fun. Execution plans are fun. Thinking about the right kind of indexes is fun. ACID is legitimate a lot of the time. Seeing a query speed up by a factor of 100+ is fun.

Extensible schemas on top of systems that do that, e.g. via GraphQL or similar, are also fun. So there's value in both concepts and implementations.

 

Use the Right Tool for the JOB

As of now, a NoSQL database like MongoDB cannot fully replace a relational database.

Use a NoSQL database for unstructured data like blog posts, logs, docs, etc.

Find out in which use cases MongoDB can help make your application faster for reads.

Example for a blog:

Reading an article: MongoDB, to read fast.
Writing or saving an article: relational DB, and also feed to MongoDB.
Searching articles: use Elasticsearch.

 

As usual, I would say it depends. However, I'd agree with the general tenor that for most software applications a relational database (Postgres is my current favourite) is best suited. If my data is relational in nature, I use a relational database, for the reasons many commenters have already stated. Even though NoSQL might translate to 'Not only SQL', from my experience it is best suited for data models with no or at most very few relations.

Having said this, I do see very valid use cases for NoSQL databases, and in some cases would prefer MongoDB. An example I was working with is sensor data. A good data model with MongoDB lets you build optimized solutions for your use case, with trade-offs between read/write operations based on your application requirements. You could look to more specialized (e.g. time-series) databases; however, these might very well be overkill depending on your needs to store/ingest data. Even for an application with chat functionality I am currently considering MongoDB, based on this article: blog.discord.com/how-discord-store... While at some point Cassandra might be a good solution, I never expect to come close to the amount of data being stored (fingers crossed for the opposite).

For the use cases above, Redis might also be very well suited. Though I still prefer MongoDB here, with Redis as a way to optimize performance, even more so with Lua scripting. Having said this, I am actually building an application using PostgreSQL/MongoDB/Redis together. While this might sound like quite an amount of complexity, all three databases are very easy and transparent to work with (once you get to a certain level). They all have their specialized use cases where, at the current stage, they work best for me. Secondly, with Heroku/Atlas/Redis you get all of these in a free tier, with easy and cheap ways to scale once you need to.

 

Relational for me, but with most data stored in JSON structures in it and projected out for indexing if necessary. Works well. Flexible data model. Fast joins. Piped to elastic for rapid cross searching. I use MySQL rather than PostgreSQL though. I like expressing queries as JSON in the front end and the node servers - but that's easily mapped to SQL and SQL is still there if it is hard to express some much more complicated principle.
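A minimal sketch of that "queries as JSON mapped to SQL" idea (all names are hypothetical; only two operators are handled, and real code would need to whitelist field names to stay injection-safe):

```python
def json_to_where(filters):
    """Map a JSON-style filter like {"status": "open", "age": {"$gt": 21}}
    to a parameterized WHERE clause plus its parameter list."""
    clauses, params = [], []
    for field, cond in filters.items():
        if isinstance(cond, dict) and "$gt" in cond:
            clauses.append(f"{field} > ?")
            params.append(cond["$gt"])
        else:
            # Plain value -> equality check.
            clauses.append(f"{field} = ?")
            params.append(cond)
    return " AND ".join(clauses), params

sql, params = json_to_where({"status": "open", "age": {"$gt": 21}})
print(sql, params)  # status = ? AND age > ? ['open', 21]
```

The values travel as bind parameters, so the front end can keep speaking JSON while the database still sees ordinary parameterized SQL.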

 

I have been working with MongoDB for a long time on different projects, but one of the most powerful projects I have built is an e-commerce and delivery application that contains about 100,000 products, 1,500 categories, about 2,000 stores, and about 11 million custom price documents in the database. We work heavily with geo locations to find the nearest stores and drivers, for driver tracking, and so on. I found that MongoDB is a very good choice for most projects. It enables you to store the data the right way: you can make relational documents, and you can also store data deeply nested without relations or a separate collection or table. MongoDB has gained many features in the latest releases; one of them is aggregation pipelines, which let you perform many different operations, not just joins, and ease the relational limitations that document stores used to have.

 

I used both, mostly with Java. I don't think it makes that much of a difference. Complex queries are hard either way. Data migrations have challenges in both. And not having an explicit schema in Mongo can speed things up at first, but it also causes problems later when you want to change it.

 

Yup, I think MongoDB's popularity comes from its use in the data science field, whereas Postgres is for the general use case, I guess. From what I know, people who use MongoDB use it more for data science work than for a typical use case like a normal web application.

 

I sort of agree. But also, having had to work with both Postgres and Mongo in the same company, on the same product, as a data scientist, my experience was more: "Do I actually have to go to the Mongo DB to get [business report]? Can't we work around it by [getting data from the API/event bus]?"

The reason is that most data science work relies on relations between data. We're not looking to find the 'one record that fits these criteria', we're trying to encapsulate the full scope of nearly all the data. I don't think I've ever done a job without needing 50-99% of all the records in a specific table in relation to 1-10 others. Doing that kind of work on Mongo is sort of a pain.

The other aspect is that most data science processes focus on matrix-like objects. Python tends towards pandas/numpy arrays for object representation, which are relatively analogous to a table in a database, and R has this kind of feature built into the base language with data.frames. Most ML will assume you pass it table-like records.

There may be an argument that human-written, free-text NLP might be a good fit. However, you will probably still start an NLP project with tables of word frequencies, like in this project by my buddy Dom, where he predicts the gross price of theatre tickets by doing NLP on the reviews of different shows

 

MongoDB as write / eventstore
CouchDB as distributed mobile database
PostgreSQL for general purpose application
Elasticsearch for full text search

When using NoSQL, make sure you don't need any relations between the data, e.g.: posts, articles, logs, comments, notifications.
Otherwise, switch to SQL.

MongoDB joins are slow because they're not indexed (CMIIW though). I used MongoDB in production and avoid joins at all costs.

The flexibility of queries in SQL can sometimes be a bad thing, when engineers design bad relations and write slow, complicated queries.
I've used Postgres for a year and I like it though, especially the JSON data type. The schema system is a bit weird compared to MS SQL.

 

pgsql + jsonb + indexes = mongodb on steroids... without the mongodb part😜

 

And now you can also have indexes on JSONB expressions 🔥
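To make that concrete: an expression index over a JSON path, demonstrated with SQLite so it runs self-contained. The PostgreSQL analogue would be something like `CREATE INDEX ON docs ((data->>'sku'))`; the table and column names here are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, data TEXT)")

# An index on a JSON expression, so lookups on that path can avoid a full scan.
conn.execute("CREATE INDEX idx_docs_sku ON docs (json_extract(data, '$.sku'))")
conn.execute("""INSERT INTO docs (data) VALUES ('{"sku": "A-1", "qty": 3}')""")

# The query plan should mention idx_docs_sku when the WHERE clause uses the
# exact same expression the index was built on.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM docs "
    "WHERE json_extract(data, '$.sku') = 'A-1'"
).fetchall()
print(plan)
```

The catch in both engines is that the filter expression has to match the indexed expression for the planner to use the index.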

 

I pretty much agree with the statement that "there is nothing a non-relational DB can do that a relational DB cannot".

But I was working at a company which owns a famous property portal, where we sell custom-made portals based on a variety of features.
These features include search on any field in the entity or database.

If we had chosen a relational DB for this, creating the indexes and taking the peculiar measures needed would have been much more expensive and harder. This is why we went for Elasticsearch.

 

I agree with most of the answers here saying a relational DB is best compared to NoSQL.

I still wonder how Discord is able to perform well with a NoSQL DB (Cassandra).

 

What do you mean by "well" anyway?

  • I am quite sure it is "enough".
  • If there is no problem, don't fix it.
  • I'm not even sure SQL is easier to maintain / scale than NoSQL. NoSQL was built to address these kinds of problems, but I am not sure it succeeded.
 

Question: What are some zero-friction cloud hosting services for PostgreSQL, similar to MongoDB Atlas/mlab?
(Please share your experience if any.)

 
 

If you quickly need something to work, spin up a docker image. No brainer.

 

If you know how to use it, even a bamboo stick is unrivaled.