Ben Halpern

Posted on May 28, 2020 • Edited on Aug 26, 2020

PostgreSQL vs MongoDB

#postgres #mongodb #healthydebate #database

A continuation of this series. The discussions have been great so far!

I think these two represent two factions in the database world that make for a good compare and contrast. But feel free to broaden the discussion to other databases.

Top comments (71)

Peter Harrison • May 29 '20 • Edited

Long time user of SQL of various varieties and PostgreSQL specifically. There was a time when the idea of using anything but a SQL database was heretical. And to be frank some of the comments below indicate that using non SQL data storage is still considered heterodox. The orthodox thinking is on clear display when Noah says "one of the most important parts of designing a system is fleshing out your data model".

In this statement there is the core weakness of orthodox software development, mainly that the foundation stone of our software is the model. The model is usually established by the SQL and object model, with a mapping using Hibernate or whatever other object relational mapper. when you bind the data structure to the binary artifacts you cripple flexibility. You will build business rules into the domain objects. Changing the data structure becomes almost impossible without management of both binary deployment and schema changes. I spent decades dealing with this kind of problem from even before SQL. I was a dBase programmer back in the early ninties.

PostgreSQL is my favorite SQL database. But it may not suit everyones needs. A telco I worked for used MySQL. There are workloads where the data structures are mature and unchanging, where flexibility and adaptability isn't critically important. Where tight coupling between code and schema is acceptable.

In 2013 I began working on a Automation Engine which used a data storage mechanism which was flexible. It actually used PostgereSQL under the covers, but it allowed huge flexibility that decoupled the code from schema. Later this technology adopted MongoDB under the covers. However it became clear to me that using MongoDB directly would give me all the same features with far better performance.

If you simply treat MongoDB as any other relational database and build applications tightly coupled with a schema it is like complaining that your trail bike doesn't go as fast as your Ferrari or carry as much as your pickup truck. The who point of MongoDB is the flexibility. If you are simply going to build applications like you did with SQL and expect it to be feature for feature identical you are missing the benefits.

The door MongoDB opens is the ability to write applications that are not tightly coupled to the binary. You can store arbitrary data easily and allow users to define what data they want to store at run time rather than design time. It delivers a kind of flexibility and adaptability that allows us to do things which are essentially impossible to do with SQL databases. I've seen a similar application as the one I'm working on and the pipeline of SQL schema scripts and messing about is scary. Dynamic changes to SQL can be time consuming and potentially dangerous. Imagine trying to change a schema on the fly while there are users online. That kind of thing is just a non issue with MongoDB.

If you are going to embrace MongoDB you should also be embracing its strengths. There is no point using it as a drop in replacement to SQL, because you can't beat SQL for being an SQL database.

But this is all backwards. I would rather say that you should challenge the orthodoxy of tight coupling between schema and binary. By smashing that dependency and embracing an approach which is more universal and flexible you are free to write applications which are more general purpose. If you accept this philosophy you will find MongoDB is a useful tool to help you on the path.

It isn't the only option by any means, but it is a pretty decent start. I made a video about all of this in 2017.

youtube.com/watch?v=uwZj4XF6zic

Daragh Byrne • May 29 '20 • Edited

I can't help but think of my experiences with a mature GraphQL implementation. Schema first, and the binary (de)composed of many services.

There is data, the way you gather it, and the way you put it forth for consumption. I suspect they will all remain work to be done.

Peter Harrison • Jun 1 '20

The Java GraphQL implementation assumes a fixed schema burnt into the binary; same orthodox philosophy.It took me a while to be able to generate the schema dynamically in order to expose GraphQL endpoints that would dynamically change as users modified the data structures. One of the requirements I was given was to introduce GraphQL endpoints to our system.

If it was an orthodox app this would be quite easy, you just write the classes required to get data from the model. No worries. Only if you are able to add or modify schema at runtime what do you do? It was a little fiddly, in that I had to have a trigger mechanism when the data design changes in order to programatically rebuild the GraphQL schema. Obviously this is far from ideal. Dynamic modification of a schema would be better than rebuilding.

Also built a bunch of functions to do various queries which are not really part of GraphQL syntax, but can be supported. However, to be blunt the implementation of the GraphQL library I used forced me into doing hundreds of queries to fulfill requests. A better way would be to convert the GraphQL into a single aggregation which can be run and returned.

GraphQL is another example of a technology which while helpful can drive developers into domain binding.

The problem isn't that building applications bound to a data structure is always wrong, rather that is has become so orthodox that developers don't even question it. I was one of them. I thought that using anything but SQL was insanity.

Daragh Byrne • Jun 1 '20

I can't disagree with any of that! I just got your point :)

Дмитрий • Dec 8 '22

Thank you very much for your opinion!
I totally agree with you. Most developers, that complaint against MongoDB are just treating as it is another relative database. I saw a lot of articles, complaining about $lookup, just because they've normalized their data .-.
MongoDB is a good joice for any small or mid-sized start-up, because they require a lot of changes and a really small amount of time.

giovanni-or2 • May 30 '20

Oh my god! This this SO MUCH THIS! This comment mirrors exactly WORD FOR WORD what I think about NoSQL vs SQL. Really, when flexibility is a requirement, you can't use SQL. That's just how it goes.

Casey Brooks • May 28 '20 • Edited

100% this.

So many devs (including myself) got burned when MongoDB came out because it was touted as a silver-bullet data store, when in reality it was created for an entirely different problem than what most projects needed. By the time devs realized MongoDB was causing pain like major performance bottlenecks and difficulties accessing the data they needed, they were already in too deep to fix the issue.

Real-world data is relational, pretty much always. While it's pretty common for single records to have hierarchical data (which is what makes non-relational DBs sound appealing), it's very uncommon for entire datasets to be limited to a bunch of singular records of purely hierarchical data. You will end up making multiple collections and trying to reference them just like relational models, except that you'll have zero help from the database in querying them efficiently.

I'm a strong advocate that you should never start your project with a non-relational data store. It's much easier to add one later on (which will probably never happen) than it is to move from non-relational to relational.

Michiel Hendriks • May 29 '20

Not to forget that PostgreSQL supports JSON and XML as a datatype. So where suitable you can just store JSON or XML in a table.
You can even create indexes on directly on the JSON data. (Or you can use index functions like you can do with XML data.)

Nikola Stojaković • May 28 '20 • Edited

Now this one is a bit different than previous discussions in my opinion because one is relational and other is non-relational database so it's more dependent on the actual need of your project.

This may be a unpopular opinion but I think that MongoDB (and non-relational databases generally) are quite hyped up. So far whenever I considered using non-relational database I realized that using relational database would actually be more appropriate (which is understandable considering most of our job as web developers is dealing with the data which has a clearly defined structure).

I haven't used PostgreSQL that much (I'm more of a MySQL guy), but we can look at it as Oracle database of the open source world. It has many amazing features and is much more advanced than MySQL (for example, PostgreSQL supports indexing of JSON while MySQL doesn't).

On the other hand, if you're dealing with big amount of unstructured data, MongoDB is great. I had to do this in one project where we had various products that could have different types of attributes and it's much easier to deal with than implementing EAV in the relational database (I'm looking at you Magento) which quickly becomes a mess and breaks data integrity, removing the point of using relational database in the first place.

HS • May 29 '20

Great point. Most of people who dislike NoSQL is because they went on full to replace "relational" dbs. They should never be treated as replacement or alternative but complementary. This means you don't need 1 type of DB. It should be a resource limitation not technological one. Like you don't have enough people to maintain more than 1 DB system or it's too expensive to do so. I have project depending on 3 different DBs for now and it might get bigger. Graph DB or more speciffically Neo4j in that system cannot be replaced by SQL in terms of ease of use and actuall logic of data structure. However users, tickets, clients... are stored in PostgreSQL while geo data is in Mongo. No single reason is there to move everything to "relational" not even performance as it's quite fast this way. Geo service and postionig are indipendent of graph data. If one goes down other is usable and if both are up traffic from one doesn't pressure the other. They all depend on users which are accessed one per auth so SQL is not that much active. I know PostgreSQL has support for JSON and geo stuff but I took me 10 mins to setup Mongo and it works why would I bother with others?

Maybe now people realise it's not DB type fault it's you who pick it for wrong purposes.

n1b0r63 • May 29 '20

You have PostgreSQL and don't use PostGIS to store geo data, isn't that a crime ? ^^
Anyway you are right, I work with both, there are things I would never do with mongo (like complex geo-calcul with hundreds of gigabytes of geo-data, I understand that if you have needs that are only selection of points in an area, you don't really need PotsGIS), and other I would never do with postgres (typically storing big amount of ad-hoc data with ad-hoc structure for specific clients display (lots of geo-data for thousand of users, associated for exemple with unpredictable data that will be added later), or specific usage in general).
Also, you can always use a DB to do things it's not really meant for, like you can unlog table in postgres to improve performance (but you loose the D of ACID), or there is recent improvements in mongo that adds transaction (to add the A), but in the end, you should never try to find the unique or the perfect DB, they are meant for a purpose, and without falling in the opposite (like you said, too mush DB to maintain) you shouldn't try to fit a round in a square.
In the end, to me, it's not "MongoDB VS PostgreSQL" but "MongoDB WITH PotgreSQL", because I think they are the perfect combination for most of complete projects today.

Lud • May 29 '20

If just some attributes of your data is instructed, in PostgreSQL, you can put that into a jsonb column. It can be indexed and relatively easily queried. I use that for example when one of models receives a webhook with some more or less random stuff inside.

Adam Nathaniel Davis • May 29 '20

IMHO, Mongo was overhyped when it came out - but that doesn't mean it's without merit. The problem (as with soooo many tech solutions) is that the fanboys started touting it as a complete replacement for RDBMS's. And the "old guard" wrinkled their noses cuz they still wanted to see RDBMS's as always superior to flat file structures.

Anytime I hear someone touting either solution as The One True Answer, my only reply is:

What are the strengths (and weaknesses) of an RDBMS? What are the strengths (and weaknesses) of the entire family of NoSQL products? And, most importantly, how do the needs of your specific app map to these strengths and weaknesses?

If you can't outline at least some of the strengths/weaknesses of either approach, then I really don't want to hear any of your recommendations for the DB solution that should be used in this app.

If some guy wants to build my house, and the only tool in his tool belt is a hammer, I will NOT be employing him. I don't care how long he's used the hammer. I don't care if he can tell me 100 reasons why his hammer is The Best Tool EVER! If he can't explain to me when it might be appropriate to use a screwdriver, I don't want him anywhere near my foundation.

rhymes • May 29 '20

Hi Adam! that's an understandable approach. The only fallacy is that you're also implying that any dev can argue about any other tool from self experience (and not just by having read articles or discussed it with other people). Also, I'm quite sure I can find a carpenter who doesn't know all the tools available out there but can still build a life lasting house.

In our case: why not a graph DB then? Or a time series DB? Or an object DB? Or a flat file?

Any dev that has read a bit of the main differences of either approach could give you a reasonable answer, but is that sufficient for your evaluation?

In one case logic is enough ("I need to relate exams with students, therefor I'm using a DB that has relations"), in the other case experience will be never enough because only a small subset of developers have enough experience on all types of DBs to be able to evaluate each of them in each case a priori.

Does this make sense :D ?

Adam Nathaniel Davis • May 29 '20

The only fallacy is that you're also implying that any dev can argue about any other tool from self experience (and not just by having read articles or discussed it with other people).

That... wasn't what I was implying at all. Nor was it stated as such in my original reply.

Nevertheless, I don't understand how what you've described is somehow a "fallacy". Why shouldn't "any dev" be able to "argue about any other tool from self experience"??? Arguing in favor of a tool/tech/approach/etc !== forcing the implementation of that approach. When the team is in a research/decision phase, there's nothing at all wrong with someone proposing a solution based upon their own experience. It doesn't mean their proposal will be adopted - but neither does it mean that there's anything wrong with making the proposal.

Also, I'm quite sure I can find a carpenter who doesn't know all the tools available out there but can still build a life lasting house.

You just completely twisted my words. Please look at my original reply and tell me where I said that the carpenter must "know all the tools available out there"?? I'm talking about the fact that farrrrr too many people in tech glom onto a single approach at the exclusion of all others.

You don't have to know "all the tools". None of us do. But if the only tool that you continually, blindly, stubbornly force into every project is a hammer, then you're no carpenter. You're a hammer salesman.

In our case: why not a graph DB then? Or a time series DB? Or an object DB? Or a flat file?

Indeed. Why not?? If you're throwing these labels out there as fringe examples that simply don't warrant consideration, then you are highlighting the problem that I was originally alluding to.

Any dev that has read a bit of the main differences of either approach could give you a reasonable answer

No. Most of them cannot (or, more accurately, will not). Your statement sounds very reasonable - if you've never actually had to deal with developers before. There are far too many devs who learned a given tech/tool/approach/whatever - probably many years ago - and now they refuse to properly assess any alternatives. Maybe they can give you a cursory, 30-second explanation of Tech A vs Tech B, but even in that brief synopsis, they make it clear that they love Tech A and they've never seriously considered Tech B.

In one case logic is enough ("I need to relate exams with students, therefore I'm using a DB that has relations"), in the other case experience will be never enough because only a small subset of developers have enough experience on all types of DBs to be able to evaluate each of them in each case a priori. Does this make sense :D ?

No. This shouldn't be a case of "logic vs. experience". We should never throw out logic due to past experiences. Nor should we throw out experience due to some academic maxim of logic.

rhymes • May 29 '20

Sorry, it definitely wasn't my intention to attribute you words and meaning, it's what I understood by reading your comment. That said:

When the team is in a research/decision phase, there's nothing at all wrong with someone proposing a solution based upon their own experience. It doesn't mean their proposal will be adopted - but neither does it mean that there's anything wrong with making the proposal.

That's not what I meant as well. I'm trying to say that most people are not reasonably knowledgeable enough and also don't have enough time to test all potential options for every single decisions to make a perfectly informed choice. That's why we rely on the collective experience. I can't test all databases every time I need to choose one, but I can have a generic idea of which type I might need based on personal or collective experience.

That was my point.

But if the only tool that you continually, blindly, stubbornly force into every project is a hammer, then you're no carpenter. You're a hammer salesman.

Sure! But choosing a DB is not the same thing, right? I often think engineering or carpentry analogies fall short when compared to software (the prefix "soft" in the word is very apt).

Let's see it like this: building a house is a collective effort which requires all sort of tools in all cases, one could prefer a type of hammer or another, but they still require one. They also require construction materials, welding, scaffolding and other stuff. A carpenter going around the construction site saying "please weld with my hammer" wouldn't make sense anyway and they probably wouldn't have a job :D

Choosing a database only implies knowing which data model better fits the application one want to build and a bit of forward looking. Teams can still choose wrongly and it can cost the company a lot, but changing a database is not the same as redoing a house because nobody used bricks in the first place.

That's why I think civil engineering analogies often fall shorts when directly compared to software development. Same with the evergreen tendency of comparing building a bridge to software architecture -_-

Indeed. Why not?? If you're throwing these labels out there as fringe examples that simply don't warrant consideration, then you are highlighting the problem that I was originally alluding to.

Because of time constraint. The same argument would apply to programming languages, libraries and everything else. You'd be in forever constant comparison cycle and never get anything done.

No. Most of them cannot (or, more accurately, will not). Your statement sounds very reasonable - if you've never actually had to deal with developers before. There are far too many devs who learned a given tech/tool/approach/whatever - probably many years ago - and now they refuse to properly assess any alternatives.

Sure and I agree with you here but that's a character trait, they can still be right sometimes? At least once ;-) (after all the hammer is still the right tool in some cases).

No. This shouldn't be a case of "logic vs. experience". We should never throw out logic due to past experiences. Nor should we throw out experience due to some academic maxim of logic.

No, as I implied, maybe incorrectly, you need both at the same time. That's why the camp "always use a DBMS forever and ever" is wrong but also "spend 3 months evaluating all possible options for every tool" is also not always practical due to time or cost contraints

Necmettin Begiter • Jun 24 '20

Somehow, almost nobody remembers that when it came out, MongoDB's main premise was that MySQL wasn't scalable, and MongoDB was. It was a big fat lie. (It was just like Svelte's creator complaining about React all the time.)

Yes, there are cases where storing/consuming unstructured data is a plus. Except, like in all dualities, both approaches have pros and cons.

It's the same with structured vs unstructed, server-side rendered vs client-side rendered, interpreted vs compiled, distributed vs centralized, object-oriented vs functional, and this vs that vs the other. There is no silver spoon whichever approach you choose.

All SQL databases and noSQL databases have their strengths and weaknesses.

I have one wish though: Being able to fetch the main records/documents and related records or documents with a single query. That'll be the day the database industry changes. (Already done maybe? Since I am an old guy that can't be bothered to follow every new shiny thing anymore :)

Valts Liepiņš • Aug 21 '20

I recently found out about this "PostgREST" project, which provides RESTful API to Postgres database.

It seems that they do actually pull off the "fetch resources with related resources using a single request", reminded me of your wish!

postgrest.org/en/v7.0.0/api.html#r...

rhymes • Jun 24 '20

I have one wish though: Being able to fetch the main records/documents and related records or documents with a single query. That'll be the day the database industry changes. (Already done maybe? Since I am an old guy that can't be bothered to follow every new shiny thing anymore :)

Isn't this a SQL JOIN you're describing? If you're talking about multi data store you can do that with foreign data wrappers and have something foreign to PostgreSQL appear local, or if you're talking multi system you can do it with an API proxy like Kong or with any GraphQL server.

So yeah, it's possible :-)

Necmettin Begiter • Jun 24 '20

Yes, SQL JOIN; except, it returns a single row for every parent record.

If by a "multi data store", you mean being able to merge data from multiple sources (something like one part from MySQL and some other part from PostgreSQL), that's not what I mean either.

As for a GraphQL server, if I'm not mistaken, a graphql server fetches data from the source then formats it to the required specification. What I'm looking for is something like a GraphQL server embedded in the database / data store.

Let me try to write a pseudo query and a sample output for what I mean:
select students[id, firstname, lastname, student_no] as root, current_courses[course_name, course_code] FROM students MERGE students_courses on (students_courses.student_id = students.id) MERGE courses as current_courses on (students_courses.course_id=courses.id) WHERE student_no=1234

And the result
{id:1, firstname:Necmettin, lastname:Begiter, student_no:199601010725, courses:[{course_name:"Comperative Linguistics", course_code:"ECC101"}, {course_name:"Psychology of Learning", course_code:"ESL101"}, {course_name:"Something something", course_code:"ESS101"}]

A few things to note:

This is a single query merging multiple records from different tables.
The result is a single record, with fields id, firstname, lastname, student_no, and courses, with the courses field being an array of objects.
Students, courses, and students_courses are separate tables/indexes/schemas/whatever.
MongoDB has something similar, calling it DBRefs, but the client application must perform a separate query to fetch the related records.
We all know that nested data fetching is a very complex issue. Even Firestore notes these limitations (firebase.google.com/docs/firestore...) regarding nested data fetching.
With our current database solutions (including Firestore, MongoDB, ElasticSearch), if you include the students' courses in the students' table, you have two possible paths:

You can store the courses' IDs as an array field in students data, in which case you have to perform a separate query to fetch those courses after you fetch the students you want.
You can store the entire current_courses info in students table, instead of only IDs; in which case, if you ever need to update a course, you have to perform a separate query to update current_courses data that are inside the students table/database/schema/whatever.

Long story short, yes, what I'm describing IS a JOIN operation at heart, but much more than that.

rhymes • Jun 25 '20

This is a single query merging multiple records from different tables.

I'm not sure we're talking about different things though. I think it's a matter of perspective. An API that queries multiple data sources and then returns them to you as a result of a single query is to you, the caller, a single query.

PostgreSQL and other DBs splits queries in multiple parallel fetches sometimes, they physically make multiple reads at the same time, but does it matter to you? No, because with you issue one SQL query to get all the data you need.

Moving on to your example, you can already do it:

PracticalDeveloper_development> select count(*) from comments where user_id = 10;
-[ RECORD 1 ]-------------------------
count | 3
SELECT 1
Time: 0.008s
PracticalDeveloper_development> select users.id, users.username, array_agg(row_to_json(row)) as comments from (select c.* from comments c where c.user_id = 10) row, users group b
 y users.id having users.id = 10;

-[ RECORD 1 ]-------------------------
id       | 10
username | lockmanjerry
comments | ['{"id":17,"ancestry":null,"body_html":null,"body_markdown":"You probably haven\'t heard of them chartreuse direct trade. Humblebrag photo booth marfa. Gluten-free goth flannel 90\'s 8-bit vice.","commentable_id":24,"commentable_type":"Article","created_at":"2020-06-22T11:18:56.028987","deleted":false,"edited":false,"edited_at":null,"hidden_by_commentable_user":false,"id_code":"h","markdown_character_count":131,"positive_reactions_count":0,"processed_html":"<p>You probably haven\'t heard of them chartreuse direct trade. Humblebrag photo booth marfa. Gluten-free goth flannel 90\'s 8-bit vice.</p>\\n\\n","public_reactions_count":1,"reactions_count":0,"receive_notifications":true,"score":1,"spaminess_rating":0,"updated_at":"2020-06-22T11:34:28.673251","user_id":10}', '{"id":19,"ancestry":null,"body_html":null,"body_markdown":"Sriracha shoreditch pitchfork offal selfies. Knausgaard godard pabst fixie microdosing mixtape meggings.","commentable_id":21,"commentable_type":"Article","created_at":"2020-06-22T11:18:56.218676","deleted":false,"edited":false,"edited_at":null,"hidden_by_commentable_user":false,"id_code":"j","markdown_character_count":104,"positive_reactions_count":0,"processed_html":"<p>Sriracha shoreditch pitchfork offal selfies. Knausgaard godard pabst fixie microdosing mixtape meggings.</p>\\n\\n","public_reactions_count":1,"reactions_count":0,"receive_notifications":true,"score":1,"spaminess_rating":0,"updated_at":"2020-06-22T11:34:28.883918","user_id":10}', '{"id":15,"ancestry":null,"body_html":null,"body_markdown":"Humblebrag austin wayfarers wes anderson cardigan celiac organic pork belly. Irony butcher tacos pork belly normcore retro whatever you probably haven\'t heard of them. Small batch meh etsy pork belly williamsburg sartorial schlitz yolo.","commentable_id":3,"commentable_type":"Article","created_at":"2020-06-22T11:18:55.850519","deleted":false,"edited":false,"edited_at":null,"hidden_by_commentable_user":false,"id_code":"f","markdown_character_count":236,"positive_reactions_count":0,"processed_html":"<p>Humblebrag austin wayfarers wes anderson cardigan celiac organic pork belly. Irony butcher tacos pork belly normcore retro whatever you probably haven\'t heard of them. Small batch meh etsy pork belly williamsburg sartorial schlitz yolo.</p>\\n\\n","public_reactions_count":1,"reactions_count":0,"receive_notifications":true,"score":3,"spaminess_rating":0,"updated_at":"2020-06-22T11:34:28.444569","user_id":10}']
SELECT 1
Time: 0.008s

User 10 has 3 comments, I selected them all with one query and aggregated them in an array of JSON rows.

I'm sure the query can be simplified with a CTE or other clever tricks but it's a decent start.

I can understand why it's a little bit more complicated to do with noSQL DBs. They usually have more complicated ways of putting data in relation to other data.

Necmettin Begiter • Jun 25 '20

Yes, multiple queries resulting in a single resultset is, in the end, a single call for the client, but on the server side, it means multiple calls. You are right of course, but I'm a little unforgiving I guess :)

Yeah, PostgreSQL almost got it, except JSON in Postgres are strings. Almost there :)

What can I say, I'm a grumpy old programmer, it's not easy to satisfy me. ;)

Cheers.

rhymes • Jun 25 '20

Yeah, PostgreSQL almost got it, except JSON in Postgres are strings. Almost there :)

Not really though: JSON in PostgreSQL is just JSON:

PracticalDeveloper_development> select id, username, email_public from users limit 1;
+------+-----------------+----------------+
| id   | username        | email_public   |
|------+-----------------+----------------|
| 6    | rosenbaumladawn | False          |
+------+-----------------+----------------+
SELECT 1
Time: 0.016s
PracticalDeveloper_development> select row_to_json(t) from (select id, username, email_public from users limit 1) t;
+------------------------------------------------------------+
| row_to_json                                                |
|------------------------------------------------------------|
| {"id":6,"username":"rosenbaumladawn","email_public":false} |
+------------------------------------------------------------+
SELECT 1
Time: 0.015s

Integers are integers, strings are strings and booleans are booleans

Necmettin Begiter • Jun 26 '20

When PostgreSQL came up with the JSON features, I remember reading examples and always seeing 'some-json-formatted-data'::JSON, so I assumed JSON data is given to / received from PostgreSQL in a string. Even the comments field in your example a few comments back (the one with the 3 comments of user 10) has the JSON data in strings.

If that isn't the case, my bad, I didn't know PostgreSQL had progressed this far, I thought it was still using strings to encapsulate/represent JSON data contained in cells.

Yaron Levi • Jun 25 '20

Good News!
Your dream database is already here and it's taking off as we speak. It's called FaunaDB. Join to FaunaDB's slack and see what's it all about. You'll thank me later🙂

Necmettin Begiter • Jun 26 '20

I will probably have to spend some more time in the documentation, but I was unable to find an example of what I've been describing. How would you write the pseudo query I provided (a very quick example with multiple assumptions is more than enough)?

Marcos Filho • May 29 '20

You will regret it later if you chose the former.

Quite literally my very short experience with NoSQL. It was awesome at first, honestly, but quickly turned out to be a huge mess for the 2 (two) projects I got involved.

I'm sure I could have been involved in better NoSQL situations but so far I just didn't have that luck.

Pacharapol Withayasakpunt • May 29 '20 • Edited

Free tiet with hosting for MongoDB is generous with Atlas. PostGRES? You have to host yourself. There is Heroku PostGRES free, but it is much more limited (in size) than Atlas.
MongoDB joins can be damn slow
MongoDB can be much easier to write programmatically, due to JSON-like syntax
- This experience came from Writing an ORM for SQLite
- I more recently know that there is NoSQL injection as well
Transaction is something new in MongoDB (recently added); and there is no save point.
SQL is better for constraints, honestly.
SQL needs an ORM, if you have JSON field. Not the case for NoSQL.
Raw SQL needs more learning curve, when without ORM/ODM, like JDBC alone. MongoDB is still quite easy even without ODM.
Foreign key hooks are still problematic.

If you are corporate, SQL is indeed a more solid solution, but not sure about dev experience.

Glenn Carremans • May 29 '20 • Edited

SQL needs an ORM, if you have JSON field. Not the case for NoSQL.

PostgreSQL and MySQL both support JSON fields with operators.

Pacharapol Withayasakpunt • May 29 '20 • Edited

I know, but not sure about drivers' output, for example Node.js's pg.

Also, not probably not used much, is MongoDB 's capacity to cleaning data in JSON. It is harder in SQLite' s JSON1.

From what I have seen, PostGRES has different JSON querying syntax, but haven't tried yet.

rhymes • May 29 '20

From what I have seen, PostGRES has different JSON querying syntax, but haven't tried yet.

True, but PostgreSQL 12 added the standard SQL/JSON Path language.

So that can be portable as well :-)

My favorite place to check the status of things is Modern SQL

Bruno Paz • May 30 '20 • Edited

Raw SQL needs more learning curve,

I have seen this feeling a lot when talking about NoSQL vs Relational, that SQL is hard.

I don't quite understand why.

Of course, a bunch of denormalized JSON data will always be easier, but I don't think basic SQL can be that hard for beginners.

When you start doing nested subqueries, JOINS across many tables or even aggregations I can understand. Even someone with experience can sometimes get confused, but for most applications, you won´t need that complexity.

Even the language itself is more or less natural "Select this fields from this table",

It´s like start doing aggregates and other more advanced stuff in Mongo. The complexity increases.

bravemaster619 • May 29 '20

I more recently know that there is NoSQL injection as well

Example please?

Pacharapol Withayasakpunt • May 29 '20 • Edited

severalnines.com/database-blog/sec...

BTW, there is also GraphQL injection, both for SQL and NoSQL e.g. petecorey.com/blog/2017/06/12/grap...

katy lavallee • May 29 '20

I wish I had time to write up a complete post for this, but I just want to say that I worked somewhere where we used mongoDB as the only database, for all our microservices. I don't recall any major issues on account of that decision... at least not related to mongoDB itself, but we did have to revisit our code for handling database results a few times to improve performance there.

Mongo was easier to work with in nodejs than postgresql was (at the time anyway, maybe there are better libs for it now). Also since it was microservices there weren't many places where we needed to do database joins anyway. Mongoose made it pretty easy and fun to work with. As far as performance, on launch day we had over 100k visitors with no issues. We regularly had large traffic spikes and handled them well.

Ben Halpern • May 29 '20

Yeah, Mongo really latched on to Node and ensured really good tooling. I think that was the biggest factor in its success.

rhymes • May 29 '20

Great! Isolated small services where your data layer is mediated by the API is definitely a good use case for document DBs 🔥

Sean Allin Newell • May 29 '20

Splunk logs - logs and events are inherently non relational and perform much faster.

This does not generally apply to software, and in general, I would agree that non relational is almost never a good long term option for an app with any reasonable domain it is trying to model.

Michiel Hendriks • May 29 '20 • Edited

A big part of logs are relational. It has a timestamp, host, application, log level, a message, and commonly a "category". The other data is less well structured between the various log event producers, lets call this "meta".
This meta data you could store in a less explicitly structures relation. It's usually a simple key->value structure anyway. For that you could use PostgreSQL's hstore.

But... I do not know if I would use PostgreSQL to store log events. Although PostgreSQL has native sharding these days. Setting up a distributed PostgreSQL farm is way more complicated. Log events are a good candidate for eventually consistent. Or even, never consistent is also acceptable. It is mostly appending entries, pruning old records, and occasionally performing a query. ACID is also no strong requirement.

Sean Allin Newell • May 29 '20 • Edited

Nah, none of those fields are relational imo. Relational fields are like UserId, EventId, ParentId, etc. Pointers to other things.

If you want to build a relational logging platform be my guest, but I suspect there's a good reason why most of the big players in logging use NoSql or InfluxDB.

rhymes • May 29 '20 • Edited

I would argue that a time series DB like InfluxDB is a better storage for timestamped logs and events than a document DB is.

View full discussion (71 comments)