Discussion on: PostgreSQL vs MongoDB

View post

Somehow, almost nobody remembers that when it came out, MongoDB's main premise was that MySQL wasn't scalable, and MongoDB was. It was a big fat lie. (It was just like Svelte's creator complaining about React all the time.)

Yes, there are cases where storing/consuming unstructured data is a plus. Except, like in all dualities, both approaches have pros and cons.

It's the same with structured vs unstructed, server-side rendered vs client-side rendered, interpreted vs compiled, distributed vs centralized, object-oriented vs functional, and this vs that vs the other. There is no silver spoon whichever approach you choose.

All SQL databases and noSQL databases have their strengths and weaknesses.

I have one wish though: Being able to fetch the main records/documents and related records or documents with a single query. That'll be the day the database industry changes. (Already done maybe? Since I am an old guy that can't be bothered to follow every new shiny thing anymore :)

rhymes • Jun 24 '20

I have one wish though: Being able to fetch the main records/documents and related records or documents with a single query. That'll be the day the database industry changes. (Already done maybe? Since I am an old guy that can't be bothered to follow every new shiny thing anymore :)

Isn't this a SQL JOIN you're describing? If you're talking about multi data store you can do that with foreign data wrappers and have something foreign to PostgreSQL appear local, or if you're talking multi system you can do it with an API proxy like Kong or with any GraphQL server.

So yeah, it's possible :-)

Necmettin Begiter • Jun 24 '20

Yes, SQL JOIN; except, it returns a single row for every parent record.

If by a "multi data store", you mean being able to merge data from multiple sources (something like one part from MySQL and some other part from PostgreSQL), that's not what I mean either.

As for a GraphQL server, if I'm not mistaken, a graphql server fetches data from the source then formats it to the required specification. What I'm looking for is something like a GraphQL server embedded in the database / data store.

Let me try to write a pseudo query and a sample output for what I mean:
select students[id, firstname, lastname, student_no] as root, current_courses[course_name, course_code] FROM students MERGE students_courses on (students_courses.student_id = students.id) MERGE courses as current_courses on (students_courses.course_id=courses.id) WHERE student_no=1234

And the result
{id:1, firstname:Necmettin, lastname:Begiter, student_no:199601010725, courses:[{course_name:"Comperative Linguistics", course_code:"ECC101"}, {course_name:"Psychology of Learning", course_code:"ESL101"}, {course_name:"Something something", course_code:"ESS101"}]

A few things to note:

This is a single query merging multiple records from different tables.
The result is a single record, with fields id, firstname, lastname, student_no, and courses, with the courses field being an array of objects.
Students, courses, and students_courses are separate tables/indexes/schemas/whatever.
MongoDB has something similar, calling it DBRefs, but the client application must perform a separate query to fetch the related records.
We all know that nested data fetching is a very complex issue. Even Firestore notes these limitations (firebase.google.com/docs/firestore...) regarding nested data fetching.
With our current database solutions (including Firestore, MongoDB, ElasticSearch), if you include the students' courses in the students' table, you have two possible paths:

You can store the courses' IDs as an array field in students data, in which case you have to perform a separate query to fetch those courses after you fetch the students you want.
You can store the entire current_courses info in students table, instead of only IDs; in which case, if you ever need to update a course, you have to perform a separate query to update current_courses data that are inside the students table/database/schema/whatever.

Long story short, yes, what I'm describing IS a JOIN operation at heart, but much more than that.

rhymes • Jun 25 '20

This is a single query merging multiple records from different tables.

I'm not sure we're talking about different things though. I think it's a matter of perspective. An API that queries multiple data sources and then returns them to you as a result of a single query is to you, the caller, a single query.

PostgreSQL and other DBs splits queries in multiple parallel fetches sometimes, they physically make multiple reads at the same time, but does it matter to you? No, because with you issue one SQL query to get all the data you need.

Moving on to your example, you can already do it:

PracticalDeveloper_development> select count(*) from comments where user_id = 10;
-[ RECORD 1 ]-------------------------
count | 3
SELECT 1
Time: 0.008s
PracticalDeveloper_development> select users.id, users.username, array_agg(row_to_json(row)) as comments from (select c.* from comments c where c.user_id = 10) row, users group b
 y users.id having users.id = 10;

-[ RECORD 1 ]-------------------------
id       | 10
username | lockmanjerry
comments | ['{"id":17,"ancestry":null,"body_html":null,"body_markdown":"You probably haven\'t heard of them chartreuse direct trade. Humblebrag photo booth marfa. Gluten-free goth flannel 90\'s 8-bit vice.","commentable_id":24,"commentable_type":"Article","created_at":"2020-06-22T11:18:56.028987","deleted":false,"edited":false,"edited_at":null,"hidden_by_commentable_user":false,"id_code":"h","markdown_character_count":131,"positive_reactions_count":0,"processed_html":"<p>You probably haven\'t heard of them chartreuse direct trade. Humblebrag photo booth marfa. Gluten-free goth flannel 90\'s 8-bit vice.</p>\\n\\n","public_reactions_count":1,"reactions_count":0,"receive_notifications":true,"score":1,"spaminess_rating":0,"updated_at":"2020-06-22T11:34:28.673251","user_id":10}', '{"id":19,"ancestry":null,"body_html":null,"body_markdown":"Sriracha shoreditch pitchfork offal selfies. Knausgaard godard pabst fixie microdosing mixtape meggings.","commentable_id":21,"commentable_type":"Article","created_at":"2020-06-22T11:18:56.218676","deleted":false,"edited":false,"edited_at":null,"hidden_by_commentable_user":false,"id_code":"j","markdown_character_count":104,"positive_reactions_count":0,"processed_html":"<p>Sriracha shoreditch pitchfork offal selfies. Knausgaard godard pabst fixie microdosing mixtape meggings.</p>\\n\\n","public_reactions_count":1,"reactions_count":0,"receive_notifications":true,"score":1,"spaminess_rating":0,"updated_at":"2020-06-22T11:34:28.883918","user_id":10}', '{"id":15,"ancestry":null,"body_html":null,"body_markdown":"Humblebrag austin wayfarers wes anderson cardigan celiac organic pork belly. Irony butcher tacos pork belly normcore retro whatever you probably haven\'t heard of them. Small batch meh etsy pork belly williamsburg sartorial schlitz yolo.","commentable_id":3,"commentable_type":"Article","created_at":"2020-06-22T11:18:55.850519","deleted":false,"edited":false,"edited_at":null,"hidden_by_commentable_user":false,"id_code":"f","markdown_character_count":236,"positive_reactions_count":0,"processed_html":"<p>Humblebrag austin wayfarers wes anderson cardigan celiac organic pork belly. Irony butcher tacos pork belly normcore retro whatever you probably haven\'t heard of them. Small batch meh etsy pork belly williamsburg sartorial schlitz yolo.</p>\\n\\n","public_reactions_count":1,"reactions_count":0,"receive_notifications":true,"score":3,"spaminess_rating":0,"updated_at":"2020-06-22T11:34:28.444569","user_id":10}']
SELECT 1
Time: 0.008s

User 10 has 3 comments, I selected them all with one query and aggregated them in an array of JSON rows.

I'm sure the query can be simplified with a CTE or other clever tricks but it's a decent start.

I can understand why it's a little bit more complicated to do with noSQL DBs. They usually have more complicated ways of putting data in relation to other data.

Necmettin Begiter • Jun 25 '20

Yes, multiple queries resulting in a single resultset is, in the end, a single call for the client, but on the server side, it means multiple calls. You are right of course, but I'm a little unforgiving I guess :)

Yeah, PostgreSQL almost got it, except JSON in Postgres are strings. Almost there :)

What can I say, I'm a grumpy old programmer, it's not easy to satisfy me. ;)

Cheers.

rhymes • Jun 25 '20

Yeah, PostgreSQL almost got it, except JSON in Postgres are strings. Almost there :)

Not really though: JSON in PostgreSQL is just JSON:

PracticalDeveloper_development> select id, username, email_public from users limit 1;
+------+-----------------+----------------+
| id   | username        | email_public   |
|------+-----------------+----------------|
| 6    | rosenbaumladawn | False          |
+------+-----------------+----------------+
SELECT 1
Time: 0.016s
PracticalDeveloper_development> select row_to_json(t) from (select id, username, email_public from users limit 1) t;
+------------------------------------------------------------+
| row_to_json                                                |
|------------------------------------------------------------|
| {"id":6,"username":"rosenbaumladawn","email_public":false} |
+------------------------------------------------------------+
SELECT 1
Time: 0.015s

Integers are integers, strings are strings and booleans are booleans

Necmettin Begiter • Jun 26 '20

When PostgreSQL came up with the JSON features, I remember reading examples and always seeing 'some-json-formatted-data'::JSON, so I assumed JSON data is given to / received from PostgreSQL in a string. Even the comments field in your example a few comments back (the one with the 3 comments of user 10) has the JSON data in strings.

If that isn't the case, my bad, I didn't know PostgreSQL had progressed this far, I thought it was still using strings to encapsulate/represent JSON data contained in cells.

Valts Liepiņš • Aug 21 '20

I recently found out about this "PostgREST" project, which provides RESTful API to Postgres database.

It seems that they do actually pull off the "fetch resources with related resources using a single request", reminded me of your wish!

postgrest.org/en/v7.0.0/api.html#r...

Yaron Levi • Jun 25 '20

Good News!
Your dream database is already here and it's taking off as we speak. It's called FaunaDB. Join to FaunaDB's slack and see what's it all about. You'll thank me later🙂

Necmettin Begiter • Jun 26 '20

I will probably have to spend some more time in the documentation, but I was unable to find an example of what I've been describing. How would you write the pseudo query I provided (a very quick example with multiple assumptions is more than enough)?