At the company I work for, we use Postgres. One time, while we were building a simple chat app, I suggested that we join the messages table with the users table so that we could get the message author's name. But my boss said that was bad; he said that joining with a big table slows everything down, and the more data there is to join, the slower it gets.
I have only learned the basic stuff about RDBMSs, like joins, inner joins, etc., so I really don't understand why my boss said that. What I learned at school is different: when you design a normalized RDBMS model, you have to use joins, it's normal stuff.
The model is like this:
Message {
author_name,
body
}
User {
user_name
}
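In Postgres terms, the join I suggested would look roughly like this (just a sketch based on the model above, not our exact schema):

-- rough sketch of the model above as Postgres tables
CREATE TABLE users (
    user_name text PRIMARY KEY
);

CREATE TABLE messages (
    author_name text NOT NULL REFERENCES users (user_name),
    body        text NOT NULL
);

-- get each message together with its author
SELECT m.body, u.user_name
FROM messages m
JOIN users u ON u.user_name = m.author_name;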
I really need some help, I am confused right now. Thank you!
Top comments (7)
RDBMSs are designed to perform well with joins, and there are many ways to improve performance. Your boss is neither wrong nor right: joins do add cost to a query, but that doesn't mean you shouldn't use them. There are various ways to speed up joins, such as creating indexes and creating indexed views/materialized views.
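For example, in Postgres that could look something like this (table and column names just follow the sketch earlier in the thread, so treat them as illustrative):

-- index the column the join matches on
CREATE INDEX idx_messages_author_name ON messages (author_name);

-- or precompute the joined result as a materialized view
CREATE MATERIALIZED VIEW messages_with_authors AS
SELECT m.body, u.user_name
FROM messages m
JOIN users u ON u.user_name = m.author_name;

-- refresh it when the underlying data changes
REFRESH MATERIALIZED VIEW messages_with_authors;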
If the User table has only one field then your boss is right, but if the User table has more fields such as last_updated, full_name, country, time_zone ... you cannot put these fields in the Messages table, as that would cause a lot of unnecessary duplication, and updating all messages just to update last_updated would be the slowest thing to do in a database.

This is a good and simple explanation!
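To make the duplication point concrete, here is a rough sketch (columns follow the commenter's example and are only illustrative):

-- normalized: user details live in one row per user
ALTER TABLE users
    ADD COLUMN full_name    text,
    ADD COLUMN country      text,
    ADD COLUMN time_zone    text,
    ADD COLUMN last_updated timestamptz;

-- updating a user touches exactly one row
UPDATE users
SET last_updated = now()
WHERE user_name = 'alice';

-- denormalized: if those columns were copied into messages, the same
-- change would have to rewrite every message by that user, e.g.
-- UPDATE messages SET last_updated = now() WHERE author_name = 'alice';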
Joins are integral to how relational databases work - they're built on the assumption that you will join tables together, and so they will generally do them efficiently. It's true to say that there is a cost to joining two tables, but it's generally in milliseconds, so not worth worrying about. Doing without joins would mean you lose most of the benefits of a relational database in the first place.
In the event you do have a slow join, avoiding joins is not the way to fix it - under those circumstances it's likely that adding an index will resolve the issue. I personally have seen this reduce a response time from 22 seconds to less than 2 seconds. And if all else fails, consider caching the results of the query.
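If you want to check whether a join really is slow and whether an index helps, Postgres can show you the query plan (this reuses the sketch tables from earlier in the thread):

-- see how Postgres executes the join and how long it takes
EXPLAIN ANALYZE
SELECT m.body, u.user_name
FROM messages m
JOIN users u ON u.user_name = m.author_name
WHERE u.user_name = 'alice';

-- if the plan shows a sequential scan on messages, an index usually fixes it
CREATE INDEX IF NOT EXISTS idx_messages_author_name ON messages (author_name);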
Normalization is about the relationship between entities.
Joins are about using that relationship to narrow down the result when querying.
You could still have a normalized database, but perform queries without using joins for extra speed.
You will get that speed, but at a cost elsewhere (e.g. you have to write code in your application that puts the data from multiple queries together).
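A rough sketch of the no-join approach on the SQL side (the application then matches the two result sets itself; names follow the earlier sketch):

-- query 1: fetch a page of messages
SELECT author_name, body
FROM messages
LIMIT 50;

-- query 2: fetch the matching users in one round trip
-- (author names collected from query 1; full_name assumes the extra
-- user columns mentioned earlier in the thread)
SELECT user_name, full_name
FROM users
WHERE user_name IN ('alice', 'bob');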
Personally, I would try all possible optimization techniques first (wiki.postgresql.org/wiki/Performan...) before avoiding joins; there are a lot of them. High-performance hardware should be considered too.
I see, that makes sense, thank you.
Everything the guys said is true, but I don't know why none of them mentioned that maybe you could try another kind of database for the chat part. I mean, most people would use a NoSQL database to represent your tables. Have you considered that?
Ahh yes I did, but the chat is built for an existing app, and we already store all the user info in Postgres. I don't want to scatter the data, so we just use what we have :)