- Index nearly everything, index early. Not indexing results in poor performances. Indexing too late can create downtime or inconsistent data. Do not consider removing/avoiding index to preserve disk space, buy more space instead. Remember that index may have an impact on write performance though.
PostgreSQL is your best choice, it has a skilled community, including StackOverflow heroes. There is no MySQL, MariaDB, or SQLite user group where I live.
ORM are great for simple CRUD operation. Don't use it to query specific stats or any complex query, use raw queries. BTW, for TS/JS devs, Prisma 3 is cool, Sequelize is obsolete.
SQL syntax is case insensitive, use lower case everywhere (and underscores), do not name your tables with camelCase. ALSO, YOU DON'T HAVE TO YELL.
select id from usersis the same asSELECT id FROM users. For string comparison, case sensitivity depends on collation.Never miss a foreign key. Implicit links and relations between tables should be avoided.
Follow a pattern when naming tables and columns. Example:
users,posts,comments(plural).post_user,comment_post,comment_userfor junction tables (singular + alphabetical order).idfor primary key,id_user,id_comment, andid_post(singular + id first) for foreign key. Reading a database schema is hard for newcomers. Be super consistent when naming tables.1 model ≠ 1 table. Unfortunately, when designing a database you have to think at a lower level than "models" and "objects". You will have to deal with technical tables.
Use the right data type. Do not create "custom" data types defined by an implicit contract.
JSONB is great but don't overuse it. Also, learn to query it it's complex yet powerful. When you iterate a lot on designing a schema, you can use JSONB to avoid modifying the schema over and over.
PostgreSQL does not scale automatically. Expect migrations and downtime.
CTE is more readable than subqueries, you can read it from the top to the bottom. Also, you can understand the way it was designed.
Keep tables schema simple, avoid patterns such as EAV that makes tables hard to understand and read; try to keep most tables as simple as vertical CSV sheets, with one line for each record.
Don't be afraid of writing long and verbose queries to address a specific need, it's often more efficient.
The order of the words in a query is not the order in which they are evaluated. Know the order of execution: from, where, group by, having, order by, select, limit.
120 tables are way too much, even for big products. You can build a robust application with 25 tables.
Thank you for reading 🙏
Latest comments (32)
Newbie here. Thanks for this!
Don't go nuts with this. Too many indexes can slow down writes.
Also:
Don't forget you can cast certain types to others with a double colon (
::). Ex:Understand that sometimes anti-join patterns using EXISTS or NOT EXISTS with a subquery can sometime execute faster that a LEFT/RIGHT JOIN where col is null.
-- 40-year DB developer
Wow! Thank you so much for your answer! Your comment is actually a gold mine and contains original content and really useful thoughts! It should be an actual article by itself.
I agree with all points. The last point in particular saved my dev life more than once: it could be considered a detail but actually, it is a game-changer! (and it's not well-known)
Let me know if you write an article with your contents, I will be glad to share it (or review it before publication if you want)! It could be a "SQL tips for
beginnersintermediate" :DThank you!
Great article, agree on many things!
Regarding indexing I would be at bit wary though when to add them. Most DB's has queryplans and such to determine when to use an index, guess it is based on rowcount and cardinality and other things.
Also composite indexes are worth mentioning,
i.e an index of (columnA, columnB) might be used for
WHERE columnA = value AND columnB = value
and also:
WHERE columnA = value
Thank you for your feedback.
I guess you are right about index (you are not alone, many comments are similar, here and on reddit). Thank you for the precision about composite indexes!
Let me correct some of these,
Thank you for your feedback! (for first point, I guess case is also a matter of opinions: dev.to/redhap/comment/203hn)
Yes it is. Everyone have own style of formatting and preferences :)
Agree with all points with a small note on "Don't be afraid of writing long and verbose queries to address a specific need, it's often more efficient.": in my experience, having small one-thing-per-query SELECTs in the code leads to much more robust performance at scale as caching is more explicit and leads to less stress on DB overall. My 2 cents.
Thank you for your feedback!
We may have a different use case! Maybe my use case is more for things like specific stats (let say when a kind of a big introspection in database is required to only display a chart by month for the last 6 months). I agree with you for other cases: sometimes it's better to write a first query, then pass the resulting ids to a second query.
I have to say that I don't agree with many of your points.
Indexing is a subtle operation that needs understanding of what and why one does it. The space and write performance costs are important enough to invalidate your assumption on indexing everything.
I would rather invite database designers to separate the tables that require reading from those that are heavily written in and construct some mechanism for synchronizing them. Indexing for reading is very different from indexing for writing and table structure is very different for reading and writing as well.
Also... PostgreSQL being the best. What trolling :) I actually loved PostgreSQL 20 years ago, but then I've worked on SQL Server almost exclusively. It's amazing how much stuff they have added every year to it. I expect most other engines have had a similar spectacular growth. Again, here I would say that the engine should fit your requirements.
About SQL being case insensitive in keywords or the styling... I find that irrelevant. There are tools that can change the SQL code style automatically. Basically you can write whatever you want and then shape it to fit... your company's code style. I know people who come into a company where everyone writes in one way and they stubbornly write in another way because "this is the way". They are just annoying, not right.
I do agree with ORMs being a shortcut to simple things and when you want something done right or specific you need to "get your hands dirty".
I personally would go even so far as to say ORMs are harmful. Because even if one follows coding guidelines and abstract the DAL correctly, maintaining code that handles data in multiple ways is difficult and messy. If ORMs are good only for simple things, that means that SQL queries for the same effect would be simple. Therefore there is no need to use one.
Thanks for sharing your viewpoint and inviting a discussion on them.
Thank you for your detailled feedback! I'm glad you don't agree with most of my points since it helps me reviewing my thoughts, so thank you!
As a disclaimer, this article has (at least) two big flaws:
Here is a try to answer all your points!
You are right! This is the most posted comment I received both here and on reddit, so I guess I'm wrong. I wrote this because beginners (in my experience) often forget to index things, so it was something like "hey, don't forget to index obvious things you query all the day".
Yes, sorry about that. I should have write: "as a beginner who don't know what SQL DBMS to choose, PostgreSQL could be a good candidate".
I have no strong opinion on this. When I use Datagrip, and when columns and table names have upper case, it does not auto-correct, so it's annoying. Your point is valid: a new dev that would come and says "let's change all our habits because mine are better" would be ... uh... annoying I guess!
That's an interesting point!
Again, thank you for your feedback, I hope readers will see it too. BTW, I recommand the feedback by @redhap : dev.to/redhap/comment/203hk which is like yours: a deeper questioning about SQL querying than mine!
This could be just a personal quirk but I'd really prefer SQL keywords and functions to be uppercase, while table and column names as lowercase as that help me to easily distinguish the boilerplate from actual data
I always used spacing for that. I find that it's easier (for me) to keep case changes to a minimum.
I prefer writing statements in this fashion:
This way I always have my actions on the left and my columns, values, expressions on the right.
But that's just me, possibly. I find it neat and clean and easier to focus on instead of case changes confusing my attention.
Thanks for writing this post. I agree with almost all of the statements, except 3rd one. Could you explain in more detail, why using an ORM are not recommended in big projects? In my opinion, it helps to save code clean and readable, even helps to reuse redundant code compared to raw SQL queries.
It's not for "big projects", it's for complex queries. It's hard to write complex query with query builder, and an optimized query is really sometimes 1K more rapid than 4 queries made to the ORM via its objects or a weird query built by the ORM via its query builder. You should still use your ORM on big projects, but for simple CRUD operations:
Users.findById(123)is fine!Thank you so much for the sql tips specifically on the PostgreSQL ones. I recently did some research and wrote an article to compare PostgreSQL and MySQL syntax. Sharing here to further improve knowledge.
tipseason.com/postgres-vs-mysql-sy...
Hi @tipseason! Thank you for your feedback and the link to the article! (bookmarked ✨)
Thanks Raphael.
Being almost a newbie in SQL, I don’t understand the subtility here… can you tell a bit more what you mean ?
Aah. This threw me off thinking you mean one model ought not to be constrained in 1 table. What you really meant is every table isn't necessarily a representation of a model.
Thank you for your feedback! The most obvious case is junction tables (n..m relationship), i.e:
usershave ngroups,groupscontain musers, then you need agroup_usertable (a.k.a a junction table). This kind of table represents a relation between two models, not a model itself.