Rap2h

Posted on Jun 28, 2022

SQL tips for beginners; what I learned after 10+ years of occasionally writing queries and schemas as a web dev

Rap2h · 2022-06-28T20:56:27Z

Index nearly everything, index early . Not indexing results in poor performances. Indexing too late can create downtime or inconsistent data. Do not consider removing/avoiding index to preserve disk space, buy more space instead. Remember that index may have an impact on write performance though. PostgreSQL is your best choice , it has a skilled community, including StackOverflow heroes . There is no MySQL, MariaDB, or SQLite user group where I live. ORM are great for simple CRUD operation . Don't use it to query specific stats or any complex query, use raw queries. BTW, for TS/JS devs, Prisma 3 is cool, Sequelize is obsolete. SQL syntax is case insensitive, use lower case everywhere (and underscores), do not name your tables with camelCase. ALSO, YOU DON'T HAVE TO YELL. select id from users is the same as SELECT id FROM users . For string comparison, case sensitivity depends on collation . Never miss a foreign key . Implicit links and relations between tables should be avoided. Follow a pattern when naming tables and columns . Example: users , posts , comments (plural). post_user , comment_post , comment_user for junction tables (singular + alphabetical order). id for primary key, id_user , id_comment , and id_post (singular + id first) for foreign key. Reading a database schema is hard for newcomers. Be super consistent when naming tables. 1 model ≠ 1 table . Unfortunately, when designing a database you have to think at a lower level than "models" and "objects". You will have to deal with technical tables. Use the right data type . Do not create "custom" data types defined by an implicit contract. JSONB is great but don't overuse it. Also, learn to query it it's complex yet powerful . When you iterate a lot on designing a schema, you can use JSONB to avoid modifying the schema over and over. PostgreSQL does not scale automatically. Expect migrations and downtime. CTE is more readable than subqueries , you can read it from the top to the bottom. Also, you can understand the way it was designed. Keep tables schema simple , avoid patterns such as EAV that makes tables hard to understand and read; try to keep most tables as simple as vertical CSV sheets, with one line for each record. Don't be afraid of writing long and verbose queries to address a specific need, it's often more efficient. The order of the words in a query is not the order in which they are evaluated. Know the order of execution : from, where, group by, having, order by, select, limit. 120 tables are way too much , even for big products. You can build a robust application with 25 tables. Thank you for reading 🙏

#postgres #webdev #programming #sql

Index nearly everything, index early. Not indexing results in poor performances. Indexing too late can create downtime or inconsistent data. Do not consider removing/avoiding index to preserve disk space, buy more space instead. Remember that index may have an impact on write performance though.
PostgreSQL is your best choice, it has a skilled community, including StackOverflow heroes. There is no MySQL, MariaDB, or SQLite user group where I live.
ORM are great for simple CRUD operation. Don't use it to query specific stats or any complex query, use raw queries. BTW, for TS/JS devs, Prisma 3 is cool, Sequelize is obsolete.
SQL syntax is case insensitive, use lower case everywhere (and underscores), do not name your tables with camelCase. ALSO, YOU DON'T HAVE TO YELL. select id from users is the same as SELECT id FROM users. For string comparison, case sensitivity depends on collation.
Never miss a foreign key. Implicit links and relations between tables should be avoided.
Follow a pattern when naming tables and columns. Example: users, posts, comments (plural). post_user, comment_post, comment_user for junction tables (singular + alphabetical order). id for primary key, id_user, id_comment, and id_post (singular + id first) for foreign key. Reading a database schema is hard for newcomers. Be super consistent when naming tables.
1 model ≠ 1 table. Unfortunately, when designing a database you have to think at a lower level than "models" and "objects". You will have to deal with technical tables.
Use the right data type. Do not create "custom" data types defined by an implicit contract.
JSONB is great but don't overuse it. Also, learn to query it it's complex yet powerful. When you iterate a lot on designing a schema, you can use JSONB to avoid modifying the schema over and over.
PostgreSQL does not scale automatically. Expect migrations and downtime.
CTE is more readable than subqueries, you can read it from the top to the bottom. Also, you can understand the way it was designed.
Keep tables schema simple, avoid patterns such as EAV that makes tables hard to understand and read; try to keep most tables as simple as vertical CSV sheets, with one line for each record.
Don't be afraid of writing long and verbose queries to address a specific need, it's often more efficient.
The order of the words in a query is not the order in which they are evaluated. Know the order of execution: from, where, group by, having, order by, select, limit.
120 tables are way too much, even for big products. You can build a robust application with 25 tables.

Thank you for reading 🙏

Top comments (32)

HAP • Jul 8 '22

Index nearly everything, index early

Don't go nuts with this. Too many indexes can slow down writes.

Also:

When things get complex (and they often will) remember that you can create views and perform reads (and sometimes writes) with those views with a model/SQL.
Be careful with cascade actions. This can make a seemingly simple action incredibly long-running. If you have header tables linked to detail tables and your detail tables have millions of records in them or cascade to other tables that do, I encourage you to use soft deletes at the header-level instead. Once that is done, you can determine if and when to actually delete the data.
If you have queries that have to perform multiple aggregations against a large set of data, you may want to investigate partitioning.
Don't be afraid of VACUUM operations. It is often better to have more frequent, shorter vacuum executions vs fewer, longer ones.
For more modern versions of PostgreSQL, investigate your server specs (the server itself, that is: memory CPU, etc) if you have enough, explore increasing some work memory and adding more parallel workers.
Check out using the nice data types PostgreSQL provides such as range types. Those can be used with special indexing to provide complex operations like range overlap and in some cases nearest neighbor, if I remember correctly.
If you are writing SQL that performs a manipulating operation (INSERT, UPDATE, DELETE), you can use the RETURNING clause to return the (full or partial) rows that were affected.
Think of doing you operations in batches where possible. This can be more efficient when processing large sets of data.
When writing your code, commit at the highest level possible. Don't commit in a low-level loop, unless you really like your code to run slow and be I/O intensive.
Don't forget you can cast certain types to others with a double colon (::). Ex:
```
select integer_col::text from table;
```
Understand that sometimes anti-join patterns using EXISTS or NOT EXISTS with a subquery can sometime execute faster that a LEFT/RIGHT JOIN where col is null.

-- 40-year DB developer

Rap2h • Jul 12 '22

Wow! Thank you so much for your answer! Your comment is actually a gold mine and contains original content and really useful thoughts! It should be an actual article by itself.

I agree with all points. The last point in particular saved my dev life more than once: it could be considered a detail but actually, it is a game-changer! (and it's not well-known)

Let me know if you write an article with your contents, I will be glad to share it (or review it before publication if you want)! It could be a "SQL tips for ~~beginners~~ intermediate" :D

Thank you!

Arnaud Ambroselli • Jun 29 '22

1 model ≠ 1 table. Unfortunately, when designing a database you have to think at a lower level than "models" and "objects". You will have to deal with technical tables.

Being almost a newbie in SQL, I don’t understand the subtility here… can you tell a bit more what you mean ?

Rap2h • Jun 29 '22 • Edited

Thank you for your feedback! The most obvious case is junction tables (n..m relationship), i.e: users have n groups, groups contain m users, then you need a group_user table (a.k.a a junction table). This kind of table represents a relation between two models, not a model itself.

Toomas Jaska • Jul 4 '22

Aah. This threw me off thinking you mean one model ought not to be constrained in 1 table. What you really meant is every table isn't necessarily a representation of a model.

Nathan Stevenson • Jun 29 '22 • Edited

In regards to technical tables when using models with an ORM, do you mean helper tables? Like if there could be a relationship between 2 entities of the same model (i.e. a forum or chat app, 2 users having a friend relationship). Or perhaps other tables that contain additional details related to a particular model?

If not, could you elaborate on that a bit, or suggest a source I should look into myself?

A good read. I'm currently working on a project where I am creating models and have implemented one of the examples I mentioned above (a friends helper table with a primary key made up of the foreign keys from each user, and a date created field to represent friends in a chat app).

I am constantly amazed by how much I have learned, in contrast with how much I'll never be able to learn. There's just so many facets to development and related disciplines.

Rap2h • Jun 29 '22

Thank you for your feedback! 🙏

In regards to technical tables when using models with an ORM, do you mean helper tables? Like if there could be a relationship between 2 entities of the same model (i.e. a forum or chat app, 2 users having a friend relationship). Or perhaps other tables that contain additional details related to a particular model?

That's it, I mean both!

The most obvious case is pivot tables (n..m relationship), i.e: users have n groups, groups contain m users, then you need a group_user table (a.k.a a pivot table). This kind of table represents a relation between two models, not a model itself.

There can also be 1..1 tables (a subset of properties of a model split in another table). Or shared properties between two models that does not need to be represented themselves (maybe addresses for users and companies, when address is not considered a model in application, just a property).

I am constantly amazed by how much I have learned, in contrast with how much I'll never be able to learn. There's just so many facets to development and related disciplines.

Thanks for sharing! That's dizzy and that's what I love in dev! I still consider I'm a beginner (let's say intermediate-beginner) in SQL and I'm always amazed by thing I discover via other peoples' work.

taijidude • Jun 29 '22

Thank you for the post. I need to go and have another look at ctes.

Rap2h • Jun 29 '22

Thank you for your feedback. Yes, CTE is a game changer in writing queries. I sometimes have 4 or 5 parts broken down before the actual query, and it's still easy to read. Side note, I worked with a junior colleague who easily wrote super complex query in a readable way thanks to this syntax.

I feel like the example is documentation is good enough to highlight the benefits: postgresql.org/docs/current/querie...

Also, this course helped me: masterywithsql.com/ (still it's not free)

taijidude • Jun 30 '22

Thanks for mentioning it. Before i had the false assumption, that CTES are only usefull if i wanted to do a recursive query.

Aaron Reese • Jun 28 '22

Great article and I agree with most of your points.

SQL syntax is case insensitive

This depends on Collation. Some are case sensitive also can also affect where nulls appear in an index
+1 for choose a naming convention and stick to it. I prefer singular table names and table.id for primary key and table.othertable_id for foreign key but I am database centric. If you use an ORM then puralusied table names (to match the model) and table.table_id makes auto-linking easier
15 tables for an application? Depends on the app. Make sure you follow 3NF and use as many tables as necessary to model the business problems. Yes avoid EAV tables but also avoid sparsely populated columns. Use a core table and extension tables (1:0..1 mapping) for additional data that is not required in many cases. E.g. main table is product. Product_wine has same key and strength, terroire vineyard, releaseDate, dryness, ABV. Product_book has same key and author, pageCount, publisher

Rap2h • Jun 29 '22

Thank you for your detailed and useful answer!!

Most of my points are opinions, still I should have been less assertive and mention this. I will edit my article thanks to your comment for things that are not clear or should be corrected. About the "case insensitive syntax" I wanted to talk about select, Select, SELECT and tables names, not text comparison. As far as I know, collation has only impact for strings not the syntax of the query itself, but maybe I missed something.

Aaron Reese • Jun 29 '22

yes key words are case insensitive. As for shouty-case I am OCD about code layout and for me readablity is improved by keywords and built in functions being UPPERCASE along with proper indentation of fields in the SELECT and tables in the WHERE...JOIN clauses.

Rap2h • Jun 29 '22

That's a valid point, thank you for sharing! I was not aware of this issue (people sometimes ignore issues when they are not concerned), this should be considered indeed! Out of curiosity, how do you deal with language that does not encourage shouting? Is syntax highlighting helping in some way?

Again you pointed out an opinion: I feel like it's harder to READ TEXT IN UPPERCASE, but it may be only a matter of taste and personal comfort. I was ignoring real issues.

I should dig into why in the first place uppercase was used for SQL. There should be a reason!

Again thanks you for your feedback!

Bert • Jul 1 '22

SQL is a 'very' old language (early '70s if I'm not mistaken). Terminals at the time only used capital letters. The manuals reflected that and it became a historical norm. It was never a choice nor mandatory to have them capitalized.

Hung Luong • Jul 1 '22

This could be just a personal quirk but I'd really prefer SQL keywords and functions to be uppercase, while table and column names as lowercase as that help me to easily distinguish the boilerplate from actual data

HAP • Jul 8 '22

I always used spacing for that. I find that it's easier (for me) to keep case changes to a minimum.
I prefer writing statements in this fashion:

select col,
       col2, 
       col3
  from table
  join table_2
    on table_2.col = table.col
 where table.col2 = 'value'
   and table.col3 > other_value;

This way I always have my actions on the left and my columns, values, expressions on the right.

But that's just me, possibly. I find it neat and clean and easier to focus on instead of case changes confusing my attention.

Sanam • Jul 5 '22 • Edited

Let me correct some of these,

Use uppercase for the sql syntax (this is standard practice in dba world)
Use 25 tables instead of 120 is conflicting, Database should be normalizes and its always good to split the SOC table wise and play with id. This point basically disagree the database architecture pattern.

Rap2h • Jul 12 '22 • Edited

Thank you for your feedback! (for first point, I guess case is also a matter of opinions: dev.to/redhap/comment/203hn)

Sanam • Sep 22 '22

Yes it is. Everyone have own style of formatting and preferences :)

AISuperHub • Jun 29 '22

Thank you so much for the sql tips specifically on the PostgreSQL ones. I recently did some research and wrote an article to compare PostgreSQL and MySQL syntax. Sharing here to further improve knowledge.

tipseason.com/postgres-vs-mysql-sy...

Rap2h • Jun 29 '22

Hi @tipseason! Thank you for your feedback and the link to the article! (bookmarked ✨)

AISuperHub • Jun 29 '22

Thanks Raphael.

Costin Manda • Jul 2 '22

I have to say that I don't agree with many of your points.

Indexing is a subtle operation that needs understanding of what and why one does it. The space and write performance costs are important enough to invalidate your assumption on indexing everything.
I would rather invite database designers to separate the tables that require reading from those that are heavily written in and construct some mechanism for synchronizing them. Indexing for reading is very different from indexing for writing and table structure is very different for reading and writing as well.

Also... PostgreSQL being the best. What trolling :) I actually loved PostgreSQL 20 years ago, but then I've worked on SQL Server almost exclusively. It's amazing how much stuff they have added every year to it. I expect most other engines have had a similar spectacular growth. Again, here I would say that the engine should fit your requirements.

About SQL being case insensitive in keywords or the styling... I find that irrelevant. There are tools that can change the SQL code style automatically. Basically you can write whatever you want and then shape it to fit... your company's code style. I know people who come into a company where everyone writes in one way and they stubbornly write in another way because "this is the way". They are just annoying, not right.

I do agree with ORMs being a shortcut to simple things and when you want something done right or specific you need to "get your hands dirty".
I personally would go even so far as to say ORMs are harmful. Because even if one follows coding guidelines and abstract the DAL correctly, maintaining code that handles data in multiple ways is difficult and messy. If ORMs are good only for simple things, that means that SQL queries for the same effect would be simple. Therefore there is no need to use one.

Thanks for sharing your viewpoint and inviting a discussion on them.

Rap2h • Jul 12 '22

Thank you for your detailled feedback! I'm glad you don't agree with most of my points since it helps me reviewing my thoughts, so thank you!

As a disclaimer, this article has (at least) two big flaws:

It is mostly opinions (and I'm too assertive in my points)
It is advices for beginners (so when I write "do this" it's not that simple in a real world where each situation is different).

Here is a try to answer all your points!

Indexing is a subtle operation

You are right! This is the most posted comment I received both here and on reddit, so I guess I'm wrong. I wrote this because beginners (in my experience) often forget to index things, so it was something like "hey, don't forget to index obvious things you query all the day".

PostgreSQL being the best. What trolling :)

Yes, sorry about that. I should have write: "as a beginner who don't know what SQL DBMS to choose, PostgreSQL could be a good candidate".

About SQL being case insensitive

I have no strong opinion on this. When I use Datagrip, and when columns and table names have upper case, it does not auto-correct, so it's annoying. Your point is valid: a new dev that would come and says "let's change all our habits because mine are better" would be ... uh... annoying I guess!

I do agree with ORMs being a shortcut to simple things

That's an interesting point!

Again, thank you for your feedback, I hope readers will see it too. BTW, I recommand the feedback by @redhap : dev.to/redhap/comment/203hk which is like yours: a deeper questioning about SQL querying than mine!

Sergey Lukin • Jul 2 '22

Agree with all points with a small note on "Don't be afraid of writing long and verbose queries to address a specific need, it's often more efficient.": in my experience, having small one-thing-per-query SELECTs in the code leads to much more robust performance at scale as caching is more explicit and leads to less stress on DB overall. My 2 cents.

Rap2h • Jul 12 '22

Thank you for your feedback!

We may have a different use case! Maybe my use case is more for things like specific stats (let say when a kind of a big introspection in database is required to only display a chart by month for the last 6 months). I agree with you for other cases: sometimes it's better to write a first query, then pass the resulting ids to a second query.

View full discussion (32 comments)