DEV Community

Discussion on: Example of complex SQL query to get as much data as possible from Database.

Collapse
 
dmfay profile image
Dian Fay • Edited

First: The distinction between outer (left or right) and inner joins is critical, but it's not about efficiency. Inner joins add to the result set if the row in table A has a corresponding row in table B. Rows without complements are skipped. Meanwhile, outer joins add all rows from table A (or B) to the result set and apply B (or A) data when found; if the corresponding rows don't exist, the origin rows are still returned with NULLs in place of the missing data.

Outer joins may seem slower because they're returning more data or because they're poorly indexed but performance is not a reason to prefer inner joins. It's about the shape of the data: you shouldn't just be checking to see if any employee currently has a null job_id but whether it is possible for an employee ever to have a null job_id. If that field is nullable and you are potentially interested in employees without job_ids you must use an outer join or risk incomplete results.

Second: Do what makes sense. You're trying to strike a balance between a few conflicting goals: simple queries, few roundtrips to the database, manageable resultset size. Where the balance is depends on what you're using the data for. Sometimes you really do need a huge, complicated query that returns everything imaginable because you really are using all that data all at once. Usually if you have huge amounts of data people have difficulty processing it unless you split it up for them, which in turn leads to smaller, more targeted queries. Here, you have duplicate employee rows thanks to multiple job histories, so whatever you're doing with the data will likely entail some post-processing to transform the data in a way that gives you one entry per employee. Having to process data further before it's useful is a sign that you're trying to do too much.

Third: Does it return the data you expect in a reasonable amount of time? Then it's correct enough.

Collapse
 
tyzia profile image
Andrei Navumau

Thank you, Dian.
You're right. When I was wondering about LEFT JOIN vs INNER JOIN, I was concerned not to end up with LEFT JOINS everywhere, in every query without proper understanding what I'm doing. I don't want to use LEFT JOIN just to be on the safe side, that some column in 'employees' table has NULL value in job_id field, for instance, and thus this employee is not included into overall result.

But you are right, that I need to take a look at the constraint of this field. If it is NOT NULL, so I can use INNER JOIN safely.

This, by the way, is an example when one should think about redesigning table constraints and, probably, adding NOT NULL constraints to important fields.

Thank you, once again for your time and comment. Honestly, I didn't think that anyone will read this post to the end and will grasp the idea of my questions.

Talking about the amount of data this query returns, this was a test task to see, if I'm able to join tables correctly. In real world, I'd prefer short queries which return specific data, not merging everything into one query. To my mind, it's more efficient.

Take care!