DEV Community

Cover image for SQL Joins without the Venn diagrams
Helen Anderson
Helen Anderson

Posted on • Updated on • Originally published at helenanderson.co.nz

SQL Joins without the Venn diagrams

JOINs are the first big learning curve after getting your head around SQL basics. You may not find everything you need in one table so it's important to know which JOIN clause to use to get the data you need.

While there are plenty of guides to JOINs that use Venn diagrams to get the point across, I prefer something more visual.


INNER JOIN
LEFT JOIN
CROSS JOIN
UNION
UNION ALL


INNER JOIN

Use an INNER JOIN, shortened to 'JOIN', when you want to find the match between two tables. You need to have a column on both tables that you join ON, and that's where the match happens. Any results where there is not a match are discarded.

inner


select
  o.order_item, 
  i.inventory_item
from 
  orders o
inner join 
  inventory i 
on o.order_item = i.inventory_item

Enter fullscreen mode Exit fullscreen mode

LEFT JOIN

Use a LEFT JOIN when you want to find a match between two tables, but also show a NULL where there is no match from the right table. A RIGHT JOIN does the same but in reverse.

Like the INNER JOIN, you need a column to join ON. Unlike the INNER JOIN, a NULL is used to show there is no match between the two tables.

outer

select
  o.order_item, 
  i.inventory_item 
from 
  orders o -- this is the left table
left join 
  inventory i -- this is the right table
on o.order_item = i.inventory_item
Enter fullscreen mode Exit fullscreen mode

CROSS JOIN

A CROSS JOIN joins everything with everything. There is no need to provide a column to join on, and it can result in a very big data set, (and a really big image so you’ll have to use your imagination when reviewing the image below).

Proceed with caution.

cross

select
  o.order_item, 
  i.inventory_item
from 
  orders o
cross join 
  inventory i
Enter fullscreen mode Exit fullscreen mode

UNION

While a JOIN combines columns horizontally a UNION combines rows vertically. This is technically a set operator rather than a JOIN clause but as we are talking about combining datasets this is a good opportunity to introduce it.

Using a UNION combines the result of two queries into one column and removes duplicates.

If your query has multiple columns, they need to be in the same order to complete the UNION.

union

select *
from
  orders
union
select *
from
  inventory
Enter fullscreen mode Exit fullscreen mode

UNION ALL

The UNION ALL combines the results of two queries the same as a UNION but keeps the duplicates in the result.

unionall

select *
from
  orders
union all
select *
from
  inventory
Enter fullscreen mode Exit fullscreen mode

The best way to get your head around JOINs is to start using them. If you aren't working with a SQL database already, check out SQLZoo or Hacker Rank to experiment with JOINs.


Read more


This post first appeared on helenanderson.co.nz

Top comments (4)

Collapse
 
geraldew profile image
geraldew

To start with, I generally like the look of this explanation as an article.

I'm going to be picky and cruel in saying this: but I think it's a disservice to not show that even inner joins enact row multiplications when there are multiple instances of the join value.

It's fair enough to not do so in the very first example of an inner join. Alas, having the multiplication of rows then only be shown later when the cross join is taught leaves a dreadful legacy.

I have spent many hours re-teaching analysts - who have sometimes been writing SQL for years - that have not been aware that repeated values on either side of the inner join can cause row expansions in the result. As I am wont to say - if something can happen at all in data, then at some time it will happen.

I think the basic relational fact of row expansions needs to be taught as soon as the inner join has been taught. I find that helps avoid a lot of misunderstandings about left joins too. I'm not saying it's easy, even less so in static print/screen material - I usually have the luxury of doing live peer guidance with a whiteboard, so I don't have a magic solution to proffer. I would suggest going to the bother of doing a second example of the inner join but showing a value in there twice on one side and thrice on the other (because a 2-1 or a 2-2 leaves room for the learner to misunderstand what's happening).

Of course, there will be some environments that new data analysts may encounter where enough tables have primary keys and so row multiplying joins might be rare. But I'd have to say that over my years of experience that this is becoming less common not more. Indeed I've been seeing "big data" users getting advice to not bother with key constraints for performance reasons and newer platforms such as Hadoop having fewer available constraints anyway.

I certainly don't enjoy dealing with multi-megabyte SQL scripts in which there is an unknown spot of row multiplication to be debugged. The most subtly pernicious is the combination of the coders thinking that join values are unique but having done nothing to ensure that they are - and when the data reality is that the data is only very-nearly-always unique.

While it can seem that such problems are really about managing the uniqueness in the tables I do think that a naiveté about inner joins allows problems to spread like an infection in a data processing sequence. I think that such an innocence comes from not being shown inner join row multiplications in the tuition phase.

Collapse
 
aarone4 profile image
Aaron Reese

In 20 years of writing SQL I have never found a legitimate use for RIGHT JOIN in production code. RJ indicates you don't understand your data as it can always be rewritten as a LJ.

One issue that often catches new users is the filter predicate. If you apply it to the Join clause on a LJ you only drop the joined record,; if you add it to the WHERE clause you drop the entire row.

Collapse
 
helenanders26 profile image
Helen Anderson

Great advice! Thanks for sharing :)

Collapse
 
waylonwalker profile image
Waylon Walker

The colors make this perfect. So clear and understandable