In SQL, JOIN is used to link two or more tables together in a single result set, and the WHERE clause is used to filter the results based on some criteria. That seems simple, but do we really understand how they affect our queries?
In order to take a deeper look at it, we'll create a simple database as an example. And the best example I can think of now is pizza! π
The database structure
Our Pizza Shop database has 2 main tables: customers and pizzas. The only other table is orders, which connects the other two.
The contents of the three tables follow below. If you'd like, you can create the same database on your end, and verify that you get the same results.
customers pizzas orders
============== ============================= ===================================
id | name id | flavor | price date | customer_id | pizza_id
-- | --------- -- | ---------------- | ----- ---------- | ----------- | --------
1 | Beatrice 1 | Americana | 7.10 2023-03-08 | 6 | 6
2 | Enrico 2 | Caprese | 7.00 2023-03-15 | 3 | 7
3 | Francesco 3 | Funghi | 7.80 2023-03-22 | 1 | 4
4 | Giovanni 4 | Margherita | 5.70 2023-04-12 | 2 | 4
5 | Giuseppe 5 | Napoletana | 5.80 2023-04-21 | 2 | 7
6 | Maria 6 | Nutella | 6.90 2023-05-08 | 7 | 1
7 | Nina 7 | Prosciutto | 7.90 2023-05-09 | 2 | 4
8 | Quattro Formaggi | 7.00 2023-05-15 | 3 | 4
9 | Tonno | 7.70 2023-06-03 | 1 | 9
10 | Verdure | 7.00 2023-06-10 | 2 | 8
2023-06-14 | 7 | 6
2023-06-22 | 7 | 8
2023-06-26 | 3 | 10
Filtering results with WHERE
Let's start with the simpler, and dare I say, one of the most common keywords in SQL: WHERE.
You could select all Customers with a simple query:
SELECT * FROM shop.customers
id | name
-- | ---------
1 | Beatrice
2 | Enrico
3 | Francesco
4 | Giovanni
5 | Giuseppe
6 | Maria
7 | Nina
But if you only want to see the customers starting with the letter G, you could add this condition and filter the results:
SELECT * FROM shop.customers WHERE name LIKE 'G%'
id | name
-- | --------
4 | Giovanni
5 | Giuseppe
You can have all kinds of conditions in a WHERE clause, for example comparing strings (like we did for the name), dates (WHERE birthday > '1989-01-01'
), boolean (WHERE is_premium IS FALSE
), number (WHERE price < 5.50
), and these are just a few examples.
The WHERE clause can be used not only with SELECT statements. Actually you must use it with DELETE statements - unless your intent is to wipe out all the data from a table, but even then, you're better off using TRUNCATE instead of DELETE. So, don't forget to use WHERE in your DELETE statements.
Linking tables with JOIN
There are several types of JOINs, but let's look at the most common ones: INNER JOIN, LEFT JOIN and RIGHT JOIN.
Let's represent our tables with circles. Circle A represents our customers, B represents our pizzas, and the intersection represents customers who actually bought pizzas, from our orders.
To view all values from A, we simply select from our customers table:
SELECT * FROM shop.customers
Similarly, we can retrieve B by selecting from pizzas:
SELECT * FROM shop.pizzas
Now to the JOINs.
INNER JOIN
If I want to see all customers who ordered pizzas, I could do so using an INNER JOIN. Notice: if I only use JOIN in my query, INNER JOIN is implied.
SELECT DISTINCT c.id, c."name"
FROM shop.customers c
JOIN shop.orders o ON c.id = o.customer_id
ORDER BY c."name"
Similarly I could view all pizzas that were ever ordered:
SELECT DISTINCT p.id, p.flavor, p.price
FROM shop.pizzas p
JOIN shop.orders o ON p.id = o.pizza_id
ORDER BY p.flavor
These 2 can be represented by the intersection between the circles.
LEFT JOIN
To view how many pizzas each customer ordered, I could COUNT the number of orders for each customer:
SELECT c.id, c."name", COUNT(o.customer_id) AS pizzas_ordered
FROM shop.customers c
JOIN shop.orders o ON c.id = o.customer_id
GROUP BY c.id, c."name"
ORDER BY c."name"
Running this query returns the following result:
id | name | pizzas_ordered
-- | --------- | --------------
1 | Beatrice | 2
2 | Enrico | 4
3 | Francesco | 3
6 | Maria | 1
7 | Nina | 3
As you can see, only 5 rows were returned, but we know we have 7 customers. What happened? Well, the other 2 customers have not ordered any pizzas yet, and since we're using INNER JOIN in our query, they are not returned.
If we want to see all customers, including those who haven't ordered yet, we just have to replace the INNER JOIN in our query with LEFT JOIN:
SELECT c.id, c."name", COUNT(o.customer_id) AS pizzas_ordered
FROM shop.customers c LEFT JOIN shop.orders o ON c.id = o.customer_id
GROUP BY c.id, c."name"
ORDER BY c."name"
LEFT JOIN returns all rows from the table on the LEFT, even if they don't match any records on the table from the right. I left the two tables on the same line of the query to better show which table is on the left of the JOIN and which is on the right.
We can represent the results with the A and B circles, resulting with the full set of A, including the intersection between A and B.
Now we get the results we were expecting:
id | name | pizzas_ordered
-- | --------- | --------------
1 | Beatrice | 2
2 | Enrico | 4
3 | Francesco | 3
4 | Giovanni | 0
5 | Giuseppe | 0
6 | Maria | 1
7 | Nina | 3
RIGHT JOIN
Works just like the LEFT JOIN, but brings all records from the table on the RIGHT, even if they don't match any records on the table from the LEFT. We could rewrite the same query to use RIGHT JOIN by just changing positions:
SELECT c.id, c."name", COUNT(o.customer_id) AS pizzas_ordered
FROM shop.orders o RIGHT JOIN shop.customers c ON o.customer_id = c.id
GROUP BY c.id, c."name"
ORDER BY c."name"
id | name | pizzas_ordered
-- | --------- | --------------
1 | Beatrice | 2
2 | Enrico | 4
3 | Francesco | 3
4 | Giovanni | 0
5 | Giuseppe | 0
6 | Maria | 1
7 | Nina | 3
Just like the LEFT JOIN, this can be represented with the A and B circles, this time with the full set of B, and the intersection between A and B.
Let's see another example. The following query will return how many pizzas of each flavor were sold, including the flavors that were not sold at all:
SELECT p.id, p.flavor, COUNT(o.pizza_id) AS pizzas_ordered
FROM shop.orders o
RIGHT JOIN shop.pizzas p ON o.pizza_id = p.id
GROUP BY p.id, p.flavor
ORDER BY p.flavor
id | flavor | pizzas_ordered
-- | ---------------- | --------------
1 | Americana | 1
2 | Caprese | 0
3 | Funghi | 0
4 | Margherita | 4
5 | Napoletana | 0
6 | Nutella | 2
7 | Prosciutto | 2
8 | Quattro Formaggi | 2
9 | Tonno | 1
10 | Verdure | 1
The result shows that Margherita was the most ordered flavor, while Caprese, Funghi and Napoletana were not ordered at all.
Combining JOIN and WHERE
This is where it could get tricky. If we're not attentive, we could end up writing redundant queries or even wrong queries according to the result we're expecting.
Let's say we're trying to retrieve the customers who bought pizzas, ordering by the amount of pizzas per customer.
SELECT c.id, c."name", COUNT(o.customer_id) AS pizzas_ordered
FROM shop.customers c
JOIN shop.orders o ON c.id = o.customer_id
GROUP BY c.id, c."name"
HAVING COUNT(o.customer_id) > 0
ORDER BY COUNT(o.customer_id)
id | name | pizzas_ordered
-- | --------- | --------------
6 | Maria | 1
1 | Beatrice | 2
3 | Francesco | 3
7 | Nina | 3
2 | Enrico | 4
In this query we are using the keyword HAVING instead of WHERE since we're dealing with an aggregate function. You can think of HAVING as the WHERE clause used for columns that use functions such as COUNT, SUM, etc.
The query above brings the correct results, but we don't actually need the HAVING there. Why not? Well, we're already using a JOIN (INNER JOIN), which only returns results when there is a match between the 2 tables, hence no customers without orders will ever be returned. That's why the HAVING clause is redundant, and can be removed.
SELECT c.id, c."name", COUNT(o.customer_id) AS pizzas_ordered
FROM shop.customers c
JOIN shop.orders o ON c.id = o.customer_id
GROUP BY c.id, c."name"
ORDER BY COUNT(o.customer_id)
This query is shorter and returns the exact same result.
id | name | pizzas_ordered
-- | --------- | --------------
6 | Maria | 1
1 | Beatrice | 2
3 | Francesco | 3
7 | Nina | 3
2 | Enrico | 4
Now let's say we're creating a report and we want to see how much money we made from the orders of each flavor of pizza during the month of May.
SELECT p.id, p.flavor, COUNT(o.pizza_id) * p.price AS total
FROM shop.pizzas p
JOIN shop.orders o ON p.id = o.pizza_id
WHERE o."date" BETWEEN '2023-05-01' AND '2023-05-31'
GROUP BY p.id, p.flavor
ORDER BY p.flavor
id | flavor | total
-- | ---------- | -----
1 | Americana | 7.10
4 | Margherita | 11.40
We can see from the result that during May we made 7.10 by selling Americana pizzas and 11.40 from Margherita pizzas. But how about the other flavors?
Well, we're using INNER JOIN in our query, so we're not bringing all flavors of pizza. Let's replace that with LEFT JOIN.
SELECT p.id, p.flavor, COUNT(o.pizza_id) * p.price AS total
FROM shop.pizzas p
LEFT JOIN shop.orders o ON p.id = o.pizza_id
WHERE o."date" BETWEEN '2023-05-01' AND '2023-05-31'
GROUP BY p.id, p.flavor
ORDER BY p.flavor
id | flavor | total
-- | ---------- | -----
1 | Americana | 7.10
4 | Margherita | 11.40
We changed our JOIN, but still got the same result. Why was that? This is the reason for writing this article, understanding how our JOINs and WHERE clauses affect our result.
In our query we used LEFT JOIN, so we should be getting all pizzas, including those that were not sold. Considering circle A as our customers table, B as pizzas and the intersection as the orders. We should be getting something like this:
And we are indeed getting it, we could verify that by removing the WHERE clause:
SELECT p.id, p.flavor, COUNT(o.pizza_id) * p.price AS total
FROM shop.pizzas p
LEFT JOIN shop.orders o ON p.id = o.pizza_id
GROUP BY p.id, p.flavor
ORDER BY p.flavor
id | flavor | total
-- | ---------------- | -----
1 | Americana | 7.10
2 | Caprese | 0.00
3 | Funghi | 0.00
4 | Margherita | 22.80
5 | Napoletana | 0.00
6 | Nutella | 13.80
7 | Prosciutto | 15.80
8 | Quattro Formaggi | 14.00
9 | Tonno | 7.70
10 | Verdure | 7.00
We saw that we're retrieving all the pizzas, but our WHERE clause is filtering out the results, and this is exactly the purpose of the WHERE clause.
Keep this in mind: in order to bring the correct results, we need to understand how to link our tables with the correct JOIN (if we should use LEFT, RIGHT, INNER, or other types of JOINs), and after that, how we're filtering the results with our WHERE clause.
Solution to this problem
Whenever we face a situation in which we believe the query is not returning the expected result, my tip is to remove the WHERE clauses, and review the JOINs. Check to see if you're linking the tables correctly. After that, think of how you could be limiting your results to your needs.
One way to solve the pizza problem is to not use WHERE at all. Instead we can add the condition for the date in our JOIN:
SELECT p.id, p.flavor, COUNT(o.pizza_id) * p.price AS total
FROM shop.pizzas p
LEFT JOIN shop.orders o ON p.id = o.pizza_id
AND o."date" BETWEEN '2023-05-01' AND '2023-05-31'
GROUP BY p.id, p.flavor
ORDER BY p.flavor
id | flavor | total
-- | ---------------- | -----
1 | Americana | 7.10
2 | Caprese | 0.00
3 | Funghi | 0.00
4 | Margherita | 11.40
5 | Napoletana | 0.00
6 | Nutella | 0.00
7 | Prosciutto | 0.00
8 | Quattro Formaggi | 0.00
9 | Tonno | 0.00
10 | Verdure | 0.00
With this new query we're limiting our results to the records from May, but we're still returning all pizzas even for those flavors without orders, which is the expected result.
Review your queries: verify you are linking your tables with the correct JOINs and filtering the results with appropriate WHERE clauses. Keep that in mind, and you'll spend less time fixing your queries, and more time appreciating good pizza! π
Top comments (1)
Thanks for making it a bit clear Douglas!