“They’re the same table.”
Relational databases make it very easy to join data from different tables, but did you know you can also JOIN data within the same table?
Relational databases make it easy to JOIN (combine) data from different tables. However, you can also JOIN a table with itself. This is especially handy when a table already contains all the data you need, but not in the right format.
References to the same table
One of the most common reasons to JOIN a table with itself, is because you have a table that contains a foreign key reference to itself.
In the example below we have an employees
table with employees. The table includes a manager_id
column that refers to an employee in that same employees
table.
employees | ||
---|---|---|
id | name | manager_id |
1 | Denholm | |
2 | Richmond | |
3 | Roy | 5 |
4 | Maurice | 5 |
5 | Jen | 1 |
If we want to create an overview that lists the name
of each employee along with the name
of their manager, we need to join the employees
table with itself.
When we do this, we need to assign an alias | Think of these as variable names for your tables! to one (or both) of them so that the database can distinguish between the two instances. It doesn’t really matter what names you use, as long as they’re unique and make sense to you.
Since we have two tables with identical column names, the database no longer understands what you mean when you say you want something like a name
. You therefore have to explicitly tell it that you want the name
of some row in the employees
or the managers
version of the table. This can be done by prepending column names with a table alias, followed by a .
:
SELECT
employees.id,
employees.name,
managers.name AS manager -- Note: the “AS” here is optional
FROM
employees
LEFT JOIN -- Use an INNER JOIN to exclude employees without a manager
employees AS managers -- Note: this “AS” is optional as well!
ON employees.manager_id = managers.id;
This query gives us the following result:
+----+----------+---------+
| id | name | manager |
+----+----------+---------+
| 1 | Denholm | NULL |
| 2 | Richmond | NULL |
| 3 | Roy | Jen |
| 4 | Maurice | Jen |
| 5 | Jen | Denholm |
+----+----------+---------+
Computing durations between changes
Another common reason to use self joins is to calculate some duration or distance between pairs of rows within a table.
For example, the order_state
table below keeps track of state changes that happen to orders at a webshop. Each record in this table includes a reference to the order, the name of the new state, and shows when the order changed to its new state. Can we calculate how long each of these state transitions took?
order_state | |||
---|---|---|---|
id | order_id | state | created_at |
1 | 1 | placed | 2021-06-01 00:01:00 |
2 | 1 | packaged | 2021-06-01 00:58:00 |
3 | 1 | despatched | 2021-06-01 02:20:00 |
4 | 1 | delivered | 2021-06-01 15:20:00 |
5 | 2 | placed | 2021-06-02 14:10:00 |
6 | 2 | packaged | 2021-06-02 18:55:00 |
7 | 3 | placed | 2021-06-02 19:00:00 |
8 | 3 | packaged | 2021-06-02 20:30:00 |
9 | 3 | despatched | 2021-06-03 01:40:00 |
10 | 2 | despatched | 2021-06-03 01:40:00 |
11 | 2 | delivered | 2021-06-03 08:00:00 |
12 | 3 | delivered | 2021-06-03 16:10:00 |
Of course we can! The query below shows how.
Once again, we have one table, order_state
, that we want to JOIN with itself. We name one of these from_state
and the other to_state
, to represent the collection of states before and after the transition.
Obviously, the two states involved in the transition should belong to the same order, so we add the condition that from_state.order_id
must be equal to to_state.order_id
. Moreover, the to_state
should occur later than the from_state
, so we’ll also add a condition that the to_state.id
must be
higher than from_state
(*).
(*) In real applications might also or only want to look at the timestamp. Be careful if multiple transitions can happen at the same time, e.g. within the same second!
Finally, we want to make sure that we only see direct state transitions, like those from placed
to packages
, and not indirect ones, like those from placed
to delivered
. This is done by first grouping by from_state.id
, which ensures that the “from” state appears only once in the result, and then JOINING each from_state
with the closest to_state
record that exists, i.e. the record with the lowest to_state.id
.
SELECT
from_state.order_id,
from_state.state,
to_state.state,
to_state.created_at AS changed_at,
TIMEDIFF(to_state.created_at, from_state.created_at) AS duration
FROM
order_state from_state
INNER JOIN
order_state to_state
ON 1 -- This “1” here makes it possible to nicely align our JOIN conditions
AND from_state.order_id = to_state.order_id
AND from_state.id < to_state.id
GROUP BY
from_state.id
HAVING
MIN(to_state.id)
ORDER BY
order_id,
from_state.id;
+----------+------------+------------+---------------------+----------+
| order_id | state | state | changed_at | duration |
+----------+------------+------------+---------------------+----------+
| 1 | placed | packaged | 2021-06-01 00:58:00 | 00:57:00 |
| 1 | packaged | despatched | 2021-06-01 02:20:00 | 01:22:00 |
| 1 | despatched | delivered | 2021-06-01 15:20:00 | 13:00:00 |
| 2 | placed | packaged | 2021-06-02 18:55:00 | 04:45:00 |
| 2 | packaged | despatched | 2021-06-03 01:40:00 | 06:45:00 |
| 2 | despatched | delivered | 2021-06-03 08:00:00 | 06:20:00 |
| 3 | placed | packaged | 2021-06-02 20:30:00 | 01:30:00 |
| 3 | packaged | despatched | 2021-06-03 01:40:00 | 05:10:00 |
| 3 | despatched | delivered | 2021-06-03 16:10:00 | 14:30:00 |
+----------+------------+------------+---------------------+----------+
Creating pairs of data
Self joins can also be used to easily create large or unique pairs of data from a relatively small set of data.
Cartesian product
You can join a table with itself without any JOIN conditions to create a result set that includes every possible combination of values within a single table, for instance when you want to know how many possible configurations exist for a product or when you wish to discover which combinations are common or rare.
In the example below, we combine multiple instances of a digit
table to form larger numbers.
digit |
---|
id |
0 |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
SELECT
CONCAT(B.id, A.id) AS number
FROM
digit A
JOIN
digit B;
+--------+
| number |
+--------+
| 00 |
| 01 |
| 02 |
| 03 |
| 04 |
| 05 |
| 06 |
| 07 |
| 08 |
| 09 |
| 10 |
| 11 |
| 12 |
| 13 |
| 14 |
| 15 |
| 16 |
| 17 |
| 18 |
| 19 |
| 20 |
| 21 |
| 22 |
| 23 |
| 24 |
| 25 |
| 26 |
| 27 |
| 28 |
| 29 |
| 30 |
| 31 |
| 32 |
| 33 |
| 34 |
| 35 |
| 36 |
| 37 |
| 38 |
| 39 |
| 40 |
| 41 |
| 42 |
| 43 |
| 44 |
| 45 |
| 46 |
| 47 |
| 48 |
| 49 |
| 50 |
| 51 |
| 52 |
| 53 |
| 54 |
| 55 |
| 56 |
| 57 |
| 58 |
| 59 |
| 60 |
| 61 |
| 62 |
| 63 |
| 64 |
| 65 |
| 66 |
| 67 |
| 68 |
| 69 |
| 70 |
| 71 |
| 72 |
| 73 |
| 74 |
| 75 |
| 76 |
| 77 |
| 78 |
| 79 |
| 80 |
| 81 |
| 82 |
| 83 |
| 84 |
| 85 |
| 86 |
| 87 |
| 88 |
| 89 |
| 90 |
| 91 |
| 92 |
| 93 |
| 94 |
| 95 |
| 96 |
| 97 |
| 98 |
| 99 |
+--------+
I once used a very similar method to generate lists of dates in a legacy business intelligence application that was built on top of a MySQL database:
SELECT
DATE_ADD(
CONCAT(
YEAR(CURDATE()), -- Get the current year
'-01-01' -- Append “-01-01” to it so that we get “2021-01-01”
),
INTERVAL CONCAT(B.id, A.id) DAY -- Show 2021-01-01 + x days
) AS date
FROM
digit A
JOIN
digit B;
+------------+
| date |
+------------+
| 2021-01-01 |
| 2021-01-02 |
| 2021-01-03 |
| 2021-01-04 |
| 2021-01-05 |
| 2021-01-06 |
| 2021-01-07 |
| 2021-01-08 |
| 2021-01-09 |
| 2021-01-10 |
| 2021-01-11 |
| 2021-01-12 |
| 2021-01-13 |
| 2021-01-14 |
| 2021-01-15 |
| 2021-01-16 |
| 2021-01-17 |
| 2021-01-18 |
| 2021-01-19 |
| 2021-01-20 |
| 2021-01-21 |
| 2021-01-22 |
| 2021-01-23 |
| 2021-01-24 |
| 2021-01-25 |
| 2021-01-26 |
| 2021-01-27 |
| 2021-01-28 |
| 2021-01-29 |
| 2021-01-30 |
| 2021-01-31 |
| 2021-02-01 |
| 2021-02-02 |
| 2021-02-03 |
| 2021-02-04 |
| 2021-02-05 |
| 2021-02-06 |
| 2021-02-07 |
| 2021-02-08 |
| 2021-02-09 |
| 2021-02-10 |
| 2021-02-11 |
| 2021-02-12 |
| 2021-02-13 |
| 2021-02-14 |
| 2021-02-15 |
| 2021-02-16 |
| 2021-02-17 |
| 2021-02-18 |
| 2021-02-19 |
| 2021-02-20 |
| 2021-02-21 |
| 2021-02-22 |
| 2021-02-23 |
| 2021-02-24 |
| 2021-02-25 |
| 2021-02-26 |
| 2021-02-27 |
| 2021-02-28 |
| 2021-03-01 |
| 2021-03-02 |
| 2021-03-03 |
| 2021-03-04 |
| 2021-03-05 |
| 2021-03-06 |
| 2021-03-07 |
| 2021-03-08 |
| 2021-03-09 |
| 2021-03-10 |
| 2021-03-11 |
| 2021-03-12 |
| 2021-03-13 |
| 2021-03-14 |
| 2021-03-15 |
| 2021-03-16 |
| 2021-03-17 |
| 2021-03-18 |
| 2021-03-19 |
| 2021-03-20 |
| 2021-03-21 |
| 2021-03-22 |
| 2021-03-23 |
| 2021-03-24 |
| 2021-03-25 |
| 2021-03-26 |
| 2021-03-27 |
| 2021-03-28 |
| 2021-03-29 |
| 2021-03-30 |
| 2021-03-31 |
| 2021-04-01 |
| 2021-04-02 |
| 2021-04-03 |
| 2021-04-04 |
| 2021-04-05 |
| 2021-04-06 |
| 2021-04-07 |
| 2021-04-08 |
| 2021-04-09 |
| 2021-04-10 |
+------------+
Unique pairs
Sometimes you’re only interested in unique combinations. For example when you need to create pairs of people or want to list edges in an undirected graph (**): it doesn’t matter which value is listed first or second – what matters is that the first and second values are listed together exactly once.
(**) Yes, you can store graphs in a relational database!
The example below shows a simple person
table that contains the names of six people.
person | |
---|---|
id | name |
1 | Rachel |
2 | Monica |
3 | Phoebe |
4 | Joey |
5 | Chandler |
6 | Ross |
We can create a list of possible pairings by adding a simple condition to the JOIN clause:
SELECT
A.name, B.name
FROM
person A
JOIN
person B ON A.id < B.id;
+----------+----------+
| name | name |
+----------+----------+
| Rachel | Monica |
| Rachel | Phoebe |
| Rachel | Joey |
| Rachel | Chandler |
| Rachel | Ross |
| Monica | Phoebe |
| Monica | Joey |
| Monica | Chandler |
| Monica | Ross |
| Phoebe | Joey |
| Phoebe | Chandler |
| Phoebe | Ross |
| Joey | Chandler |
| Joey | Ross |
| Chandler | Ross |
+----------+----------+
Ranking results
Self-JOINs can also be used to compute ranks of query results, although nowadays there are much better ways to do this.
Top comments (0)