π In the world of databases, SQL joins are fundamental operations that allow us to combine data from multiple tables based on related columns. While SQL joins might seem daunting at first, they are conceptually rooted in Set Theory β a branch of mathematics that deals with collections of objects and operations on them.
This post aims to demystify SQL joins by exploring their relationship with set theory concepts:
Understanding Tables as Sets: In SQL, each table can be seen as a set of rows (records). Just as in set theory, these rows are unique unless explicitly defined otherwise.
Types of Joins: SQL offers several types of joinsβsuch as inner joins, left joins, right joins, and outer joinsβeach serving a specific purpose in combining data from multiple tables.
Set Operations Analogies: We'll delve into how SQL joins correspond to set operations. For instance, an inner join can be likened to the intersection of two sets, where only the common elements (rows) are retained.
Visualizing Joins: Using practical examples and diagrams, we'll visualize how data from different tables intersects or combines based on join conditions.
By the end of this post, you'll have a clearer understanding of SQL joins not just as technical database operations, but as logical extensions of set theory principles.
Understanding Tables as Sets
Each table consists of some number of rows and columns, we know. In each row, there are a number of elements equal to the number of columns.
For instance, in this table, there are 4 rows with 3 elements each (3 columns). Each row is considered as a tuple in set theory and the table is a set of tuples of elements:
K = {(1, Polat, 45), (2, Memati, 40), (3, Abdulhey, 39), (4, Cahit, 35)}
So, I hope we got the point. Now, let's move onto the JOINS.
Types of Joins
There are various sources that tell distinct information about the exact number of Joins. We'll consider the following Join types:
- INNER JOIN
- LEFT (OUTER) JOIN
- RIGHT (OUTER) JOIN
- FULL (OUTER) JOIN
- CROSS JOIN (Cartesian Product)
- ANTI LEFT JOIN
- ANTI RIGHT JOIN
- ANTI OUTER JOIN
INNER JOIN
An inner join returns only the rows (tuples) that have matching values in both tables (sets).
In set theory terms, it performs an intersection of two sets. Let's say, we are inner joining A and B tables, then:
Inner Join = A β© B
It would be clearer if we consider elementwise, yeah? So, say we have these sets:
A = {(1, 'Pusat'), (2, 'Zaza'), (3, 'Murad')}
B = {(1, 'HR'), (2, 'IT'), (4, 'Sales')}
Then the inner join on the 1st element (employeeID) of these 2 sets will be:
A β© B = {(1, 'Pusat', 'HR'), (2, 'Zaza', 'IT')}
But 'Murad'
and 'Sales'
have different IDs, so they are not present in inner join result.
LEFT (OUTER) JOIN
A left outer join returns all the rows from the left table (set), along with the matching rows from the right table (set). If there is no match, the result is NULL from the right table.
In set theory, it can be thought of as including all elements from the left set and the intersection with the right set:
Left Outer Join = A βͺ (A β© B)
In terms of the above example sets, the resulting left outer join set will be like this:
A βͺ (A β© B) = {(1, 'Pusat', 'HR'), (2, 'Zaza', 'IT'), (3, 'Murad', NULL)}
Because, the result is all the matching rows (tuples) plus the other tuples on the left set. Due to the absence of Department for Murad
, NULL
will be saved.
RIGHT (OUTER) JOIN
A right outer join returns all the rows from the right table (set), along with the matching rows from the left table (set). If there is no match, the result is NULL from the left table. This is similar to the left join but includes all elements from the right set.
In set theory, it can be thought of as including all elements from the right set and the intersection with the left set:
Right Outer Join = B βͺ (A β© B)
In terms of the above example sets, the resulting right outer join set will be like this:
B βͺ (A β© B) = {(1, 'Pusat', 'HR'), (2, 'Zaza', 'IT'), (4, NULL, 'Sales')}
Because, the result is all the matching rows (tuples) plus the other tuples on the right set. Due to the absence of Name for Sales
, NULL
will be saved.
FULL (OUTER) JOIN
A full outer join returns all the rows when there is a match in either table (set). If there is no match, the result is NULL on the side that does not have a match.
In set theory, it is the union of both sets, including their intersection:
Full Outer Join = (A βͺ B) βͺ (A β© B)
In terms of the above example sets, the resulting right outer join set will be like this:
B βͺ (A β© B) = {(1, 'Pusat', 'HR'), (2, 'Zaza', 'IT'), (3, 'Murad', NULL), (4, NULL, 'Sales')}
This is like the sum of 2 above cases, but the matching rows participate only once in the result set.
CROSS JOIN (Cartesian Product)
A cross join produces the Cartesian product of two tables, meaning it combines each row from the first table with every row from the second table.
In set theory, there is a cartesian product operation: A Γ B
.
In terms of the above example sets, the result becomes:
A Γ B = {(1, 'Pusat', 1, 'HR'), (1, 'Pusat', 2, 'IT'), ...}
The list is too long (3Γ3 elements), hence I left only a few of them, but you got the point.
ANTI LEFT JOIN
Returns rows from the left table (A) that do not have matching rows in the right table (B).
In set theory, this operation is similar to finding elements in set A that are not in set B, denoted as A β B
. The result according to the example sets above:
A β B = {(3, 'Murad')}
ANTI RIGHT JOIN
Returns rows from the right table (B) that do not have matching rows in the left table (A).
In set theory, this operation is similar to finding elements in set B that are not in set A, denoted as B β A
. The result according to the example sets above:
B \ A = {(4, 'Sales')}
ANTI FULL JOIN
Returns rows that do not have matching rows in either table (A or B).
In set theory, this operation corresponds to the symmetric difference of sets A and B, denoted as A Ξ B
. The result according to the example sets above:
A Ξ B = {(3, 'Murad', NULL), (4, NULL, 'Sales')}
Enough with Joins, now let's start considering Group By in terms of set theory and with examples.
GROUP BY
Sooon...
π As everything in databases are constructed using mathematical concepts under the hood, it is so essential to understand the database operations using set theory operations and imagine how it is working. To study Set Theory, which is a branch of Discrete Mathematics, I recommend taking this MIT course:
https://openlearninglibrary.mit.edu/courses/course-v1:OCW+6.042J+2T2019/course/
Today, I tried to help giving the idea as I can, thanks for your attention!
Top comments (0)