Samandar Komilov

Posted on Jul 13

Understanding SQL Joins and Grouping using Set Theory

#sql #database

🏁 In the world of databases, SQL joins are fundamental operations that allow us to combine data from multiple tables based on related columns. While SQL joins might seem daunting at first, they are conceptually rooted in Set Theory — a branch of mathematics that deals with collections of objects and operations on them.

This post aims to demystify SQL joins by exploring their relationship with set theory concepts:

Understanding Tables as Sets: In SQL, each table can be seen as a set of rows (records). Just as in set theory, these rows are unique unless explicitly defined otherwise.
Types of Joins: SQL offers several types of joins—such as inner joins, left joins, right joins, and outer joins—each serving a specific purpose in combining data from multiple tables.
Set Operations Analogies: We'll delve into how SQL joins correspond to set operations. For instance, an inner join can be likened to the intersection of two sets, where only the common elements (rows) are retained.
Visualizing Joins: Using practical examples and diagrams, we'll visualize how data from different tables intersects or combines based on join conditions.

By the end of this post, you'll have a clearer understanding of SQL joins not just as technical database operations, but as logical extensions of set theory principles.

Understanding Tables as Sets

Each table consists of some number of rows and columns, we know. In each row, there are a number of elements equal to the number of columns.

For instance, in this table, there are 4 rows with 3 elements each (3 columns). Each row is considered as a tuple in set theory and the table is a set of tuples of elements:
K = {(1, Polat, 45), (2, Memati, 40), (3, Abdulhey, 39), (4, Cahit, 35)}

So, I hope we got the point. Now, let's move onto the JOINS.

Types of Joins

There are various sources that tell distinct information about the exact number of Joins. We'll consider the following Join types:

INNER JOIN
LEFT (OUTER) JOIN
RIGHT (OUTER) JOIN
FULL (OUTER) JOIN
CROSS JOIN (Cartesian Product)
ANTI LEFT JOIN
ANTI RIGHT JOIN
ANTI OUTER JOIN

INNER JOIN

An inner join returns only the rows (tuples) that have matching values in both tables (sets).

In set theory terms, it performs an intersection of two sets. Let's say, we are inner joining A and B tables, then:

Inner Join = A ∩ B

It would be clearer if we consider elementwise, yeah? So, say we have these sets:

A = {(1, 'Pusat'), (2, 'Zaza'), (3, 'Murad')}
B = {(1, 'HR'), (2, 'IT'), (4, 'Sales')}

Then the inner join on the 1st element (employeeID) of these 2 sets will be:

A ∩ B = {(1, 'Pusat', 'HR'), (2, 'Zaza', 'IT')}

But 'Murad' and 'Sales' have different IDs, so they are not present in inner join result.

LEFT (OUTER) JOIN

A left outer join returns all the rows from the left table (set), along with the matching rows from the right table (set). If there is no match, the result is NULL from the right table.

In set theory, it can be thought of as including all elements from the left set and the intersection with the right set:

Left Outer Join = A ∪ (A ∩ B)

In terms of the above example sets, the resulting left outer join set will be like this:

A ∪ (A ∩ B) = {(1, 'Pusat', 'HR'), (2, 'Zaza', 'IT'), (3, 'Murad', NULL)}

Because, the result is all the matching rows (tuples) plus the other tuples on the left set. Due to the absence of Department for Murad, NULL will be saved.

RIGHT (OUTER) JOIN

A right outer join returns all the rows from the right table (set), along with the matching rows from the left table (set). If there is no match, the result is NULL from the left table. This is similar to the left join but includes all elements from the right set.

In set theory, it can be thought of as including all elements from the right set and the intersection with the left set:

Right Outer Join = B ∪ (A ∩ B)

In terms of the above example sets, the resulting right outer join set will be like this:

B ∪ (A ∩ B) = {(1, 'Pusat', 'HR'), (2, 'Zaza', 'IT'), (4, NULL, 'Sales')}

Because, the result is all the matching rows (tuples) plus the other tuples on the right set. Due to the absence of Name for Sales, NULL will be saved.

FULL (OUTER) JOIN

A full outer join returns all the rows when there is a match in either table (set). If there is no match, the result is NULL on the side that does not have a match.

In set theory, it is the union of both sets, including their intersection:

Full Outer Join = (A ∪ B) ∪ (A ∩ B)

In terms of the above example sets, the resulting right outer join set will be like this:

B ∪ (A ∩ B) = {(1, 'Pusat', 'HR'), (2, 'Zaza', 'IT'), (3, 'Murad', NULL), (4, NULL, 'Sales')}

This is like the sum of 2 above cases, but the matching rows participate only once in the result set.

CROSS JOIN (Cartesian Product)

A cross join produces the Cartesian product of two tables, meaning it combines each row from the first table with every row from the second table.

In set theory, there is a cartesian product operation: A × B.
In terms of the above example sets, the result becomes:

A × B = {(1, 'Pusat', 1, 'HR'), (1, 'Pusat', 2, 'IT'), ...}

The list is too long (3×3 elements), hence I left only a few of them, but you got the point.

ANTI LEFT JOIN

Returns rows from the left table (A) that do not have matching rows in the right table (B).

In set theory, this operation is similar to finding elements in set A that are not in set B, denoted as A ∖ B. The result according to the example sets above:

A ∖ B = {(3, 'Murad')}

ANTI RIGHT JOIN

Returns rows from the right table (B) that do not have matching rows in the left table (A).

In set theory, this operation is similar to finding elements in set B that are not in set A, denoted as B ∖ A. The result according to the example sets above:

B \ A = {(4, 'Sales')}

ANTI FULL JOIN

Returns rows that do not have matching rows in either table (A or B).

In set theory, this operation corresponds to the symmetric difference of sets A and B, denoted as A Δ B. The result according to the example sets above:

A Δ B = {(3, 'Murad', NULL), (4, NULL, 'Sales')}

Enough with Joins, now let's start considering Group By in terms of set theory and with examples.

GROUP BY

Sooon...

🏁 As everything in databases are constructed using mathematical concepts under the hood, it is so essential to understand the database operations using set theory operations and imagine how it is working. To study Set Theory, which is a branch of Discrete Mathematics, I recommend taking this MIT course:

https://openlearninglibrary.mit.edu/courses/course-v1:OCW+6.042J+2T2019/course/

Today, I tried to help giving the idea as I can, thanks for your attention!

DEV Community

Understanding SQL Joins and Grouping using Set Theory

Understanding Tables as Sets

Types of Joins

INNER JOIN

LEFT (OUTER) JOIN

RIGHT (OUTER) JOIN

FULL (OUTER) JOIN

CROSS JOIN (Cartesian Product)

ANTI LEFT JOIN

ANTI RIGHT JOIN

ANTI FULL JOIN

GROUP BY

Top comments (0)

Read next

Ollama chatbot with vercel ai with pgai, pgvector.

Clear screen at MariaDB CLI

DearBook: Create Magical Illustrated Children's Stories with AI

Understanding Batch SQL Injection: A Real-World Threat to Data Security