In a database, tables can be related to each other through different types of relationships, such as One-to-One Relationships, One-to-Many Relationships, and Many-to-Many Relationships. Based on these relationships, SQL joins are used to extract meaningful information from the data in the tables. However, retrieving meaningful insights from data in the same table with a parent-child relationship can be challenging. In this case, a Self Join is used to establish the parent-child relationships between different rows in the same table and extract meaningful insights.
Tools used in this tutorial
DbVisualizer, top rated database management tool and SQL client
What Is a Self Join in SQL?
A Self Join is a type of a JOIN
query used to compare rows within the same table. Unlike other SQL JOIN
queries that join two or more tables, a self join joins a table to itself. To use a self join, a table must have a unique identifier column, a parent column, and a child column.
For example, a table can have a primary key column, all employees in a company column, and managers that each employee in the company reports to column. Since all managers in the company are also employees, the managers' column in the table holds the primary key that represents each manager as an employee.
In this case, a Self Join can be used to get the names of all employees and the names of managers they report to in the company. Then the results can be presented in a table with a primary key, employee_name, and manager_name columns.
Self Join Syntax and Table Aliases
A self join syntax often looks like so:
SELECT table1.column1, table2.column2
FROM table_name table1
JOIN table_name table2 ON table1.column = table2.column;
In the syntax above, table_name
is the name of the table that self join is used to join to itself. Table1
and table2
are table aliases used to represent rows being compared.
Column1
, and column2
are used to represent rows being compared to each other in the resulting table. The table1.column
and table2.column
are columns used to establish the relationship between two rows in a table.
We’ll begin by demonstrating how to use a self-join query on a PostgreSQL database instance.
Using a Self Join on Relationships In a Table
In the Employees
table, the manager_id
column represents the relationship between each employee and their manager. Based on the Employee-Manager relationship in the Employees
table, we can use a self join to get each employee's name and the name of their respective manager using the query below.
SELECT emp.employee_name AS employee, mng.employee_name AS manager
FROM Employees emp
JOIN Employees mng ON emp.manager_id = mng.employee_id
In the query above, emp
and mng
represent table aliases where emp
is for the employee while mng
is for the manager. Then the table is joined on the condition that the manager’s ID (mng.employee_id
) matches the employee’s manager ID (emp.manager_id
).
From the resulting table above, we can see the relationship or hierarchy between employees and their managers. Jane Smith reports to John Doe, while Isabel Archer reports to Jane Smith.
Recursive Self Join
From the Company Employees
table, we can use a recursive self join to determine the hierarchy of employees in the company from top to bottom based on their relationships with their managers using the query below.
WITH RECURSIVE EmployeeHierarchy AS (
SELECT employee_id, employee_name, manager_id, 0 AS level
FROM companyemployees
WHERE manager_id IS NULL
UNION ALL
SELECT emp.employee_id, emp.employee_name, emp.manager_id, h.level + 1
FROM companyemployees emp
JOIN EmployeeHierarchy eh
ON emp.manager_id = eh.employee_id)
SELECT employee_id, employee_name, level
FROM EmployeeHierarchy;
In the query above, a common table expression (CTE) called EmployeeHierarchy
is used together with the WITH RECURSIVE
syntax. Inside the CTE, the query has two parts.
The first part of the query selects the topmost manager and assigns them a level of 0, assuming the topmost manager has NULL
as their manager_id
. The second part of the query is defined using the UNION ALL
clause to loop through each employee to determine their hierarchy level based on previous levels of managers they report to.
From the results above, John Smith is the topmost manager at level 0. John Doe is at level 1, reporting to John Smith, while Jane Smith is at level 2, reporting to John Doe. Joseph Reid is the bottom-most manager at level 5, reporting to Nick Adams - no one reports to Joseph.
Applications of Self Joins
Self joins have various applications in database management and querying. Here are some common scenarios where self joins can be helpful:
- Managing hierarchical data: Self joins are often used to represent hierarchical relationships, such as organizational charts, product categories, or file systems. By joining a table to itself, you can retrieve parent-child relationships and navigate through the hierarchy.
- Employee-Manager relationships: In an employee database, self joins can be used to retrieve information about employees and their managers. By comparing the employee ID with the manager ID within the same table, you can identify the reporting structure and gather details about managers and their subordinates. -** Network analysis:** Self joins can be employed in network analysis scenarios. For example, if you have a table representing connections between users in a social network, a self join can help identify common connections, mutual friends, or indirect relationships between users.
- Comparing related records: Self joins also enable the comparison of related records within a table. For example, you can compare sales records of customers with the same postal code or find customers who have made multiple purchases over a specific period.
- Analyzing time-based data: Self joins can be used to analyze time-series data within a table. By joining the table with itself based on time intervals, you can compare records from different time periods, calculate changes or trends, and identify patterns.
Conclusion
In conclusion, a Self Join is used to establish the parent-child or hierarchy relationships between different rows in the same table and extract meaningful insights. A recursive self join is an extension of a self join that joins a table to itself repetitively to extract meaningful insights from the rows with nested hierarchies or relationships.
We hope you’ve found this blog useful - make sure to follow our blog for updates and news in the database space, until next time.
FAQs
What is a self join in SQL?
A self join is a type of join operation in a relational database where a table is joined with itself. It allows you to combine rows from the same table based on a related condition.
What is the difference between full, inner, and self join?
A full join returns all tables' rows, including matching and non-matching rows. An inner join returns only the matching rows between two tables based on a specified join condition. A self-join is a specific type of join where a table is joined with itself.
About the author
Bonnie is a web developer and technical writer creating easy-to-understand technical articles.
Top comments (0)