DEV Community

Cover image for Understanding Self Joins in SQL
DbVisualizer
DbVisualizer

Posted on • Originally published at dbvis.com

Understanding Self Joins in SQL

In a database, tables can be related to each other through different types of relationships, such as One-to-One Relationships, One-to-Many Relationships, and Many-to-Many Relationships. Based on these relationships, SQL joins are used to extract meaningful information from the data in the tables. However, retrieving meaningful insights from data in the same table with a parent-child relationship can be challenging. In this case, a Self Join is used to establish the parent-child relationships between different rows in the same table and extract meaningful insights.


Tools used in this tutorial

DbVisualizer, top rated database management tool and SQL client


What Is a Self Join in SQL?

A Self Join is a type of a JOIN query used to compare rows within the same table. Unlike other SQL JOIN queries that join two or more tables, a self join joins a table to itself. To use a self join, a table must have a unique identifier column, a parent column, and a child column.

For example, a table can have a primary key column, all employees in a company column, and managers that each employee in the company reports to column. Since all managers in the company are also employees, the managers' column in the table holds the primary key that represents each manager as an employee.

In this case, a Self Join can be used to get the names of all employees and the names of managers they report to in the company. Then the results can be presented in a table with a primary key, employee_name, and manager_name columns.

Self Join Syntax and Table Aliases

A self join syntax often looks like so:

SELECT table1.column1, table2.column2
FROM table_name table1
JOIN table_name table2 ON table1.column = table2.column;
Enter fullscreen mode Exit fullscreen mode

In the syntax above, table_name is the name of the table that self join is used to join to itself. Table1 and table2 are table aliases used to represent rows being compared.

Column1, and column2 are used to represent rows being compared to each other in the resulting table. The table1.column and table2.column are columns used to establish the relationship between two rows in a table.

We’ll begin by demonstrating how to use a self-join query on a PostgreSQL database instance.

Using a Self Join on Relationships In a Table

To illustrate how a self join can be used on relationships in a table, consider a table called Employees.


Employees Table in DbVisualizer.

Employees Table in DbVisualizer.

In the Employees table, the manager_id column represents the relationship between each employee and their manager. Based on the Employee-Manager relationship in the Employees table, we can use a self join to get each employee's name and the name of their respective manager using the query below.

SELECT emp.employee_name AS employee, mng.employee_name AS manager
FROM Employees emp
JOIN Employees mng ON emp.manager_id = mng.employee_id
Enter fullscreen mode Exit fullscreen mode

In the query above, emp and mng represent table aliases where emp is for the employee while mng is for the manager. Then the table is joined on the condition that the manager’s ID (mng.employee_id) matches the employee’s manager ID (emp.manager_id).

When you run the query, it will result in a table that includes an employee's name and their respective manager's name.


The Result of a Self-Join Query in DbVisualizer.

The Result of a Self-Join Query in DbVisualizer.

From the resulting table above, we can see the relationship or hierarchy between employees and their managers. Jane Smith reports to John Doe, while Isabel Archer reports to Jane Smith.

Recursive Self Join

A recursive self join is an extension of a self join that joins a table to itself repetitively to extract meaningful insights from the rows with nested hierarchies or relationships. To illustrate how a recursive self join can be used on nested relationships in a table, consider a table called companyemployees.


The Company Employees Table in DbVisualizer.

The Company Employees Table in DbVisualizer.

From the Company Employees table, we can use a recursive self join to determine the hierarchy of employees in the company from top to bottom based on their relationships with their managers using the query below.

WITH RECURSIVE EmployeeHierarchy AS (
SELECT employee_id, employee_name, manager_id, 0 AS level
FROM companyemployees
WHERE manager_id IS NULL
UNION ALL
SELECT emp.employee_id, emp.employee_name, emp.manager_id, h.level + 1
FROM companyemployees emp
JOIN EmployeeHierarchy eh 
ON emp.manager_id = eh.employee_id)
SELECT employee_id, employee_name, level
FROM EmployeeHierarchy;
Enter fullscreen mode Exit fullscreen mode

In the query above, a common table expression (CTE) called EmployeeHierarchy is used together with the WITH RECURSIVE syntax. Inside the CTE, the query has two parts.

The first part of the query selects the topmost manager and assigns them a level of 0, assuming the topmost manager has NULL as their manager_id. The second part of the query is defined using the UNION ALL clause to loop through each employee to determine their hierarchy level based on previous levels of managers they report to.

When you run the query, it will result to a table that includes an employee’s name and their hierarchy level in the company management. The 0 represents the topmost manager, while the 5 represents the bottom-most manager.


Recursive Self Join Query Results in DbVisualizer.

Recursive Self Join Query Results in DbVisualizer.

From the results above, John Smith is the topmost manager at level 0. John Doe is at level 1, reporting to John Smith, while Jane Smith is at level 2, reporting to John Doe. Joseph Reid is the bottom-most manager at level 5, reporting to Nick Adams - no one reports to Joseph.

Applications of Self Joins

Self joins have various applications in database management and querying. Here are some common scenarios where self joins can be helpful:

  • Managing hierarchical data: Self joins are often used to represent hierarchical relationships, such as organizational charts, product categories, or file systems. By joining a table to itself, you can retrieve parent-child relationships and navigate through the hierarchy.
  • Employee-Manager relationships: In an employee database, self joins can be used to retrieve information about employees and their managers. By comparing the employee ID with the manager ID within the same table, you can identify the reporting structure and gather details about managers and their subordinates. -** Network analysis:** Self joins can be employed in network analysis scenarios. For example, if you have a table representing connections between users in a social network, a self join can help identify common connections, mutual friends, or indirect relationships between users.
  • Comparing related records: Self joins also enable the comparison of related records within a table. For example, you can compare sales records of customers with the same postal code or find customers who have made multiple purchases over a specific period.
  • Analyzing time-based data: Self joins can be used to analyze time-series data within a table. By joining the table with itself based on time intervals, you can compare records from different time periods, calculate changes or trends, and identify patterns.

Conclusion

In conclusion, a Self Join is used to establish the parent-child or hierarchy relationships between different rows in the same table and extract meaningful insights. A recursive self join is an extension of a self join that joins a table to itself repetitively to extract meaningful insights from the rows with nested hierarchies or relationships.

We hope you’ve found this blog useful - make sure to follow our blog for updates and news in the database space, until next time.

FAQs

What is a self join in SQL?

A self join is a type of join operation in a relational database where a table is joined with itself. It allows you to combine rows from the same table based on a related condition.

What is the difference between full, inner, and self join?

A full join returns all tables' rows, including matching and non-matching rows. An inner join returns only the matching rows between two tables based on a specified join condition. A self-join is a specific type of join where a table is joined with itself.

About the author

Bonnie is a web developer and technical writer creating easy-to-understand technical articles.

Top comments (0)