What are Common Table Expressions?
Common Table Expressions (CTEs) are a valuable feature in SQL that lets you create temporary result sets within a query. They simplify complex queries, enhance code readability, and improve query performance. CTEs are initiated using WITH
keyword.
Fig: CTE Syntax. Image from MariaDB
When to use CTEs?
CTEs are particularly useful to:
- Break down complex operations into simpler steps
- Handle hierarchical data structures
- Implement pagination for large result sets
- Streamline complex aggregation tasks
- Have reusable code if you need the same logic at multiple places
- Improve code readability and maintainability if your query involves subqueries, multiple joins, or intricate filtering conditions
Types of CTEs
Broadly CTEs can be classified into:
- Non-recursive (Simple) CTEs
- Recursive CTEs
1. Simple Common Table Expressions
Non-recursive CTEs are straightforward and do not involve self-reference. They are useful for simplifying complex queries, aggregations, and transformations by breaking them into smaller, more manageable steps.
Example: Total Salary by Department
WITH department_salary AS (
SELECT department_id, SUM(salary) AS total_salary
FROM employees
GROUP BY department_id
)
SELECT * FROM department_salary;
Here, the CTE department_salary
calculates the total salary for each department by using the SUM
and GROUP BY
functions. The main query then fetches the results from the CTE.
2. Recursive Table Expressions
Recursive CTEs are used to work with hierarchical or recursive data structures. They allow a query to reference its own output, enabling operations like traversing a tree structure or finding paths in a graph.
Example: Organization Hierarchy
Suppose we have a table named employees
with columns employee_id
, name
, and manager_id
, where manager_id
refers to the employee_id
of the employee's manager.
WITH RECURSIVE org_hierarchy AS (
SELECT employee_id, name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL -- Root level employees (managers)
UNION ALL
SELECT e.employee_id, e.name, e.manager_id, oh.level + 1
FROM employees AS e
JOIN org_hierarchy AS oh ON e.manager_id = oh.employee_id
)
SELECT * FROM org_hierarchy;
In this example, we define a recursive CTE named org_hierarchy
. The initial query retrieves root-level employees (managers) by selecting those with a NULL
manager_id
. The recursive part of the CTE uses the UNION ALL
clause to join the employees
table with the CTE itself, connecting employees to their respective managers using the manager_id
.
The recursive CTE is structured as follows:
- The anchor query selects the root-level employees (managers) and assigns them a level of 1.
- The recursive query selects employees who report to the managers found in the previous iteration, incrementing the level by 1.
- The final query retrieves the entire organizational hierarchy, including employees and their respective levels within the hierarchy.
Yes, recursive CTEs are confusing. I myself struggle a lot with them. It takes a long time to understand when to use them and why. 🙃
Conclusion
In conclusion, Common Table Expressions (CTEs) are powerful for enhancing the readability, maintainability, and efficiency of complex queries.
Top comments (16)
Great stuff as always. 🔥
I'm always waiting for your new posts.
Recursive Table Expressions 😢 Well explained.
Hey @karishmashukla, Nice post!
I was not fully aware about these CTEs.
It opened my curiosity 😁
Thanks!
Glad it was helpful 😁
🔥 🔥
For me you have left out the key benefits of CTEs. Namely 1) Ease of testing. You can test the logic of each CTE in isolation of the main query. This is especially useful if you have complex rules that require multiple steps
2) reusability. If you need the same logic in multiple places (e.g. you need to know the start date of a contract so that you can calculate days running, days to next review and days to end) you can calculate the values once and use them multiple times.
3) code readability. By moving complex joins, subqueries and co-related queries out of the main query you keep the query narrow and easier to read because you don't need to think about the lower level edge case logic that it held in the CTEs
I read the post again after seeing your comment and cannot understand where are these points left out. In the section "When to use CTEs" the author literally mentions "Break down complex operations into simpler steps", "Have reusable code if you need the same logic at multiple places", "Improve code readability and maintainability if your query involves subqueries, multiple joins, or intricate filtering conditions".
These are really simple things and I see no need to write them in paragraphs. As a reader it saves a lot of time and also helps me understand things quicker.
Rest of the things can be researched by the reader if really interested 😄
Having read your reply and re-read the post, yes they are mentioned in the article preface but were not explored. What is a complex query and why is it 'bad'. How does rewriting it in CTEs make it easier to comprehend, test and fix edge cases. The example given for summing dept salaries works fine as an example case to understand CTE syntax but could easily have been achieved with SUM and GROUP BY or even using SUM() OVER(), both of which are shorter code using familiar constructs. If the logic needed to exclude managers and allow for apportionment of an employee to multiple departments then lifting this to a CTE makes more sense.
Maybe a section on when NOT to use CTEs would be a valuable addition
Better to be appreciative of people who teach. Takes efforts and time on their end. :D
Where does the e in e.employees come from?
SELECT e.employee_id, e.name, e.manager_id, oh.level + 1
FROM employees e
Here it comes from. It’s convenient to alias a table in sql where you have to chose multiple properties.
This is why I always use the longhand syntax of 'employee AS e'
It makes it clear what is going on and can avoid syntax error when you accidentally mistype the alias
True. And that is the standard way generally accepted by everyone as it provides more concise readability. Especially in larger schemas.
@shifi @wakywayne @aarone4 Totally agree. I tend to write things quickly when experimenting. Updated. Thank you :D
Nothing wrong on your part. Your example was concise as is. Complying with loose standards such as code formatting or keyword selection is pretty subjective. Good post tho. Thank you.
Thanks, embarrassing that I couldn't see that lol