DEV Community

Karishma Shukla
Karishma Shukla

Posted on • Updated on

Simplify Complex SQL Queries with Common Table Expressions (CTEs)

What are Common Table Expressions?

Common Table Expressions (CTEs) are a valuable feature in SQL that lets you create temporary result sets within a query. They simplify complex queries, enhance code readability, and improve query performance. CTEs are initiated using WITH keyword.

CTE Syntax
Fig: CTE Syntax. Image from MariaDB

When to use CTEs?

CTEs are particularly useful to:

  • Break down complex operations into simpler steps
  • Handle hierarchical data structures
  • Implement pagination for large result sets
  • Streamline complex aggregation tasks
  • Have reusable code if you need the same logic at multiple places
  • Improve code readability and maintainability if your query involves subqueries, multiple joins, or intricate filtering conditions

Types of CTEs

Broadly CTEs can be classified into:

  • Non-recursive (Simple) CTEs
  • Recursive CTEs

1. Simple Common Table Expressions

Non-recursive CTEs are straightforward and do not involve self-reference. They are useful for simplifying complex queries, aggregations, and transformations by breaking them into smaller, more manageable steps.

Example: Total Salary by Department

WITH department_salary AS (
  SELECT department_id, SUM(salary) AS total_salary
  FROM employees
  GROUP BY department_id
)
SELECT * FROM department_salary;
Enter fullscreen mode Exit fullscreen mode

Here, the CTE department_salary calculates the total salary for each department by using the SUM and GROUP BY functions. The main query then fetches the results from the CTE.

2. Recursive Table Expressions

Recursive CTEs are used to work with hierarchical or recursive data structures. They allow a query to reference its own output, enabling operations like traversing a tree structure or finding paths in a graph.

Example: Organization Hierarchy

Suppose we have a table named employees with columns employee_id, name, and manager_id, where manager_id refers to the employee_id of the employee's manager.

WITH RECURSIVE org_hierarchy AS (
  SELECT employee_id, name, manager_id, 1 AS level
  FROM employees
  WHERE manager_id IS NULL  -- Root level employees (managers)

  UNION ALL

  SELECT e.employee_id, e.name, e.manager_id, oh.level + 1
  FROM employees AS e
  JOIN org_hierarchy AS oh ON e.manager_id = oh.employee_id
)
SELECT * FROM org_hierarchy;
Enter fullscreen mode Exit fullscreen mode

In this example, we define a recursive CTE named org_hierarchy. The initial query retrieves root-level employees (managers) by selecting those with a NULL manager_id. The recursive part of the CTE uses the UNION ALL clause to join the employees table with the CTE itself, connecting employees to their respective managers using the manager_id.

The recursive CTE is structured as follows:

  • The anchor query selects the root-level employees (managers) and assigns them a level of 1.
  • The recursive query selects employees who report to the managers found in the previous iteration, incrementing the level by 1.
  • The final query retrieves the entire organizational hierarchy, including employees and their respective levels within the hierarchy.

Yes, recursive CTEs are confusing. I myself struggle a lot with them. It takes a long time to understand when to use them and why. 🙃

Conclusion

In conclusion, Common Table Expressions (CTEs) are powerful for enhancing the readability, maintainability, and efficiency of complex queries.


If you like what you read, consider subscribing to my newsletter.
Find me on GitHub, Twitter

Top comments (16)

Collapse
 
patrickcodes profile image
Nishant

Recursive Table Expressions 😢 Well explained.

Collapse
 
richarddev_44 profile image
Richard

Great stuff as always. 🔥
I'm always waiting for your new posts.

Collapse
 
ninjaprogrammer profile image
SP

🔥 🔥

Collapse
 
artu_hnrq profile image
Arthur Henrique

Hey @karishmashukla, Nice post!
I was not fully aware about these CTEs.
It opened my curiosity 😁
Thanks!

Collapse
 
karishmashukla profile image
Karishma Shukla

Glad it was helpful 😁

Collapse
 
aarone4 profile image
Aaron Reese

For me you have left out the key benefits of CTEs. Namely 1) Ease of testing. You can test the logic of each CTE in isolation of the main query. This is especially useful if you have complex rules that require multiple steps
2) reusability. If you need the same logic in multiple places (e.g. you need to know the start date of a contract so that you can calculate days running, days to next review and days to end) you can calculate the values once and use them multiple times.
3) code readability. By moving complex joins, subqueries and co-related queries out of the main query you keep the query narrow and easier to read because you don't need to think about the lower level edge case logic that it held in the CTEs

Collapse
 
patrickcodes profile image
Nishant • Edited

I read the post again after seeing your comment and cannot understand where are these points left out. In the section "When to use CTEs" the author literally mentions "Break down complex operations into simpler steps", "Have reusable code if you need the same logic at multiple places", "Improve code readability and maintainability if your query involves subqueries, multiple joins, or intricate filtering conditions".

These are really simple things and I see no need to write them in paragraphs. As a reader it saves a lot of time and also helps me understand things quicker.

Rest of the things can be researched by the reader if really interested 😄

Collapse
 
aarone4 profile image
Aaron Reese

Having read your reply and re-read the post, yes they are mentioned in the article preface but were not explored. What is a complex query and why is it 'bad'. How does rewriting it in CTEs make it easier to comprehend, test and fix edge cases. The example given for summing dept salaries works fine as an example case to understand CTE syntax but could easily have been achieved with SUM and GROUP BY or even using SUM() OVER(), both of which are shorter code using familiar constructs. If the logic needed to exclude managers and allow for apportionment of an employee to multiple departments then lifting this to a CTE makes more sense.
Maybe a section on when NOT to use CTEs would be a valuable addition

Thread Thread
 
patrickcodes profile image
Nishant
  • This is a blog post. Not Google. Not a book. It is not the writer's job to teach everything, right?
  • If someone does not understand what complex queries are, then can read this in the future. It is not wrong to expect your audience to have some level of knowledge. I see no beginners tag used anyway.
  • Complicated examples are always bad. Best to understand with something simple and explore on your own.

Better to be appreciative of people who teach. Takes efforts and time on their end. :D

Collapse
 
wakywayne profile image
wakywayne

Where does the e in e.employees come from?

Collapse
 
shifi profile image
Shifa Ur Rehman

SELECT e.employee_id, e.name, e.manager_id, oh.level + 1
FROM employees e

Here it comes from. It’s convenient to alias a table in sql where you have to chose multiple properties.

Collapse
 
aarone4 profile image
Aaron Reese

This is why I always use the longhand syntax of 'employee AS e'
It makes it clear what is going on and can avoid syntax error when you accidentally mistype the alias

Thread Thread
 
shifi profile image
Shifa Ur Rehman

True. And that is the standard way generally accepted by everyone as it provides more concise readability. Especially in larger schemas.

Thread Thread
 
karishmashukla profile image
Karishma Shukla

@shifi @wakywayne @aarone4 Totally agree. I tend to write things quickly when experimenting. Updated. Thank you :D

Thread Thread
 
shifi profile image
Shifa Ur Rehman

Nothing wrong on your part. Your example was concise as is. Complying with loose standards such as code formatting or keyword selection is pretty subjective. Good post tho. Thank you.

Collapse
 
wakywayne profile image
wakywayne

Thanks, embarrassing that I couldn't see that lol