Arctype writer: Daniel Lifflander
- Intro to CTEs
- Organizing Complex Queries
- Unlocking the Power of CTEs with Recursion
- Recursive CTEs in Action
It's easy to get lost in a maze of subqueries, derived tables, and temporary tables when working with complex SQL queries. Postgres' common table expressions, or CTEs, are a lesser known feature, but they can be a useful structuring tool to craft a SQL query in a more readable and friendly way.
CTEs begin as a
with statement and allow you to execute a query whose results will be available later to subsequent queries. If you have a query that relies on the results of another query, a reference of sorts, the CTE will allow Postgres to materialize the results of the reference query, ensuring that the query is only run once and its results are readily available to other queries farther down the line. (Note that this can be a double-edged sword, as it prevents the Postgres query planner from running optimization outside of the reference query’s SQL.)
The sequential list of "code blocks" created by CTEs adds readability to SQL scripts because every block is dependent on the ones above it. In contrast, it can be difficult to trace dependencies between derived tables and parse through nested subqueries (especially as their depth increases).
Derived Table Example
Temporary Table Example
This will require two separate queries.
Note that the query above depends on a, which is listed first with the CTE, but as a derived table it is listed inside the query. CTEs allow you to logically order queries that are dependent on others. As queries get more complex, organization becomes key to reducing errors. CTEs allow you to organize code into distinct blocks with a clear dependency hierarchy.
You can also make CTEs more than syntactic sugar by adding the
recursive keyword! Once you add the recursive keyword, Postgres will now allow you to reference your CTE from within itself, allowing you to “generate” rows based on recursion.
Here’s an example using Postgres CTEs to implement the Fibonacci sequence:
Running this, we’d get the first 10:
It's probably not too often we find ourselves in a situation where we need Fibonacci numbers in SQL. Let’s continue on and take a look at a real problem I solved using this Postgres feature.
In the manufacturing industry, a bill of materials (or BoM) is a list of the raw materials, sub-assemblies, intermediate assemblies, sub-components, parts, and the quantities of each needed to manufacture an end product … BOMs are of hierarchical nature, with the top level representing the finished product which may be a sub-assembly or a completed item. BOMs that describe the sub-assemblies are referred to as modular BOMs
More simply, a BoM is a list of parts that a product is assembled from. Each part may consist of other parts, creating hierarchy. A car is a final product that is composed of many parts. A door is an example of a part of a car, its subparts being among a window, handle, various switches, and so forth. Recursive CTEs are great for hierarchical data! Let’s take a look at how we could model this hierarchy and use CTEs to generate a modular BoM sheet.
part table stores our parts.
part_hierarchy table stores a simple parent to child reference between parts, as well as a quantity if the part happens to need more than 1 of the child part for assembly. The check constraints prevent a part from referencing itself circularly.
Let's insert some parts
And some relations, approximately resembling the car example mentioned above
In our database we now have 5 parts and some relationships between them. If we wanted to see how many direct subassemblies a door has, we could query something like this, though it would only show us direct descendants:
Finally, let's now construct a query to list all of the parts and their subassemblies of our car.
The union all is the key part to this CTE. The first part of this UNION selects the base part, the second half selects any child parts by joining to the part_hierarchy table and calling itself (inducing recursion), then combines those results with the parent via the union. Without recursion, you'd have to add a join for every level of the hierarchy you needed to support.
The level value is introduced here to track how deep the recursion has run. It is hard coded at 0 to represent the parent. As the query "unwraps" and it calls itself, this increases by 1 to indicate how many generations far from the parent part the child part is.
The very last part that selects from the CTE pads the name with spaces based on the level, which is a quick and dirty way to visualize the hierarchy.
From here, you could use aggregate functions and rollup to calculate the costs of specific assemblies.
This example is definitely simplified from what I ended up using to actually solve this problem. Supporting part versioning and branching and other business requirements quickly complicates what this solution would look like in the real world. Recursive CTEs in SQL are not the only way to store and query hierarchical data, but for certain data types, sizes, and speed requirements, they are a quick and convenient feature to have in your wheelhouse.
Now you're one step closer to becoming a SQL guru, but don't let an outdated SQL editor hold you back. Check out Arctype today and experience the modern SQL editor.