It’s a common misperception that CTE queries are staged in memory or are “persisted data sets”. Not true. They’re simply syntactic sugar that SQL will in-line into your query. CTEs make queries easier to read but do not offer a performance benefit. They’re more like ad-hoc views than temp tables.
I made an account to say this. It's a really important distinction and will murder your performance sometimes.
I've seen the optimizer treat a CTE as a correlated subquery executed for every row in a nested loop execution plan.
CTE's are very easy to read, though. It is my default strategy for writing complex queries, and I'll only switch to #temptables if I see a performance issue with the CTE.
Also, sometime SQL Server's in-memory table variables can be useful. They are limited by memory but fast and they require no special permissions (I think?).
DECLARE @employee TABLE
EmployeeID INT PRIMARY KEY NOT NULL,
INSERT INTO @employee VALUES (1, 'Jakob')
SELECT * FROM @employee
Table variables are not in memory only objects. They are logged in temodb. Table variables only last for execution and don't generate statistics.
Thanks for the clarification regarding tempdb. It seems that as of SQL Server 2017 you can create in-memory table variables.
Very much RDBMS dependent in this respect, and I appreciate that this article and the responses focus on SQL Server.
In Oracle a CTE can be persisted by the query optimiser, and the decision is likely to be influenced by its expected size. The data is written to a temporary tablespace. Of course it is also unindexed.
Thank you for this post Helen!
I think CTEs are not that used, at least by web devs, because they don't fit well the ORM fairy tale in which data is mapped to objects.
I understood their power better by working side to side with data scientists using Redshift, kudos a #dataenthusiast like yourself!
Subqueries are still valuable if the relationship between the two queries is simple and the subquery is short but CTE are way better! Giving a name to a query can make all the difference.
The same goes for views I guess. I suspect that Heroku's Dataclips are built with something like that behind the scenes!
Another reason why CTEs are not that popular is that MySQL did not support recursive CTEs till recently (version 8).
Using CTEs on Postgres made a lot of complex queries easy to maintain and explain to team members.
Thanks for your response, glad you found it useful.
Great to hear you are working with a data scientist too. :)
I didn't realize that this was a recent article as I had just stumbled onto it over the weekend. It gave me an idea on a troublesome query that I was working on last Fri. I abandoned my convoluted subquery and re-wrote it using 2 CTE's as in your example. I produced the results that I wanted within 10 minutes. Thanks for the great article.
That's fantastic feedback, thanks so much Alan, good to hear it helped.
It's worth noting that in PostgreSQL a CTE is an "optimiser fence", in that the query that is contains is executed without further optimisation. So if you apply a further predicate to the CTE result in another CTE or the main query, this predicate isn't pushed down to the original CTE.
On the other hand, in PostgreSQL a CTE can execute an insert, update or delete, and return values from the affected rows for use later on in the same query, which is very powerful.
Note - CTEs are no longer an optimizer fence in PostgreSQL 12
Wow, that's a game changer. Thanks for sharing
As a SQL Developer I always use CTE's if I need to work with 3 or more filtered datasets. They are more readable then subqueries but more importantly they are still understandable 3 months later, or 3 years later. They are also much easier to debug than a subquery, easier to repurpose when you want to chase down a possible insight in the data, and help cut down on skewed result sets and aggregations.
I wouldn't necessarily recommend them for line of business queries that live in the backend of an application since they aren't as common to a web dev for example. Use the tools with the most intuitive familiarity in the backend. But when we're talking about reporting applications that pull data from a data warehouse CTE's can be a much better easier to sustain query writing option across data professionals.
I did not know this was a thing, thank you helen. For those wondering if this applied to MySQL as well, yes. dev.mysql.com/doc/refman/8.0/en/wi....
Thanks for the link David, I should have clarified I see the world through a SQL Server lense at the moment :)
Great article :)
A strategy I found useful was making views out of very scary looking queries / queries with rather involved business logic. It can be tested/debugged and then be used for querying without further worries. Also, Postgres's \dv makes it easy to list views.
I use CTEs quite a lot.. cause when you are scrapping report from DWH, you won't be having write access to different database.. so you can't create tables to store data.. much easier option is CTE than subquery when you are extracting data from multiple tables and joining together.. if you use subquery, it gets very ugly and hell lot of difficult to do any changes afterwards.. hence I love CTEs
I think you may be giving CTEs a little too much credit, at least as implemented in SQL Server. As the name implies they are expressions. They are not necessarily persisted, though the compiled plan may spill the results to tempdb. They neither help nor hinder performance on their own. What they are (except for the recursive variety) is a way to write sub queries ahead of the main query instead of after it. Of course you can also refer to a cte multiple times in the main query if you need to (or another cte or subquery.) Very handy! In general though, writing:
c1 as (select ...),
c2 as (select ...),
from c1 join c2 ...
compiles the same as
from (select ...) as c1
join (select ...) as c2
Officially, A CTE "Specifies a temporary named result set". Of course that's exactly the same as a sub query. after the compiler is finished building the syntax tree from a CTE-based query and the same semantic query using sub queries, the results are the same. Which is why the execution plans are usually the same.
I use CTEs extensively in my work and have for years. I like them because they help make code more readable. I find it much easier to scan a linear set of CTEs than a heavily nested set of sub queries. Call it personal preference, if you like.
Thanks for the post, Helen. I do like CTE's and use them often but it's important to note that when CTE results in a large dataset the query performance dies pretty quickly. If the CTE result set goes over 10k rows I'd replace it with a local temp table. Otherwise, definitely better than nested queries.
There are many great uses for CTE, my only issue is performance. I see a lot of junior developers use them when an “exists” or temp table might be a more effective use. That make sense? Enjoyed reading the article
Haven't yet been acquainted with CTE's. Sounds useful but for queries like these examples the only difference to me is naming sets first instead of while you're defining them. I always build the sets first for a query, only initially selecting the count. After joining / filtering and sets are verified, I go back and select the data of interest - typically I would then convert it to an insert/update/delete.
I hadn't considered that CTEs are also a nice way to alias sub datasets in general cases. Thanks. I can already see a few of my reporting queries that would benefit from being organized in this way.
I find myself only reaching for CTEs when I'm using them recursively.
The one thing I would contribute is that if it is solely about naming, you are naming subqueries, as well, just usual convention keeps those names short. I have never really used CTEs, and the times I have seen them in other developers work, they have been used once and thrown away. The way you discuss them here, though implies I can reuse them. Thank you for the article, it caused me to rethink this...
All my objections to the article were covered by other respondents, save one.
Is it really a valid argument that CTE improves readability? There are serious limitations of that functionality...
I'm not pro nor con CTE - as everything it has it's uses, but more often than not I encounter a POV - mostly from analysts of all sorts and stripes - that states: "CTE is easier to read than subquery" or similar. I don't believe that this ever should be any serious argument for (or against) any solution, let alone one that can - if written badly - bring entire server to it's knees...
I also recently encountered a POV that one should ditch temp tables in favor of CTEs because "temptables are outdated".
My personal opinion is that if you use either CTE, subquery or Temptable to make complicated code more readable or manageable, and you are an analyst, then your query could use overhaul...
People reading this post, please be wary.
CTEs will not always materialize (store in memory) the intermediate results.
For example, if you join the results of a CTE, you may Infact execute that CTE for each row in the joined table.
Also, for SQL Server, attempts to force materialization may not always work.
There are a lot of comments about de error on CTEs and in-memory, and I will ignore this point.
About joins and subqueries (a CTE is a subquery):
For these queries, you can add your where to the join. This way, the database will do the filtering before the join. Doing this, you can achieve the same performance with less code.
After seeing many with clauses in the scripts in my project, even I tried to getting used to it (without knowing the power of cte). I had a chance to work with an oracle architect. That time he made me realize the power of cte. He tunned a query which was running 250 seconds long to just 2 seconds. After going through the modifications I learned something. With that after some days I tunned a query which was running for more than an hour to 8 minutes. It's very very powerful tool if it is used properly.
Thank you for this post Helen!
Actually This is the SQL CTEs introduction I Wish I Had when I started working with them 2 years ago.
Love using ctes, perhaps too much. Yet getting rid of scalar function calls in our sql codebase made these a necessity it was either that or table values functions-which are great if you need the content in multiple places and wish to preserve the rules used in one place.
A good case for me is the current quote system on a project I'm doing right now. I needed to get the rules and calculations exposed in a single query. The only way I could do that, making it readable as well, was one with about 6 Ctes. Several hold summation tables, another holds valid quotes and lines - I don't need to select out non deleted items more than once, I just use a CTE called validQuotes.
As ever I have to look at the query plan, see if anything looks super slow and rewriting. Generally I get excellent performance.
Very cool! I've never used CTE's because I'd never heard of them until now, so thanks for the educational post!
You're welcome, glad it was useful :)
I was not aware of CTE's ... thanks for the simple example.
Excellent article Helen, pleased to see you also answered the why not use subqueries instead. I love using them as you mentioned in your post to keep things readable and reusable in the query.
Thanks Mark. They've been a hot topic around the office recently so thought they deserved a bit of love and a post to support my case.
In the end of the days, on the edges, and with the impossible challenges, CTEs will help you.
We're a place where coders share, stay up-to-date and grow their careers.
We strive for transparency and don't collect excess data.