SQL 201: 10 Ways to Tweak Slow Running Queries

Helen Anderson on April 02, 2019

You've gathered the requirements, checked out the tables, joined all the things and you're ready to push execute. The query runs. and runs. and ...
 

The system I'm dealing with day-to-day right now is full of nested views. 😜

It's really hard to figure out what's going on!

I think readability and being able to reason about what's going on is another big benefit to using staging tables as you mentioned.

Something I remember from database 101 is to limit your use of cursors. In my first job, there were cursors everywhere. An easy fix for these, when they caused issues, was to rewrite them as set-based statements instead of row-by-row loops 😋.
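For illustration, a T-SQL-style sketch of that kind of rewrite (orders and order_lines are made-up tables here):

-- Cursor version: updates one order at a time, row by row
DECLARE @order_id int;
DECLARE order_cursor CURSOR FOR SELECT order_id FROM orders;
OPEN order_cursor;
FETCH NEXT FROM order_cursor INTO @order_id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE orders
    SET    total = (SELECT SUM(amount) FROM order_lines WHERE order_id = @order_id)
    WHERE  order_id = @order_id;
    FETCH NEXT FROM order_cursor INTO @order_id;
END;
CLOSE order_cursor;
DEALLOCATE order_cursor;

-- Set-based version: one statement the optimiser can plan as a whole
UPDATE o
SET    total = l.order_total
FROM   orders o
JOIN   (SELECT order_id, SUM(amount) AS order_total
        FROM   order_lines
        GROUP  BY order_id) l ON l.order_id = o.order_id;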

Great article! 🤜🤛

 

Thank you James :)

I'm also finding it really hard to untangle nested views right now. I know there is a column I need from somewhere down through the spaghetti mess but it's getting harder and harder to split that out from the noise around it.

I'm learning more and more that it's important to reassess why views exist and, if they really are so important, why they aren't a table in their own right.

The more I delve into databases in the real world, the more pragmatic I become, but at the same time I know things could be better.

 

I think that's exactly why more senior devs tend to lean on the "keep it really simple" side of the spectrum. We've had first-hand experience with what happens when you don't!

I find most times the culprit is an attempt to make everything "re-usable". But whether it's database code, back-end code, or anything else, that's a principle that shouldn't be pushed so hard.

I'd much rather deal with the same logic copied and pasted in 3 different places than 3 levels of nested views, etc.

 

You could make a lifelong career by taking this one article out on the road and doing a series of day-long workshops on these tips and advice in companies, large and small, all over the world. Seriously.

I'm laughing away at #3 as memories of some queries I've QAed in the past come back to me. SELECT * on whole multi-year monthly snapshot tables which are aliased and then joined back to themselves with no WHERE clause on either side. And people wondered why it took hours to run (and eventually fall over).
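For anyone lucky enough not to have seen one, the rough shape of that query (monthly_snapshot and account_id are made-up names here) is:

SELECT *
FROM   monthly_snapshot a
JOIN   monthly_snapshot b ON a.account_id = b.account_id;
-- no WHERE clause on either side, so every month of the snapshot is scanned and joined back to itself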

 

SELECT * on whole multi-year monthly snapshot tables which are aliased and then joined back to themselves with no WHERE clause on either side. And people wondered why it took hours to run (and eventually fall over).

OMG! :D

 

WOW! That makes my head spin.

The more I delve into database best practice in the real world, the more tricky it becomes.

I want to fix all-the-things but at the same time, there is pressure to keep things consistently 'wrong'. As much as it makes my skin crawl that something isn't right, I get that you have to roll with it because something has been done 'slightly wrong' forever ... even if it takes forever to run.

 

Very concise and useful tips! I use CTEs all the time for readability and didn't realize it could be impacting performance. Do you have a recommendation for where you learned the "why" behind these tips? A little context/background can help make things like this a bit easier to remember.

 

I like using CTEs to make the query easier to read, but it was pointed out to me when writing my article on CTEs (from a SQL Server point of view) that these can be an optimisation fence in Postgres.

It's worth noting that in PostgreSQL a CTE is an "optimiser fence", in that the query it contains is executed without further optimisation. So if you apply a further predicate to the CTE result in another CTE or the main query, this predicate isn't pushed down into the original CTE.
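As a sketch of what that means in practice (events is a made-up table):

WITH recent AS (
    SELECT * FROM events               -- the CTE body is planned and run on its own
)
SELECT *
FROM   recent
WHERE  created_at > now() - interval '1 day';
-- the created_at predicate isn't pushed down into the CTE,
-- so the whole events table may be scanned before the filter is applied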

On the other hand, in PostgreSQL a CTE can execute an insert, update or delete, and return values from the affected rows for use later on in the same query, which is very powerful.
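For example (tasks and tasks_archive are made-up tables), a single statement can move old rows and reuse them straight away:

WITH archived AS (
    DELETE FROM tasks
    WHERE  completed_at < now() - interval '90 days'
    RETURNING *                        -- the deleted rows are handed to the outer query
)
INSERT INTO tasks_archive
SELECT * FROM archived;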

Read more about the why with examples:

 
 

Great article. Definitely didn't know about the performance difference between NOT EXISTS and NOT IN.

Could you elaborate on this? Thank you!

 

Yes! Absolutely, I'll write up some real-life examples and come back to you :)

 

Hey rhymes, apologies for the late reply.

re: NOT EXISTS v NOT IN

I came across this while reviewing others' code and refactoring it for a migration we did recently. It's not something I had come across until moving to a Postgres DB.

I like the explanation from the Postgres Wiki, copied and pasted here:

Don't use NOT IN

Don't use NOT IN, or any combination of NOT and IN such as NOT (x IN (select…)).

(If you think you wanted NOT IN (select …) then you should rewrite to use NOT EXISTS instead.)

Why not?

Two reasons:

1 - NOT IN behaves in unexpected ways if there is a null present:

select * from foo where col not in (1,null); -- always returns 0 rows
select * from foo where col not in (select x from bar);
  -- returns 0 rows if any value of bar.x is null

This happens because col IN (1,null) returns TRUE if col=1, and NULL otherwise (i.e. it can never return FALSE). Since NOT (TRUE) is FALSE, but NOT (NULL) is still NULL, there is no way that NOT (col IN (1,null)) (which is the same thing as col NOT IN (1,null)) can return TRUE under any circumstances.

2 - Because of point 1 above, NOT IN (SELECT ...) does not optimize very well. In particular, the planner can't transform it into an anti-join, and so it becomes either a hashed Subplan or a plain Subplan. The hashed subplan is fast, but the planner only allows that plan for small result sets; the plain subplan is horrifically slow (in fact O(N²)). This means that the performance can look good in small-scale tests but then slow down by 5 or more orders of magnitude once a size threshold is crossed; you do not want this to happen.

When should you?

NOT IN (list,of,values,...) is mostly safe unless you might have a null in the list (via a parameter or otherwise). So it's sometimes natural and even advisable to use it when excluding specific constant values from a query result.
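To make the rewrite concrete, using the foo and bar tables from the example above, the NOT EXISTS version looks something like this:

SELECT *
FROM   foo
WHERE  NOT EXISTS (SELECT 1 FROM bar WHERE bar.x = foo.col);
-- NULLs in bar.x no longer swallow every row, and the planner
-- can turn this into an anti-join rather than a subplan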

Thank you Helen! I really didn't know about this shortcoming. The failure to optimize makes sense given the issues with NULL. I'll definitely keep this in mind :)

The quirks of moving from database to database. I'm sure there will be more of these that cause queries to grind to a halt. :D

 

Helen, excellent post! In my work, I try to find posts like this every day. I will apply these tips in my everyday work.

 

hi,
I'm using MySQL Server and PHP on the programming side.
I have a table which stores only the IDs and barcodes of my products, and this table grows every day.
When I scan a barcode from my front end, I search for the barcode in this table first; with the IDs obtained from this table I run another query, passing those IDs into another table using an IN condition to get my product information. Some of my products have the same barcode with a different description, hence I use this method.
It is holding up well so far with 6,000 codes, but will it still perform as the table grows, or do I need to optimise it in another way?
The speed and the number of rows seem fine so far, and I run EXPLAIN regularly to check whether the query is reading unwanted rows.
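Roughly, the two queries look like this (simplified, with made-up table and column names):

SELECT id
FROM   product_barcodes
WHERE  barcode = ?;                                   -- look up the scanned barcode

SELECT *
FROM   products
WHERE  id IN (/* IDs returned by the first query */); -- fetch the product details

The IN (...) list is built in PHP from the first query's results.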

 
 

I don't have a login for the platform but sure, if you credit it back to here on dev.to no problem :)

 
 

Hi Madam,
All your articles are super and very helpful for every software engineer.

 

Very nice article! This brings back memories of poring over execution plans and reporting findings to a supplier in an attempt to get a system to pass its performance requirements.

 
 

(A) This is a wonderful article. Definitely needed by many. (B) Next stop: cardinality!

 

Thanks so much! I have a few up my sleeve in the SQL 201 category, and cardinality is one of those I'd like to explore.

 

Pretty good tips, Helen, and some of them not that obvious. Do you plan to write a list like this for NoSQL DBs too?

 

Yes! I am making notes as I delve into this world and am planning on writing an article on this soon.

Thank you for the kind comments :)
