SQL 201 (8 Part Series)
You've gathered the requirements, checked out the tables, joined all the things and you're ready to push execute.
The query runs.
Before you rage quit or call in your DBA, check these things aren't slowing things down.
1 - Timing is everything
2 - Locked out
3 - You don't need ALL the things
4 - Uppers and Lowers
5 - Not, NOT IN
6 - To CTE or not to CTE
7 - Wild, wild, wildcards
8 - Try a table
9 - Views on views on views
10 - Indexes
This is written with PostgreSQL in mind but if you're using something else, these may still help, give them a try.
Before we go any further. How busy is the database? Are you running a complex query at a peak time when there are multiple users competing for memory?
You can do this by asking the database ... not by shoulder tapping everyone in the company.
This may have been locked down by your friendly DBA so results may vary
select * from pg_stat_activity /* This lists all queries currently running and idle. */ select pid, usename, pg_blocking_pids(pid) as blocked_by, query as blocked_query from pg_stat_activity where cardinality(pg_blocking_pids(pid)) > 0; /* This lists queries causing locks. */
Is the table being updated? If you've crashed into a table that is being updated by an ETL process you may be blocking the update and won't get far in running your query.
Have a chat with the team doing ETL and make sure you know when the update window is.
If you've got the all clear, then we can start pulling your query apart to see if there are any tweaks that could be made before using the query planner or the DBA.
Starting at the top ... Do you really need to SELECT * ?
If you are looking for an idea of what is in the table, try expanding the column list in the schema tree.
Pull out the names of only the columns you need instead of using SELECT * to speed things up further.
If you have an especially large table or wide table, the query engine is going to struggle to pull back absolutely everything into the client side. Unless you are going to 'eyeball' every row make sure you are using 'LIMIT' to restrict the results.
If you are looking for a COUNT, rather than running the query to spot the count of rows at the bottom of the screen, use a subquery that counts the rows for you.
select count(*) from ( select userid from userschema.usertable where market = 'UK' and payingcustomer = 1 ) as derivedtable;
PostgreSQL is case sensitive which can take some getting used to if you've come from SQL Server.
If you are 'lowering' or 'uppering' your data this is labour intensive. Only do it if it is absolutely required. Check out what your data looks like first before arbitrarily adding it to your query.
If it is necessary on a join, use it on one side only or try using ILIKE for a case insensitive match.
Try to avoid the use of ‘IN’ or 'NOT IN'. By doing this you are performing a full table scan as the query engine looks through every row to check if the condition is met.
- Try to use 'EXCEPT' or 'NOT EXISTS' these have a significantly smaller impact on the query plan than 'NOT IN'.
While CTEs are easier to read than subqueries, in PostgreSQL they prevent the query optimiser from rewriting queries by moving constraints into or out of the CTE.
- Both CTEs and subqueries are useful, and one or the other may perform better in a specific case. Consider the table size, the number of rows likely to be returned and the actions being performed in the CTE when writing it.
Using wildcards at the start and end of a LIKE will slow things down. And potentially give you more results than you intended.
- Use the wildcard only where you need them, generally, this will only be required on one side so be conscious of what you are asking the query engine to do.
select firstname, lastname, userid from userschema.usertable where lastname like 'anders%' -- only on the end of the string
Running several nested queries as a Function is costly and could be quicker if you were to write into a table.
- Consider creating staging tables if there are a lot of steps to your process so you are joining to a smaller subset of data.
Views are queries that run when you access the view. If you are calling multiple views, or worse, views on views on views you are asking the query engine to run multiple queries to return your columns.
Write to a table if you need to take a snapshot each day/week/month rather than using views to filter.
If you are using nested views consider if there is a more direct way to get to the columns you need by writing a query, instead of causing multiple queries to run to get to the required columns from the final nested view.
Indexes speed up your query by ordering the data so the database engine knows where to look, or giving it a lookup table so it can use that to know where to look. The type of index you use determines which way the index works.
Apply indexes to columns you use often in your queries with high cardinality or rates of change.
Take stock of your indexes to make sure you don't have too many on a table.
Hope these tips help with your query tweaking. Every database is different so as always when it comes to the best strategy for your query, it depends.
What strategies do you use to tweak your queries before you shoulder tap your DBA and use and interpret the Query Plan?
This post first appeared on helenanderson.co.nz
Photo by icon0.com on Pexels