If a SQL SELECT query is slow, there are some simple steps we can take to identify the cause and improve its performance. I will walk through these steps one by one:
Step 1 - Look at the query execution plan:
To do that, you can use EXPLAIN or EXPLAIN ANALYZE to understand how the query works behind the scenes.
For example, I built a simple bookstore system with three entities: Books, Authors, and Publishers. I ran a simple query with a join between Books and Authors to show how it works.
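A minimal sketch of that first query, assuming the Django-style table names (books_book, books_author) used throughout this post, looks like this:
-- Show the execution plan together with actual run times for a
-- simple join between books and their authors.
EXPLAIN ANALYZE
SELECT books_book.*, books_author."name" AS author
FROM books_book
INNER JOIN books_author
    ON books_book.author_id = books_author.id;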
As you can see in Result 1, this command shows the query execution plan with every step of the query's execution, and it can be used to spot bottlenecks. I will deliberately add a bottleneck to the process, for example by adding a table to the join without foreign keys.
CREATE TABLE books_publisher (
publisher_id SERIAL PRIMARY KEY,
publisher_name VARCHAR(100) NOT NULL
);
ALTER TABLE books_book
ADD COLUMN publisher_id INTEGER;
INSERT INTO books_publisher (publisher_name) VALUES ('Penguin');
INSERT INTO books_publisher (publisher_name) VALUES ('Harper Collins');
INSERT INTO books_publisher (publisher_name) VALUES ('Oxford University Press');
INSERT INTO books_publisher (publisher_name) VALUES ('Pearson');
UPDATE books_book
SET publisher_id = (SELECT publisher_id FROM books_publisher ORDER BY RANDOM() LIMIT 1);
Now I run the query again, this time joining the publisher table:
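The query was along these lines (a sketch; the exact column list in my run may have differed):
-- Plan and run time for the join that now includes the publisher table.
EXPLAIN ANALYZE
SELECT books_book.*, books_author."name" AS author, books_publisher.publisher_name
FROM books_book
INNER JOIN books_author
    ON books_book.author_id = books_author.id
INNER JOIN books_publisher
    ON books_publisher.publisher_id = books_book.publisher_id;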
Note: the execution time was 11.180 ms.
Step 2 - Verify foreign keys and indexes:
I will add a foreign key constraint and an index on the join column so the planner has a better access path when we run the query:
ALTER TABLE books_book
ADD CONSTRAINT fk_publisher
FOREIGN KEY (publisher_id)
REFERENCES books_publisher(publisher_id);
-- CREATE INDEX uses a B-tree by default, so a single index on the
-- publisher_id column is enough to cover the join.
CREATE INDEX idx_publisher_id
ON books_book (publisher_id);
After improving the DDL structure, I ran the same query again:
Note: the execution time was 4.168 ms.
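To confirm whether the planner is actually using the new index, you can re-run EXPLAIN ANALYZE on the join; on larger tables you would expect an Index Scan or Bitmap Index Scan on idx_publisher_id instead of a Seq Scan on books_book (on very small tables the planner may still prefer a sequential scan):
-- Re-check the plan after adding the foreign key and the index.
EXPLAIN ANALYZE
SELECT books_book.*, books_publisher.publisher_name
FROM books_book
INNER JOIN books_publisher
    ON books_publisher.publisher_id = books_book.publisher_id;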
Step 3 - Verify what the business needs, and filter the data:
For example, to count how many books each publisher sent to each author, we can run the following query:
select
books_publisher.publisher_name,
books_author."name" as author,
count(*) qty
from books_book
inner join books_author
on books_book.author_id = books_author.id
inner join books_publisher
on books_publisher.publisher_id = books_book.publisher_id
group by books_publisher.publisher_name, books_author."name";
This query took:
We can exclude the books that have no publisher to avoid unnecessary work, so I will add a filter:
where books_book.publisher_id is not null
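Putting it together, the filtered query looks like this:
select
    books_publisher.publisher_name,
    books_author."name" as author,
    count(*) qty
from books_book
inner join books_author
    on books_book.author_id = books_author.id
inner join books_publisher
    on books_publisher.publisher_id = books_book.publisher_id
where books_book.publisher_id is not null
group by books_publisher.publisher_name, books_author."name";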
Here is the result:
This is a very small example. On large datasets, other things we can do are:
- remove unnecessary columns from the SELECT list
- create views or materialized views from your subqueries to preprocess data (see the sketch after this list)
- organize the data to follow normalization rules, or denormalize it (that can work too), depending on the kind of feature you want to provide to your customers
- verify database resources, such as open connections, idle connections, and CPU and RAM usage
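As an example of the materialized view idea above, the aggregation from Step 3 could be precomputed like this (a sketch; the view name is just illustrative):
-- Precompute the books-per-publisher-per-author aggregation so the
-- GROUP BY does not have to run on every request.
CREATE MATERIALIZED VIEW books_per_publisher_author AS
SELECT
    books_publisher.publisher_name,
    books_author."name" AS author,
    count(*) AS qty
FROM books_book
INNER JOIN books_author
    ON books_book.author_id = books_author.id
INNER JOIN books_publisher
    ON books_publisher.publisher_id = books_book.publisher_id
WHERE books_book.publisher_id IS NOT NULL
GROUP BY books_publisher.publisher_name, books_author."name";
-- Refresh it whenever the underlying data has changed enough to matter.
REFRESH MATERIALIZED VIEW books_per_publisher_author;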