If a SQL SELECT query is slow, there are some simple steps we can take to identify the cause and improve its performance. I will walk through these steps one by one:
Step 1 - Look at the query execution plan:
To do that, you can use EXPLAIN or EXPLAIN ANALYZE to understand how the query works behind the scenes.
For example, I built a simple bookstore system with three entities: Books, Authors, and Publishers. I ran a simple query with a join between Books and Authors to show how it works.
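A minimal sketch of that first query, assuming the Django-style table names (books_book, books_author) used throughout this post, looks like this:
-- Show the execution plan together with actual run times for a
-- simple join between books and their authors.
EXPLAIN ANALYZE
SELECT books_book.*, books_author."name" AS author
FROM books_book
INNER JOIN books_author
    ON books_book.author_id = books_author.id;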
As you can see in Result 1, this command shows the query execution plan with every step of the query's execution, and it can be used to spot bottlenecks. I will deliberately add a bottleneck to the process, for example by adding a table to the join without foreign keys.
CREATE TABLE books_publisher (
publisher_id SERIAL PRIMARY KEY,
publisher_name VARCHAR(100) NOT NULL
);
ALTER TABLE books_book
ADD COLUMN publisher_id INTEGER;
INSERT INTO books_publisher (publisher_name) VALUES ('Penguin');
INSERT INTO books_publisher (publisher_name) VALUES ('Harper Collins');
INSERT INTO books_publisher (publisher_name) VALUES ('Oxford University Press');
INSERT INTO books_publisher (publisher_name) VALUES ('Pearson');
UPDATE books_book
SET publisher_id = (SELECT publisher_id FROM books_publisher ORDER BY RANDOM() LIMIT 1);
Now I run the query again, this time joining the publisher table:
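The query was along these lines (a sketch; the exact column list in my run may have differed):
-- Plan and run time for the join that now includes the publisher table.
EXPLAIN ANALYZE
SELECT books_book.*, books_author."name" AS author, books_publisher.publisher_name
FROM books_book
INNER JOIN books_author
    ON books_book.author_id = books_author.id
INNER JOIN books_publisher
    ON books_publisher.publisher_id = books_book.publisher_id;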
Note: the execution time was 11.180 ms.
Step 2 - Verify foreign keys and indexes:
I will add a foreign key constraint and an index on the join column so the planner has a better access path when we run the query:
ALTER TABLE books_book
ADD CONSTRAINT fk_publisher
FOREIGN KEY (publisher_id)
REFERENCES books_publisher(publisher_id);
-- CREATE INDEX uses a B-tree by default, so a single index on the
-- publisher_id column is enough to cover the join.
CREATE INDEX idx_publisher_id
ON books_book (publisher_id);
After improving the DDL structure, I ran the same query again:
Note: the execution time was 4.168 ms.
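To confirm whether the planner is actually using the new index, you can re-run EXPLAIN ANALYZE on the join; on larger tables you would expect an Index Scan or Bitmap Index Scan on idx_publisher_id instead of a Seq Scan on books_book (on very small tables the planner may still prefer a sequential scan):
-- Re-check the plan after adding the foreign key and the index.
EXPLAIN ANALYZE
SELECT books_book.*, books_publisher.publisher_name
FROM books_book
INNER JOIN books_publisher
    ON books_publisher.publisher_id = books_book.publisher_id;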
Step 3 - Verify what the business needs, and filter the data:
For example, to count how many books each publisher sent to each author, we can run the following query:
select
books_publisher.publisher_name,
books_author."name" as author,
count(*) qty
from books_book
inner join books_author
on books_book.author_id = books_author.id
inner join books_publisher
on books_publisher.publisher_id = books_book.publisher_id
group by books_publisher.publisher_name, books_author."name";
This query took:
We can exclude the books that have no publisher to avoid unnecessary work, so I will add a filter:
where books_book.publisher_id is not null
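Putting it together, the filtered query looks like this:
select
    books_publisher.publisher_name,
    books_author."name" as author,
    count(*) qty
from books_book
inner join books_author
    on books_book.author_id = books_author.id
inner join books_publisher
    on books_publisher.publisher_id = books_book.publisher_id
where books_book.publisher_id is not null
group by books_publisher.publisher_name, books_author."name";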
Here is the result:
This is a very small example. On large datasets, other things we can do are:
- remove unnecessary columns from the SELECT list
- create views or materialized views from your subqueries to preprocess data (see the sketch after this list)
- organize the data to follow normalization rules, or denormalize it (that can work too), depending on the kind of feature you want to provide to your customers
- verify database resources, such as open connections, idle connections, and CPU and RAM usage
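As an example of the materialized view idea above, the aggregation from Step 3 could be precomputed like this (a sketch; the view name is just illustrative):
-- Precompute the books-per-publisher-per-author aggregation so the
-- GROUP BY does not have to run on every request.
CREATE MATERIALIZED VIEW books_per_publisher_author AS
SELECT
    books_publisher.publisher_name,
    books_author."name" AS author,
    count(*) AS qty
FROM books_book
INNER JOIN books_author
    ON books_book.author_id = books_author.id
INNER JOIN books_publisher
    ON books_publisher.publisher_id = books_book.publisher_id
WHERE books_book.publisher_id IS NOT NULL
GROUP BY books_publisher.publisher_name, books_author."name";
-- Refresh it whenever the underlying data has changed enough to matter.
REFRESH MATERIALIZED VIEW books_per_publisher_author;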