When working with databases, optimizing SQL queries is essential for better performance and faster response times. In this post, we'll cover some common bad practices in SQL queries and show how to write queries that perform better.
Bad Practice 1: Using SELECT *
One of the most common mistakes when writing SQL queries is using the SELECT * statement to fetch all columns from a table. This can cause performance issues, especially when working with large tables, as it requires the database to read every column, even those that aren't needed.
Good Practice: Selecting Specific Columns
Instead of using SELECT *, it's better to select only the columns that are needed in the query. This reduces the amount of data that needs to be read, improving performance. For example:
-- 🔴 Bad Practice
SELECT * FROM users WHERE id = 1;
-- 🟢 Good Practice
SELECT name, email FROM users WHERE id = 1;
Bad Practice 2: Using Subqueries in SELECT
Another common mistake is using subqueries in the SELECT statement. Subqueries are queries nested inside other queries, and they can be expensive to execute, especially when the outer query returns many rows.
Good Practice: Using JOINs Instead of Subqueries
Instead of using subqueries, it's better to use JOINs to combine data from multiple tables. This can be more efficient, as JOINs can be optimized by the database engine. For example:
-- 🔴 Bad Practice
SELECT name, (SELECT COUNT(*) FROM orders WHERE user_id = users.id) AS order_count
FROM users;
-- 🟢 Good Practice
SELECT users.name, COUNT(orders.user_id) AS order_count
FROM users
LEFT JOIN orders ON users.id = orders.user_id
GROUP BY users.id, users.name;
In this example, we're calculating the number of orders for each user. The bad practice query runs a subquery in the SELECT statement for every user row, while the good practice query uses a JOIN and a GROUP BY clause to achieve the same result. Note the LEFT JOIN: it keeps users with zero orders, matching the subquery's output, and COUNT(orders.user_id) counts only the rows with a matching order.
Bad Practice 3: Using LIKE with Leading Wildcards
Using the LIKE operator with a leading wildcard (%) can cause performance issues, as it requires a full table scan to find matching rows.
Good Practice: Using Indexes and Trailing Wildcards
To improve performance when searching for patterns in text fields, it's better to use indexes and trailing wildcards (%) instead of leading ones. This allows the database to use the index to find matching rows, rather than performing a full table scan. For example:
-- 🔴 Bad Practice
SELECT * FROM products WHERE name LIKE '%apple%';
-- 🟢 Good Practice
SELECT * FROM products WHERE name LIKE 'apple%';
In this example, the bad practice query uses a leading wildcard, which can cause a full table scan. The good practice query uses only a trailing wildcard, which allows the database to use an index to find matching rows more efficiently.
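For the good practice query to actually avoid the scan, the column needs an index. A minimal sketch, assuming a products table whose name column isn't indexed yet (the index name is made up for illustration):
-- Hypothetical index; a B-tree index on name lets the engine do a
-- range scan for a prefix pattern such as 'apple%'.
CREATE INDEX idx_products_name ON products (name);
-- EXPLAIN should now show an index range scan for the prefix search,
-- but still a full table scan for '%apple%'.
EXPLAIN SELECT * FROM products WHERE name LIKE 'apple%';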
Bad Practice 4: Using ORDER BY with Large Result Sets
Using the ORDER BY clause with large result sets can be inefficient, as it requires the database to sort all rows returned by the query.
Good Practice: Using LIMIT with ORDER BY (see the MySQL docs: https://dev.mysql.com/doc/refman/8.3/en/limit-optimization.html)
To improve performance when sorting large result sets, it's better to use the LIMIT clause with ORDER BY so that the database only has to produce the necessary rows. For example:
-- 🔴 Bad Practice
SELECT * FROM products ORDER BY price;
-- 🟢 Good Practice
SELECT * FROM products ORDER BY price LIMIT 100;
In this example, the bad practice query sorts and returns every product by price, while the good practice query returns only the first 100, so the engine can stop as soon as it has them.
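As the linked MySQL docs note, this combination pays off most when the sort column is indexed; the engine can then read the first 100 entries straight from the index instead of sorting anything. A sketch, with a hypothetical index name:
-- Hypothetical index on the sort column.
CREATE INDEX idx_products_price ON products (price);
-- With the index, the engine walks the first 100 index entries in order;
-- without it, it has to sort (or top-N buffer) the whole table first.
SELECT * FROM products ORDER BY price LIMIT 100;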
Bad Practice 5: Using COUNT(*) with Large Tables
Using the COUNT(*) function with large tables can be inefficient, as it requires the database to count all rows in the table.
Good Practice: Using COUNT(1) or COUNT(column)
To improve performance when counting rows, it's better to use COUNT(1) or COUNT(column) instead of COUNT(*), as some database engines can optimize them. Keep in mind that COUNT(column) skips NULL values, so it only matches COUNT(*) when the column is NOT NULL. For example:
-- 🔴 Bad Practice
SELECT COUNT(*) FROM users;
-- 🟢 Good Practice
SELECT COUNT(1) FROM users;
In this example, the bad practice query counts all rows in the users table with COUNT(*), while the good practice query uses COUNT(1) to achieve the same result more efficiently.
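One caveat worth illustrating: because COUNT(column) ignores NULLs, it's only a drop-in replacement on NOT NULL columns. A quick sketch, assuming users has a nullable email column:
-- These two can return different numbers if some emails are NULL.
SELECT COUNT(*) FROM users;      -- counts all rows
SELECT COUNT(email) FROM users;  -- counts only rows where email IS NOT NULL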
Bad Practice 6: Using DISTINCT with Large Result Sets
Using the DISTINCT keyword with large result sets can be inefficient, as it requires the database to sort and de-duplicate all rows returned by the query.
Good Practice: Using GROUP BY Instead of DISTINCT (see the MySQL docs: https://dev.mysql.com/doc/refman/8.3/en/distinct-optimization.html)
To improve performance when removing duplicates, it's better to use the GROUP BY clause instead of DISTINCT, as it can be optimized by the database engine. For example:
-- 🔴 Bad Practice
SELECT DISTINCT category FROM products;
-- 🟢 Good Practice
SELECT category FROM products GROUP BY category;
In this example, the bad practice query de-duplicates all categories in the products table, while the good practice query groups the rows by category to achieve the same result more efficiently.
Bad Practice 7: Using OR in WHERE Clauses
Using the OR operator in WHERE clauses can be inefficient, as it can force the database to perform a full table scan to find matching rows.
Good Practice: Using UNION or IN Instead of OR
To improve performance when filtering rows, it's better to use UNION or IN instead of OR, as they can be optimized by the database engine. For example:
-- 🔴 Bad Practice
SELECT * FROM products WHERE category = 'fruit' OR category = 'dairy';
-- 🟢 Good Practice
SELECT * FROM products WHERE category IN ('fruit', 'dairy');
In this example, the bad practice query uses OR to find rows that match either 'fruit' or 'dairy', while the good practice query uses IN to achieve the same result more efficiently.
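The section mentions UNION as an alternative but only shows IN, so here is a sketch of the UNION form of the same filter. It's mainly interesting when the two conditions are served by different indexes; UNION ALL is used because the two branches can't overlap here, which avoids an extra de-duplication pass:
-- 🟢 Good Practice (UNION variant)
SELECT * FROM products WHERE category = 'fruit'
UNION ALL
SELECT * FROM products WHERE category = 'dairy';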
Conclusion:
By selecting specific columns in SELECT statements, using JOINs instead of subqueries, using indexes with trailing wildcards, using LIMIT with ORDER BY, using COUNT(1) or COUNT(column) instead of COUNT(*), using GROUP BY instead of DISTINCT, and using UNION or IN instead of OR, you can improve query performance and make your applications more responsive.
Remember, there are more than these seven practices for writing better SQL queries, like using indexes; these are just some examples, and there are many other ways to optimize SQL queries. Always analyze your queries and use database-specific features and tools to improve performance.
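As a minimal sketch of that analysis step (MySQL syntax, reusing the users example from above), EXPLAIN shows whether a query uses an index or falls back to a full table scan:
-- A value in the 'key' column means an index is used;
-- 'type: ALL' means a full table scan.
EXPLAIN SELECT name, email FROM users WHERE id = 1;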
Top comments (4)
ORDER BY with LIMIT still sorts all the records; otherwise you'd never be able to do paged sorted lists.
Not sure how it can be "bad practice" to use a %searchitem% if that's how the data is! Presumably a good idea to use other ANDed statements to try to reduce the list so that the final scanned set is as limited as possible - presuming such limitations are viable in the scenario.
I'm confused about your DISTINCT example; it's definitely not true for MySQL, as MySQL can optimize DISTINCT queries by using an index to quickly find and remove duplicate rows. This can be a significant performance improvement for queries that return a large number of rows.
MySQL cannot optimize GROUP BY queries in the same way, because the GROUP BY clause requires MySQL to calculate aggregate functions for each group of rows. This can be a significant performance penalty for queries that return a large number of rows.
In general, you should use the DISTINCT clause whenever you only need to return unique rows from a table. You should use the GROUP BY clause whenever you need to calculate aggregate functions for groups of rows.
Simply put, the database engine gets the first 10 rows ASC/DESC, so there's no need to sort 100,000 rows; once the SQL engine finds 10 rows, it stops.
See: https://dev.mysql.com/doc/refman/8.3/en/limit-optimization.html
MySQL is really smart and can figure out when you only want a few rows from a big table. When you use LIMIT, sometimes MySQL can use a special trick to find those rows really fast. If you use LIMIT with an ORDER BY clause, MySQL will stop sorting the rows as soon as it finds enough rows to show you. This is really fast if the table is already sorted in a certain way.
DISTINCT and GROUP BY - sometimes the scenario changes, but the behavior and implementation are different: DISTINCT uses a temp table, while GROUP BY uses a hash.
See: https://dev.mysql.com/doc/refman/8.3/en/distinct-optimization.html
@miketalbot
ORDER + LIMIT sounds like it would be of limited use. Usually with ORDER, I'm looking for the "Top X" of something, but always out of the ENTIRE population. I can't think of a scenario where I'd be interested in the "Top X" of a random sample of results. So the query may be optimized, no argument there - but I would not offer this as standard advice because the utility is so limited outside of testing scenarios.
In my experience ORDER + LIMIT is in common use and it's a good practice, if the sorting column is an ordered primary key (which is a very good practice). For example, if you need the records created by a user in the last week, but not more than 10, instead of fetching every matching row and then filtering out the extra results in the backend code, you can add ORDER BY id DESC LIMIT 10 to the query itself (see the sketch below).
It's not only more efficient because it skips the records that aren't needed: using the DESC order, it scans the rows from the last created and stops when it reaches the 10th matching result. Without the DESC order it has to scan all the first records created by the user. The only downside is that the results are reversed, but if you need them in ASC order you can re-order them with a parent query or on the backend side.
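A sketch of what this comment describes, with hypothetical table and column names (records, user_id, created_at; id is an auto-increment primary key):
-- Without LIMIT: fetches every matching row; extras are trimmed in backend code.
SELECT * FROM records
WHERE user_id = 42
  AND created_at >= NOW() - INTERVAL 7 DAY;
-- With ORDER BY id DESC LIMIT 10: scans the newest rows first and stops at 10.
SELECT * FROM records
WHERE user_id = 42
  AND created_at >= NOW() - INTERVAL 7 DAY
ORDER BY id DESC
LIMIT 10;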
Bonus tip
If you don't need to limit the results, this approach doesn't work well, because the DB engine does not know that created_at follows the same order as the id column. The best thing you can do in this case is to split the extraction into 2 queries, as sketched below:
The subquery extracts the id of the last record that doesn't meet the desired time range.
The main query directly uses the primary key to skip all the unneeded records in a very efficient way.
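A sketch of that two-step extraction, again with the same hypothetical names; the subquery finds the newest id outside the time window, and the outer query then filters on the primary key alone:
-- COALESCE covers the case where every record falls inside the range.
SELECT * FROM records
WHERE user_id = 42
  AND id > COALESCE(
    (SELECT MAX(id) FROM records
     WHERE user_id = 42
       AND created_at < NOW() - INTERVAL 7 DAY),
    0);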