When working with databases, optimizing SQL queries is essential for better performance and faster response times. In this post, we'll cover some common bad practices in SQL queries and show how to write queries that perform better.
Bad Practice 1: Using SELECT *
One of the most common mistakes when writing SQL queries is using the SELECT * statement to fetch all columns from a table. This can cause performance issues, especially when working with large tables, as it requires the database to read every column, even those that aren't needed.
Good Practice: Selecting Specific Columns
Instead of using SELECT *, it's better to select only the columns that are needed in the query. This reduces the amount of data that needs to be read, improving performance. For example:
-- 🔴 Bad Practice
SELECT * FROM users WHERE id = 1;
-- 🟢 Good Practice
SELECT name, email FROM users WHERE id = 1;
Bad Practice 2: Using Subqueries in SELECT
Another common mistake is using subqueries in the SELECT statement. Subqueries are queries nested inside other queries, and they can be expensive to execute, especially when the outer query returns many rows.
Good Practice: Using JOINs Instead of Subqueries
Instead of using subqueries, it's better to use JOINs to combine data from multiple tables. This can be more efficient, as JOINs can be optimized by the database engine. For example:
-- 🔴 Bad Practice
SELECT name, (SELECT COUNT(*) FROM orders WHERE user_id = users.id) AS order_count
FROM users;
-- 🟢 Good Practice
SELECT users.name, COUNT(orders.user_id) AS order_count
FROM users
LEFT JOIN orders ON users.id = orders.user_id
GROUP BY users.id, users.name;
In this example, we're calculating the number of orders for each user. The bad practice query runs a subquery in the SELECT statement for every user row, while the good practice query uses a JOIN and a GROUP BY clause to achieve the same result. Note the LEFT JOIN: it keeps users with zero orders, matching the subquery's output, and COUNT(orders.user_id) counts only the rows with a matching order.
Bad Practice 3: Using LIKE with Leading Wildcards
Using the LIKE operator with a leading wildcard (%) can cause performance issues, as it requires a full table scan to find matching rows.
Good Practice: Using Indexes and Trailing Wildcards
To improve performance when searching for patterns in text fields, it's better to use indexes and trailing wildcards (%) instead of leading ones. This allows the database to use the index to find matching rows, rather than performing a full table scan. For example:
-- 🔴 Bad Practice
SELECT * FROM products WHERE name LIKE '%apple%';
-- 🟢 Good Practice
SELECT * FROM products WHERE name LIKE 'apple%';
In this example, the bad practice query uses a leading wildcard, which can cause a full table scan. The good practice query uses only a trailing wildcard, which allows the database to use an index to find matching rows more efficiently.
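For the good practice query to actually avoid the scan, the column needs an index. A minimal sketch, assuming a products table whose name column isn't indexed yet (the index name is made up for illustration):
-- Hypothetical index; a B-tree index on name lets the engine do a
-- range scan for a prefix pattern such as 'apple%'.
CREATE INDEX idx_products_name ON products (name);
-- EXPLAIN should now show an index range scan for the prefix search,
-- but still a full table scan for '%apple%'.
EXPLAIN SELECT * FROM products WHERE name LIKE 'apple%';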
Bad Practice 4: Using ORDER BY with Large Result Sets
Using the ORDER BY clause with large result sets can be inefficient, as it requires the database to sort all rows returned by the query.
Good Practice: Using LIMIT with ORDER BY (see the MySQL docs: https://dev.mysql.com/doc/refman/8.3/en/limit-optimization.html)
To improve performance when sorting large result sets, it's better to use the LIMIT clause with ORDER BY so that the database only has to produce the necessary rows. For example:
-- 🔴 Bad Practice
SELECT * FROM products ORDER BY price;
-- 🟢 Good Practice
SELECT * FROM products ORDER BY price LIMIT 100;
In this example, the bad practice query sorts and returns every product by price, while the good practice query returns only the first 100, so the engine can stop as soon as it has them.
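As the linked MySQL docs note, this combination pays off most when the sort column is indexed; the engine can then read the first 100 entries straight from the index instead of sorting anything. A sketch, with a hypothetical index name:
-- Hypothetical index on the sort column.
CREATE INDEX idx_products_price ON products (price);
-- With the index, the engine walks the first 100 index entries in order;
-- without it, it has to sort (or top-N buffer) the whole table first.
SELECT * FROM products ORDER BY price LIMIT 100;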
Bad Practice 5: Using COUNT(*) with Large Tables
Using the COUNT(*) function with large tables can be inefficient, as it requires the database to count all rows in the table.
Good Practice: Using COUNT(1) or COUNT(column)
To improve performance when counting rows, it's better to use COUNT(1) or COUNT(column) instead of COUNT(*), as some database engines can optimize them. Keep in mind that COUNT(column) skips NULL values, so it only matches COUNT(*) when the column is NOT NULL. For example:
-- 🔴 Bad Practice
SELECT COUNT(*) FROM users;
-- 🟢 Good Practice
SELECT COUNT(1) FROM users;
In this example, the bad practice query counts all rows in the users table with COUNT(*), while the good practice query uses COUNT(1) to achieve the same result more efficiently.
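One caveat worth illustrating: because COUNT(column) ignores NULLs, it's only a drop-in replacement on NOT NULL columns. A quick sketch, assuming users has a nullable email column:
-- These two can return different numbers if some emails are NULL.
SELECT COUNT(*) FROM users;      -- counts all rows
SELECT COUNT(email) FROM users;  -- counts only rows where email IS NOT NULL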
Bad Practice 6: Using DISTINCT with Large Result Sets
Using the DISTINCT keyword with large result sets can be inefficient, as it requires the database to sort and de-duplicate all rows returned by the query.
Good Practice: Using GROUP BY Instead of DISTINCT (see the MySQL docs: https://dev.mysql.com/doc/refman/8.3/en/distinct-optimization.html)
To improve performance when removing duplicates, it's better to use the GROUP BY clause instead of DISTINCT, as it can be optimized by the database engine. For example:
-- 🔴 Bad Practice
SELECT DISTINCT category FROM products;
-- 🟢 Good Practice
SELECT category FROM products GROUP BY category;
In this example, the bad practice query de-duplicates all categories in the products table, while the good practice query groups the rows by category to achieve the same result more efficiently.
Bad Practice 7: Using OR in WHERE Clauses
Using the OR operator in WHERE clauses can be inefficient, as it can force the database to perform a full table scan to find matching rows.
Good Practice: Using UNION or IN Instead of OR
To improve performance when filtering rows, it's better to use UNION or IN instead of OR, as they can be optimized by the database engine. For example:
-- 🔴 Bad Practice
SELECT * FROM products WHERE category = 'fruit' OR category = 'dairy';
-- 🟢 Good Practice
SELECT * FROM products WHERE category IN ('fruit', 'dairy');
In this example, the bad practice query uses OR to find rows that match either 'fruit' or 'dairy', while the good practice query uses IN to achieve the same result more efficiently.
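The section mentions UNION as an alternative but only shows IN, so here is a sketch of the UNION form of the same filter. It's mainly interesting when the two conditions are served by different indexes; UNION ALL is used because the two branches can't overlap here, which avoids an extra de-duplication pass:
-- 🟢 Good Practice (UNION variant)
SELECT * FROM products WHERE category = 'fruit'
UNION ALL
SELECT * FROM products WHERE category = 'dairy';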
Conclusion:
By selecting specific columns in SELECT statements, using JOINs instead of subqueries, using indexes with trailing wildcards, using LIMIT with ORDER BY, using COUNT(1) or COUNT(column) instead of COUNT(*), using GROUP BY instead of DISTINCT, and using UNION or IN instead of OR, you can improve query performance and make your applications more responsive.
Remember, there are more than these seven practices for writing better SQL queries, like using indexes; these are just some examples, and there are many other ways to optimize SQL queries. Always analyze your queries and use database-specific features and tools to improve performance.
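As a minimal sketch of that analysis step (MySQL syntax, reusing the users example from above), EXPLAIN shows whether a query uses an index or falls back to a full table scan:
-- A value in the 'key' column means an index is used;
-- 'type: ALL' means a full table scan.
EXPLAIN SELECT name, email FROM users WHERE id = 1;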
Top comments (4)
ORDER BY with LIMIT still sorts all the records; otherwise you'd never be able to do paged sorted lists.
Not sure how it can be "bad practice" to use a %searchitem% if that's how the data is! Presumably a good idea to use other ANDed statements to try to reduce the list so that the final scanned set is as limited as possible - presuming such limitations are viable in the scenario.
I'm confused about your DISTINCT example; it's definitely not true for MySQL, as MySQL can optimize DISTINCT queries by using an index to quickly find and remove duplicate rows. This can be a significant performance improvement for queries that return a large number of rows.
MySQL cannot optimize GROUP BY queries in the same way, because the GROUP BY clause requires MySQL to calculate aggregate functions for each group of rows. This can be a significant performance penalty for queries that return a large number of rows.
In general, you should use the DISTINCT clause whenever you only need to return unique rows from a table. You should use the GROUP BY clause whenever you need to calculate aggregate functions for groups of rows.
Simply put, the database engine gets the first 10 rows ASC/DESC, so there's no need to sort 100,000 rows; once the SQL engine finds 10 rows, it stops.
See: https://dev.mysql.com/doc/refman/8.3/en/limit-optimization.html
MySQL is really smart and can figure out when you only want a few rows from a big table. When you use LIMIT, sometimes MySQL can use a special trick to find those rows really fast. If you use LIMIT with an ORDER BY clause, MySQL will stop sorting the rows as soon as it finds enough rows to show you. This is really fast if the table is already sorted in a certain way.
DISTINCT and GROUP BY - sometimes the scenario changes, but the behavior and implementation are different: DISTINCT uses a temp table, while GROUP BY uses a hash.
See: https://dev.mysql.com/doc/refman/8.3/en/distinct-optimization.html
@miketalbot
ORDER + LIMIT sounds like it would be of limited use. Usually with ORDER, I'm looking for the "Top X" of something, but always out of the ENTIRE population. I can't think of a scenario where I'd be interested in the "Top X" of a random sample of results. So the query may be optimized, no argument there - but I would not offer this as standard advice because the utility is so limited outside of testing scenarios.
In my experience ORDER + LIMIT is in common use and it's a good practice, if the sorting column is an ordered primary key (which is a very good practice). For example, if you need the records created by a user in the last week, but not more than 10, instead of fetching every matching row and then filtering out the extra results in the backend code, you can add ORDER BY id DESC LIMIT 10 to the query itself (see the sketch below).
It's not only more efficient because it skips the records that aren't needed: using the DESC order, it scans the rows from the last created and stops when it reaches the 10th matching result. Without the DESC order it has to scan all the first records created by the user. The only downside is that the results are reversed, but if you need them in ASC order you can re-order them with a parent query or on the backend side.
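A sketch of what this comment describes, with hypothetical table and column names (records, user_id, created_at; id is an auto-increment primary key):
-- Without LIMIT: fetches every matching row; extras are trimmed in backend code.
SELECT * FROM records
WHERE user_id = 42
  AND created_at >= NOW() - INTERVAL 7 DAY;
-- With ORDER BY id DESC LIMIT 10: scans the newest rows first and stops at 10.
SELECT * FROM records
WHERE user_id = 42
  AND created_at >= NOW() - INTERVAL 7 DAY
ORDER BY id DESC
LIMIT 10;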
Bonus tip
If you don't need to limit the results, this approach doesn't work well, because the DB engine does not know that created_at follows the same order as the id column. The best thing you can do in this case is to split the extraction into 2 queries, as sketched below:
The subquery extracts the id of the last record that doesn't meet the desired time range.
The main query directly uses the primary key to skip all the unneeded records in a very efficient way.
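A sketch of that two-step extraction, again with the same hypothetical names; the subquery finds the newest id outside the time window, and the outer query then filters on the primary key alone:
-- COALESCE covers the case where every record falls inside the range.
SELECT * FROM records
WHERE user_id = 42
  AND id > COALESCE(
    (SELECT MAX(id) FROM records
     WHERE user_id = 42
       AND created_at < NOW() - INTERVAL 7 DAY),
    0);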