As a developer, we want our application to run smooth and have a great performance. One of the performance we want to improve is query performance.
We do count rows in SQL command many times using COUNT
.
But is it really the best way to do it with 1 million rows of data?
Well, it depends on situation.
For example, this is table we want to count.
CREATE TABLE users (
id int,
first_name varchar(255),
last_name varchar(255),
age varchar(255),
gender varchar(255),
address varchar(255)
);
CREATE INDEX person_name ON users (first_name, last_name);
There is 2 ways to do count statement if developer didn't have permission to altered the table.
- Using
WHERE
Clause - Using
CASE
Statement
Most of use simply use WHERE
condition with COUNT
function. For example like below:
SELECT COUNT(*) AS count
FROM users
WHERE age < 20;
But in some cases, it is better to use CASE
statement when:
-
The conditions being evaluated are complex and cannot be easily indexed.
- If the conditions in the
WHERE
clause are too complex to be indexed, the database may have to evaluate each row individually, which can be slow. - In contrast, a
CASE
statement can evaluate complex conditions within theCOUNT
function, potentially reducing the number of rows that need to be processed.
- If the conditions in the
-
When the conditions are based on multiple columns.
- If the conditions in the
WHERE
clause span multiple columns and those columns are not indexed together, the database might not be able to use an index effectively. - In such cases, a
CASE
statement can be used to evaluate conditions on multiple columns within theCOUNT
function, reducing the number of rows that need to be processed.
- If the conditions in the
-
When the data is not well indexed.
- If the table is not well indexed or the data is not well distributed, a
WHERE
clause might not be able to take advantage of indexes and have to scan the entire table. - In such scenarios, a
CASE
statement might be faster.
- If the table is not well indexed or the data is not well distributed, a
-
When the data is highly selective.
- If the data is highly selective, where only a small percentage of the data needs to be counted, then using a
CASE
statement might be faster than using aWHERE
clause because it eliminates the need for the database to scan the entire table.
- If the data is highly selective, where only a small percentage of the data needs to be counted, then using a
Example using CASE
statement:
SELECT
COUNT(CASE
WHEN age < 20
THEN 1
ELSE NULL
END) AS count
FROM
users;
It's worth noting that, in both cases, if the table is large and 'age' column is not indexed, both queries will have to scan the entire table to find the rows that match the condition, which can be slow.
However, if the table is small, or the percentage of rows that match the condition is very low, the performance difference between the two queries might not be significant. So it's always recommended to test both options and check the performance for your specific use case and dataset.
This is the same situation when using GROUP BY
statement.
It's worth noting that the performance difference between using a GROUP BY
clause with a WHERE
or CASE
statement will depend on the complexity of the condition, the size of the table, and the distribution of the data.
This is based on the situation when the developer didn't have the access to database to altered the table schema.
If the developer have an access, it's recommended to consider other options like indexing the column or partitioning the table which will improve the performance of the query.
Top comments (0)