DEV Community

Zahir Din
Zahir Din

Posted on • Updated on

SQL Count using Where Clause or Case Statement

As a developer, we want our application to run smooth and have a great performance. One of the performance we want to improve is query performance.

We do count rows in SQL command many times using COUNT.

But is it really the best way to do it with 1 million rows of data?

Well, it depends on situation.

For example, this is table we want to count.

CREATE TABLE users (
  id int, 
  first_name varchar(255), 
  last_name varchar(255), 
  age varchar(255), 
  gender varchar(255),
  address varchar(255)
);

CREATE INDEX person_name ON users (first_name, last_name);
Enter fullscreen mode Exit fullscreen mode

There is 2 ways to do count statement if developer didn't have permission to altered the table.

  • Using WHERE Clause
  • Using CASE Statement

Most of use simply use WHERE condition with COUNT function. For example like below:

SELECT COUNT(*) AS count 
FROM users 
WHERE age < 20;
Enter fullscreen mode Exit fullscreen mode

But in some cases, it is better to use CASE statement when:

  1. The conditions being evaluated are complex and cannot be easily indexed.

    • If the conditions in the WHERE clause are too complex to be indexed, the database may have to evaluate each row individually, which can be slow.
    • In contrast, a CASE statement can evaluate complex conditions within the COUNT function, potentially reducing the number of rows that need to be processed.
  2. When the conditions are based on multiple columns.

    • If the conditions in the WHERE clause span multiple columns and those columns are not indexed together, the database might not be able to use an index effectively.
    • In such cases, a CASE statement can be used to evaluate conditions on multiple columns within the COUNT function, reducing the number of rows that need to be processed.
  3. When the data is not well indexed.

    • If the table is not well indexed or the data is not well distributed, a WHERE clause might not be able to take advantage of indexes and have to scan the entire table.
    • In such scenarios, a CASE statement might be faster.
  4. When the data is highly selective.

    • If the data is highly selective, where only a small percentage of the data needs to be counted, then using a CASE statement might be faster than using a WHERE clause because it eliminates the need for the database to scan the entire table.

Example using CASE statement:

SELECT 
  COUNT(CASE
     WHEN age < 20
     THEN 1
     ELSE NULL
  END) AS count
FROM 
  users;
Enter fullscreen mode Exit fullscreen mode

It's worth noting that, in both cases, if the table is large and 'age' column is not indexed, both queries will have to scan the entire table to find the rows that match the condition, which can be slow.

However, if the table is small, or the percentage of rows that match the condition is very low, the performance difference between the two queries might not be significant. So it's always recommended to test both options and check the performance for your specific use case and dataset.

This is the same situation when using GROUP BY statement.

It's worth noting that the performance difference between using a GROUP BY clause with a WHERE or CASE statement will depend on the complexity of the condition, the size of the table, and the distribution of the data.

This is based on the situation when the developer didn't have the access to database to altered the table schema.

If the developer have an access, it's recommended to consider other options like indexing the column or partitioning the table which will improve the performance of the query.

Top comments (0)