DEV Community

Antônio Inocêncio

Partial index

A partial index is a valuable feature supported by various relational databases but is not commonly used. In this post, we will explore some use cases where you can benefit from using partial indexes.

Partial indexes can be useful in various scenarios where you want to improve query performance by reducing the size of an index or filtering which rows are included in the index. Here are some examples of when to use partial indexes:

  • Selective Indexing: When you have a large table but want to index only the subset of rows that meet specific criteria. For example, you may have a table of customer orders and want to index only the orders that are not canceled (status != 'canceled').

  • Data Range Filtering: When you have a table with a date or timestamp column and want to create an index for a specific time range. This can be helpful for time-based data, like logs or sensor readings.

  • Boolean or Enum Columns: When you have boolean or enum columns and want to create an index on specific values. For instance, you might have a table of products and want to index only the active products.

  • Sparse Data: When a column holds the same value (or NULL) in most rows and your queries target a specific value that appears in only a few of them.

  • Query Optimization: When you have queries that frequently filter by a specific condition, a partial index can significantly speed up those queries. For example, if you often search for unshipped orders.

  • Storage Optimization: Partial indexes can reduce the storage requirements of your database because they are smaller due to indexing a subset of rows. This can be especially valuable in scenarios with limited storage capacity.

  • Concurrency: In databases with many concurrent writes, a partial index can reduce write overhead and contention on the index, because inserts and updates of rows that do not match the index predicate never have to modify it.
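
A few of the scenarios above can be sketched in PostgreSQL as follows. The tb_orders table, its columns, and the index names are hypothetical and exist only for illustration.

```sql
-- Hypothetical orders table (not part of the original example).
CREATE TABLE tb_orders (
    order_id    serial PRIMARY KEY,
    customer_id integer NOT NULL,
    status      varchar(20) NOT NULL,
    shipped_at  timestamp,                      -- NULL until the order ships
    created_at  timestamp NOT NULL DEFAULT now()
);

-- Selective indexing: leave canceled orders out of the index.
CREATE INDEX idx_orders_not_canceled ON tb_orders (customer_id)
WHERE status <> 'canceled';

-- Query optimization: index only the orders that have not shipped yet.
CREATE INDEX idx_orders_unshipped ON tb_orders (created_at)
WHERE shipped_at IS NULL;

-- Data range filtering: index only rows from a fixed date onward.
-- The predicate must be built from constants and immutable functions,
-- so an expression such as now() is not allowed here.
CREATE INDEX idx_orders_recent ON tb_orders (created_at)
WHERE created_at >= TIMESTAMP '2024-01-01';
```

Because the date-range predicate is fixed at creation time, an index like idx_orders_recent has to be dropped and recreated when the window should move.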

Let’s consider a scenario where you have a table for user accounts, and it includes a boolean column called active to track whether a user account is active (true) or inactive (false). In this case, most of the user accounts are inactive. We want to compare the use of a partial index and a non-partial index for this scenario.

Non-Partial Index Example:

Suppose you have a tb_user_accounts table with 8.4 million rows and the following schema:

CREATE TABLE tb_user_accounts (
    user_id serial PRIMARY KEY,
    username varchar(255) NOT NULL,
    active boolean NOT NULL
);
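
To experiment with a table of this shape, it could be filled with synthetic rows. This is only a sketch: the article does not state the share of active accounts, so the ratio of 1 in 1,000 and the generated usernames are assumptions, and the resulting index sizes will not necessarily match the figures reported later.

```sql
-- Assumption: about 1 in 1,000 accounts is active (ratio not given in the article).
INSERT INTO tb_user_accounts (username, active)
SELECT 'user_' || n,       -- synthetic username
       (n % 1000 = 0)      -- true for every 1,000th row
FROM generate_series(1, 8400000) AS n;

-- Refresh planner statistics after the bulk load.
ANALYZE tb_user_accounts;
```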

To create a non-partial index on the active column, you would typically create an index that covers all rows, including both active and inactive users:

CREATE INDEX idx_all_users_active ON tb_user_accounts (active);

This index will include all user accounts, regardless of their active status.

Partial Index Example:

Now, let’s consider the scenario where you want to optimize queries that involve active user accounts (where active = true). To do this, you can create a partial index as follows:

CREATE INDEX idx_active_users_active ON tb_user_accounts (user_id) 
WHERE active = true;

In this case, the WHERE clause specifies that only rows where active is true will be included in the index. This results in a smaller, more focused index that includes only active user accounts, leading to better query performance. Comparing the size of the two indexes shows the difference:

select pg_size_pretty(pg_table_size('idx_all_users_active'));

pg_size_pretty|
--------------+
56 MB         |

select pg_size_pretty(pg_table_size('idx_active_users_active'));

pg_size_pretty|
--------------+
240 kB        |
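
Keep in mind that the planner only considers a partial index when the query's WHERE clause implies the index predicate. The two queries below are a sketch of that behavior against tb_user_accounts; the user_id range is arbitrary, and the actual plan depends on table statistics, so no output is shown.

```sql
-- Repeats the index predicate (active = true), so the planner
-- can consider idx_active_users_active for this query.
EXPLAIN
SELECT user_id, username
FROM tb_user_accounts
WHERE active = true
  AND user_id BETWEEN 1000 AND 2000;

-- No condition on active: the partial index cannot serve this query,
-- because it does not contain the inactive rows.
EXPLAIN
SELECT user_id, username
FROM tb_user_accounts
WHERE user_id BETWEEN 1000 AND 2000;
```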

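The syntax above is specific to PostgreSQL. Oracle Database has no partial indexes, but a function-based index on a CASE expression achieves a similar effect: an Oracle B-tree index does not store entries whose key is entirely NULL, so rows for which the expression returns NULL are left out. MySQL 8.0.13 and later accepts a similar expression as a functional key part (wrapped in an extra pair of parentheses), but MySQL indexes NULL values, so the index does not get smaller there. Here's an Oracle example, assuming active is stored as a numeric 0/1 flag: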

CREATE INDEX idx_active_users_active ON user_accounts 
(CASE WHEN active = 1 THEN user_id ELSE NULL END);
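
With this approach, a query can use the index only if it filters on the same expression that was indexed. A minimal sketch in Oracle syntax, using the table name from the statement above and an arbitrary user_id value:

```sql
-- The WHERE clause repeats the indexed CASE expression,
-- which lets the optimizer match it to the function-based index.
SELECT user_id, username
FROM user_accounts
WHERE CASE WHEN active = 1 THEN user_id ELSE NULL END = 42;
```

A plain filter such as WHERE user_id = 42 AND active = 1 does not reference that expression, so it would not be matched to this index.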

In summary, a partial index is advantageous when queries target a well-defined subset of rows, such as active user accounts in a table where most rows have active = false. Compared to an index that covers all rows, it can improve query performance, reduce storage requirements, and lower index maintenance on writes, because rows outside the predicate never touch the index.

Written by Matheus Mendonça, DBA & co-founder at Nazar
