Franck Pachot for YugabyteDB

Originally published at linkedin.com

Covering Index nuances: which columns to cover (WHERE, ORDER BY, LIMIT, SELECT)?

I explained how covering indexes can reduce the required reads when scanning multiple rows in an SQL database. This is particularly useful in Distributed SQL databases, such as YugabyteDB, where additional reads may result in network latency.

If you are looking for the best index for a single query, it's easy to find: each table access should be an Index Only Scan with no additional sort operation, reading only the required rows. However, creating a dedicated index for each query can negatively impact other statements. To limit that overhead, you may cover only some of the query's columns rather than all of them; this is often enough to reach your performance goals while limiting the side effects. You can also choose to cover those columns in the index key or in an additional INCLUDE clause.
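As a preview of these two options, here is what they can look like on a hypothetical table t(a, b, c) (only a sketch; the concrete examples below use a real demo table):

-- covering column c as a suffix of the sorted key
create index t_cover_key on t (a, b, c);
-- covering column c in an INCLUDE clause: stored in the entry, not part of the key
create index t_cover_include on t (a, b) include (c);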


Let's consider some examples in PostgreSQL. I run them on YugabyteDB, which displays additional statistics from distributed reads with explain (analyze, dist, debug). In PostgreSQL, you can run the same with explain (analyze, buffers) to see the gains in logical reads.

I create the following table and fill it with one million orders:


drop table if exists orders;

create extension if not exists pgcrypto;

create table orders (
 primary key(order_id)
 ,order_id uuid default gen_random_uuid()
 ,order_date timestamptz default clock_timestamp()
 ,customer_id text
 ,description text
 ,quantity int
);

with fruits(desc1) as ( values ('bananas'),('orange'),('cherry') ),
     colors (desc2) as ( values ('blue'),('red'),('green'),('yellow'),('gray'),('pink'),('brown'),('black') )
insert into orders(customer_id,description,quantity)
select customer, desc1||' '||desc2||'s',100*random() from fruits, colors, generate_series(1,1000) customer, generate_series(1,42)
;


I'll run a multi-criteria query to read orders from one customer (out of 1000), quantity higher than 20 (which is 80% of them), description including 'banana' (one-third of them), and the result ordered by quantity and order date:

prepare demo as
select * from orders
where customer_id = $1 
  and quantity>$2 
  and description like $3
order by quantity, order_date
;

Full Table Scan - reading all rows without an index

The primary key starts with order_id and cannot be used to reduce the rows to scan:

explain (costs off, analyze, dist, debug, summary off) 
         execute demo(42,20,'%banana%')
;

                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Sort  (actual time=254.185..254.196 rows=261 loops=1)
   Sort Key: quantity, order_date
   Sort Method: quicksort  Memory: 45kB
   ->  Seq Scan on orders  (actual time=253.959..254.020 rows=261 loops=1)
         Remote Filter: ((quantity > 20) AND (description ~~ '%banana%'::text) AND (customer_id = '42'::text))
         Storage Table Read Requests: 1
         Storage Table Read Execution Time: 253.830 ms
         Metric rocksdb_number_db_seek: 3.000
         Metric rocksdb_number_db_next: 1008000.000
         Metric rocksdb_number_db_seek_found: 3.000
         Metric rocksdb_number_db_next_found: 1007997.000
         Metric rocksdb_iter_bytes_read: 109299541.000
         Metric docdb_keys_found: 1008000.000
         Metric ql_read_latency: sum: 756214.000, count: 3.000


The execution did only one Table Read Request to 3 tablets (I'm running on a 3-node cluster where the table is split into 3 tablets by default), visible as 3 rocksdb_number_db_seek into the LSM-Tree. The cost comes from reading all rows from there, about one million, visible as rocksdb_number_db_next. This is not scalable: even though I'm querying for a single customer, I have to read the orders of all of them.

Note: In YugabyteDB the table is stored in its primary key, which is an index covering all columns. But we don't call that a covering index for this query as it is not an index access relying on the sorted structure.

To improve this, I want to avoid reading orders that do not belong to my customer so that the response time matches the scope of the query.

Index Scan covering the most selective column in WHERE clause

With an index on customer_id I can reduce the scan by a factor of 1000 because I read for one customer out of one thousand:

yugabyte=# create index orders1 on orders (customer_id);
CREATE INDEX

yugabyte=# explain (costs off, analyze, dist, debug, summary off)
         execute demo(42,20,'%banana%')
;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Sort  (actual time=5.240..5.251 rows=261 loops=1)
   Sort Key: quantity, order_date
   Sort Method: quicksort  Memory: 45kB
   ->  Index Scan using orders1 on orders  (actual time=4.988..5.104 rows=261 loops=1)
         Index Cond: (customer_id = '42'::text)
         Remote Filter: ((quantity > 20) AND (description ~~ '%banana%'::text))
         Storage Table Read Requests: 1
         Storage Table Read Execution Time: 2.837 ms
         Storage Index Read Requests: 1
         Storage Index Read Execution Time: 0.776 ms
         Metric rocksdb_number_db_seek: 1007.000
         Metric rocksdb_number_db_next: 4022.000
         Metric rocksdb_number_db_seek_found: 1007.000
         Metric rocksdb_number_db_next_found: 4022.000
         Metric rocksdb_iter_bytes_read: 488862.000
         Metric docdb_keys_found: 2016.000
         Metric ql_read_latency: sum: 4873.000, count: 4.000

This is still one Table Read Request but, before it, there's one Index Read Request to identify which orders belong to this customer.
In this index, you can expect one rocksdb_number_db_seek to get to the start of the range of index entries, and about 1000 rocksdb_number_db_next to read the thousand entries with their order_id. Then you have to read those thousand rows, which are scattered in the table, with a thousand rocksdb_number_db_seek. The seek operation on an LSM-Tree uses CPU to compare keys and has a lower chance than the next operation of being a cache hit, so this is something to reduce.

Reading only a thousand rows out of one million is good, thanks to Index Cond: (customer_id = '42'::text), but my result is rows=261, so most of the rows read are eliminated afterwards by Remote Filter: ((quantity > 20) AND (description ~~ '%banana%'::text)). In particular, the 1/5th of rows eliminated by quantity > 20 could have been skipped by a better index.

Index Scan covering additional range filtering column in WHERE clause

LIKE '%banana%' cannot help to reduce the range because we cannot hash or sort on a pattern where we don't know the prefix.
However, quantity > 20 can be a single range of orders within the orders of one customer.
I add this column to the index key:

yugabyte=# create index orders2 on orders (customer_id, quantity asc);
CREATE INDEX

yugabyte=# drop index orders1;
DROP INDEX

yugabyte=# explain (costs off, analyze, dist, debug, summary off)
         execute demo(42,20,'%banana%')
;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Sort  (actual time=5.830..5.841 rows=261 loops=1)
   Sort Key: quantity, order_date
   Sort Method: quicksort  Memory: 45kB
   ->  Index Scan using orders2 on orders  (actual time=5.550..5.685 rows=261 loops=1)
         Index Cond: ((customer_id = '42'::text) AND (quantity > 20))
         Remote Filter: (description ~~ '%banana%'::text)
         Storage Table Read Requests: 1
         Storage Table Read Execution Time: 3.473 ms
         Storage Index Read Requests: 1
         Storage Index Read Execution Time: 0.868 ms
         Metric rocksdb_number_db_seek: 792.000
         Metric rocksdb_number_db_next: 2762.000
         Metric rocksdb_number_db_seek_found: 792.000
         Metric rocksdb_number_db_next_found: 2762.000
         Metric rocksdb_iter_bytes_read: 344977.000
         Metric docdb_keys_found: 1582.000
         Metric ql_read_latency: sum: 3851.000, count: 4.000

What has changed here is that (quantity > 20) was promoted from a Remote Filter, applied after reading the rows, to an Index Cond, applied when seeking into the index entries.
The seek into the index goes to the first entry for this customer with a quantity higher than the specified value and reads from there. It means fewer index entries and rows to read (rocksdb_number_db_next), but also fewer rocksdb_number_db_seek into the table, because the 1/5th of rows that had to be eliminated by this condition were already eliminated from the index entries.

My index now covers all the columns for which it can reduce the range to scan. There are additional columns used in my query: description in the WHERE clause, order_date in the ORDER BY clause, and order_id in the SELECT clause (as * has to get all columns). Let's see how advantageous it can be to cover them as well.

Index Scan covering the ORDER BY to avoid sorting

Without reducing the number of reads, I can improve the previous execution plan by avoiding the Sort operation. This is a minimal gain here, as only rows=261 have to be sorted, but with more rows, and especially with a LIMIT or FETCH FIRST ... ROWS ONLY clause, the sort operation may have to read many rows that are discarded later.

I add order_date to the key so that the index matches my ORDER BY clause. You can see exactly what to add by looking at the previous execution plan: Sort Key: quantity, order_date

yugabyte=# create index orders3 
           on orders (customer_id, quantity asc, order_date asc);
CREATE INDEX

yugabyte=# drop index orders2;
DROP INDEX

yugabyte=# explain (costs off, analyze, dist, debug, summary off)
         execute demo(42,20,'%banana%')
;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Index Scan using orders3 on orders (actual time=4.654..4.783 rows=261 loops=1)
   Index Cond: ((customer_id = '42'::text) AND (quantity > 20))
   Remote Filter: (description ~~ '%banana%'::text)
   Storage Table Read Requests: 1
   Storage Table Read Execution Time: 2.434 ms
   Storage Index Read Requests: 1
   Storage Index Read Execution Time: 0.938 ms
   Metric rocksdb_number_db_seek: 792.000
   Metric rocksdb_number_db_next: 2386.000
   Metric rocksdb_number_db_seek_found: 792.000
   Metric rocksdb_number_db_next_found: 2386.000
   Metric rocksdb_iter_bytes_read: 311358.000
   Metric docdb_keys_found: 1582.000
   Metric ql_read_latency: sum: 3887.000, count: 4.000

Here, in addition to covering the range predicates from the WHERE clause, my index covers the ORDER BY to get the rows in the expected order directly from the Index Scan.
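For example, a Top-N variant of the query (a sketch, not executed here) benefits even more from this index: the rows already arrive in the right order, so the scan can stop as soon as enough rows have passed the filters, instead of reading and sorting all of them before applying the limit:

select * from orders
where customer_id = '42'
  and quantity > 20
  and description like '%banana%'
order by quantity, order_date
fetch first 10 rows only;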

Note that getting the rows already sorted also benefits a GROUP BY clause, as aggregation over a sorted set is a low-effort operation.
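For example, an aggregation per quantity for the same customer (again a sketch, not part of the demo above) can be computed by grouping adjacent rows as they come back sorted from orders3, with no Sort step needed before the aggregate:

select quantity, count(*) as orders_count
from orders
where customer_id = '42'
  and quantity > 20
group by quantity;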

Back to my query: I am still reading too many rows from the table, resulting in too many rocksdb_number_db_seek, because I have to fetch each row from the table in order to apply Remote Filter: (description ~~ '%banana%'::text) to it. We can do better.

Index Scan covering all columns in WHERE clause

Even though it will not be used to reduce the index range to scan, I can add description to my index. In PostgreSQL and YugabyteDB, you can put it in an INCLUDE clause when it doesn't need to belong to the key.
With the column value in the index entry, the filter can be applied before reading the rows from the table:

yugabyte=# create index orders4 
           on orders (customer_id, quantity,order_date)
           include (description);
CREATE INDEX

yugabyte=# drop index orders3;
DROP INDEX

yugabyte=# explain (costs off, analyze, dist, debug, summary off)
         execute demo(42,20,'%banana%')
;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Index Scan using orders4 on orders  (actual time=3.464..3.577 rows=261 loops=1)
   Index Cond: ((customer_id = '42'::text) AND (quantity > 20))
   Remote Index Filter: (description ~~ '%banana%'::text)
   Storage Table Read Requests: 1
   Storage Table Read Execution Time: 1.214 ms
   Storage Index Read Requests: 1
   Storage Index Read Execution Time: 1.140 ms
   Metric rocksdb_number_db_seek: 262.000
   Metric rocksdb_number_db_next: 1304.000
   Metric rocksdb_number_db_seek_found: 262.000
   Metric rocksdb_number_db_next_found: 1304.000
   Metric rocksdb_iter_bytes_read: 155580.000
   Metric docdb_keys_found: 1052.000
   Metric ql_read_latency: sum: 2019.000, count: 4.000
(14 rows)

The benefit is not immediately obvious from the execution plan, because the predicate is still a remote filter (now shown as Remote Index Filter), but the execution metrics show fewer rocksdb_number_db_seek: the filtering happened during the index scan, which saved seeks into the table.

Here, all columns used for filtering are covered by the index, reducing the access to the table to its minimum. Before adding more columns, a word about the INCLUDE clause: you may have heard that covering indexes in PostgreSQL use it, but that's not always the case.

Cover with the INCLUDE clause or the key suffix?

The INCLUDE clause exists in PostgreSQL and YugabyteDB, but not in Oracle, where you simply add the column at the end of the key. Covering the WHERE clause columns doesn't require this additional feature: I can do the same in PostgreSQL or YugabyteDB by adding the column at the end of the key rather than in INCLUDE:

yugabyte=# create index orders5 
           on orders (customer_id, quantity,order_date
                      , description
           );
CREATE INDEX

yugabyte=# drop index orders4;
DROP INDEX

yugabyte=# explain (costs off, analyze, dist, debug, summary off)
         execute demo(42,20,'%banana%')
;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Index Scan using orders5 on orders  (actual time=3.500..3.614 rows=261 loops=1)
   Index Cond: ((customer_id = '42'::text) AND (quantity > 20))
   Remote Index Filter: (description ~~ '%banana%'::text)
   Storage Table Read Requests: 1
   Storage Table Read Execution Time: 1.279 ms
   Storage Index Read Requests: 1
   Storage Index Read Execution Time: 0.974 ms
   Metric rocksdb_number_db_seek: 262.000
   Metric rocksdb_number_db_next: 1304.000
   Metric rocksdb_number_db_seek_found: 262.000
   Metric rocksdb_number_db_next_found: 1304.000
   Metric rocksdb_iter_bytes_read: 153996.000
   Metric docdb_keys_found: 1052.000
   Metric ql_read_latency: sum: 2057.000, count: 4.000

Nothing has changed in the execution plan, and the number of reads is the same.
In general, I prefer to add the columns to cover at the end of the key rather than in an INCLUDE clause, especially in YugabyteDB, where the key is packed and additional non-key columns may become additional entries (this is not the case here, or we would have seen additional rocksdb_number_db_next).

There are two major reasons to move the columns that are not used for range access to INCLUDE:

  • if the index is unique, you don't have a choice, as uniqueness is enforced on the whole key.
  • if the columns are frequently updated, having them in the key turns each update into a delete and insert of the index entry, which is more expensive.

When the index does not enforce a unique constraint and the columns are not frequently updated, I add them at the end of the key. In addition to better storage, the longer key can also be used by other queries: for example, another query may use description LIKE 'prefix%', which can reduce the range. The ideal index is one that can serve multiple queries.
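To illustrate the unique-index case, suppose a hypothetical business rule that a customer can place at most one order per timestamp (an assumption made for this sketch only, not part of the demo schema). The covered column must then go into INCLUDE, because adding it to the key would change what the unique index enforces:

create unique index orders_customer_date_uk
       on orders (customer_id, order_date)
       include (quantity);
-- quantity is readable from the index entry but does not participate
-- in the uniqueness check, and it is not part of the sorted key,
-- which also matters if it is updated often (see the two reasons above)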

You may have been surprised that I call it "covering" without an INCLUDE clause, and I just explained the reason: INCLUDE is not needed to cover columns. You may also be surprised that I call it "covering" without seeing an Index Only Scan. That's because Index Only Scan is only the ultimate level of covering, where you don't have to read the table rows at all. The goal of this blog post is to show that there are different levels of covering indexes.

Index Scan covering all columns in SELECT

To get an Index Only Scan, all the columns used by the query must be in the index, in a directly readable form.

I used SELECT *, so all columns must be covered; the easiest way to check which ones is to look at the Output: line of explain (verbose):

yugabyte=# explain (costs off, analyze, summary off, verbose)
         execute demo(42,20,'%banana%')
;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Index Scan using orders5 on public.orders (actual time=2.484..2.593 rows=261 loops=1)
   Output: order_id, order_date, customer_id, description, quantity
   Index Cond: ((orders.customer_id = '42'::text) AND (orders.quantity > 20))
   Remote Index Filter: (orders.description ~~ '%banana%'::text)

To cover all the query columns, I must add order_id. I can add it in INCLUDE or in the key. As it is the primary key of the table, I know that I'll never update it, so I add it to the key:

yugabyte=# create index orders6 
           on orders (customer_id, quantity,order_date
                      , description, order_id
           );
CREATE INDEX

yugabyte=# drop index orders5;
DROP INDEX

yugabyte=# explain (costs off, analyze, dist, debug, summary off)
         execute demo(42,20,'%banana%')
;

                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Index Only Scan using orders6 on orders (actual time=2.275..2.422 rows=261 loops=1)
   Index Cond: ((customer_id = '42'::text) AND (quantity > 20))
   Remote Filter: (description ~~ '%banana%'::text)
   Heap Fetches: 0
   Storage Index Read Requests: 1
   Storage Index Read Execution Time: 1.155 ms
   Metric rocksdb_number_db_seek: 1.000
   Metric rocksdb_number_db_next: 791.000
   Metric rocksdb_number_db_seek_found: 1.000
   Metric rocksdb_number_db_next_found: 791.000
   Metric rocksdb_iter_bytes_read: 87353.000
   Metric docdb_keys_found: 791.000
   Metric ql_read_latency: sum: 724.000, count: 1.000


This index fully covers my query: there's no need to read the table. The read requests are at their minimum: only one Index Read Request and only one rocksdb_number_db_seek.

You may think that the primary key is already in the index entry as it is used to find the table row to get more columns. That's true, but in YugabyteDB indexes, it is encoded and not directly usable to get the value.

Partial index to go further on row selection

When you want the perfect index, covering all columns, you end up with a fat index, specific to a particular query. To counterbalance the overhead, you can reduce its size, and its impact on other DML, by keeping only the index entries you need. For example, if my query always filters on LIKE '%banana%', I don't need to index the other rows:

yugabyte=# create index orders7 
           on orders (customer_id, quantity,order_date
                      , description, order_id
           ) where description like '%banana%' ;
CREATE INDEX

yugabyte=# drop index orders6;
DROP INDEX

yugabyte=# explain (costs off, analyze, dist, debug, summary off)
         execute demo(42,20,'%banana%')
;

                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Index Only Scan using orders7 on orders (actual time=1.817..1.961 rows=261 loops=1)
   Index Cond: ((customer_id = '42'::text) AND (quantity > 20))
   Heap Fetches: 0
   Storage Index Read Requests: 1
   Storage Index Read Execution Time: 0.691 ms
   Metric rocksdb_number_db_seek: 1.000
   Metric rocksdb_number_db_next: 261.000
   Metric rocksdb_number_db_seek_found: 1.000
   Metric rocksdb_number_db_next_found: 261.000
   Metric rocksdb_iter_bytes_read: 29000.000
   Metric docdb_keys_found: 261.000
   Metric ql_read_latency: sum: 256.000, count: 1.000

Here, I have a specific index that reaches perfection for this query: only one read request and one seek per execution, and it reads only the strict minimum, rocksdb_number_db_next_found: 261.000, to get rows=261.


In summary, covering a query with an index involves many nuances. Adding all columns to INCLUDE just to see Index Only Scan everywhere is not productive: it may increase the number of indexes to maintain with no significant benefit. When there are too many indexes, the query planner's job gets harder, which can lead to suboptimal plans, even when maintaining the indexes is not an issue.

The most important columns to cover are the most selective filters in your WHERE clause, and possibly the ORDER BY columns, which help avoid sorting all rows before returning the result or filtering further. It is possible to add more columns to cover all filters and projections, but that may be reserved for the most critical use cases.

To improve the performance of database queries, it is more effective to create a single index that meets the requirements of multiple queries than to create a specific index for each one. To achieve this, consider both the columns to cover and, with partial indexes, the rows to index.

For example, the last index I created, whose first key column is customer_id because my query was on a single customer, is still efficient in YugabyteDB for a different query on multiple customers:

yugabyte=# explain (costs off, analyze, dist, debug, summary off)
select * from orders
where customer_id in ('42','24','22','44','142','124','122','144')
  and quantity=1 and description like '%banana%'
;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Index Only Scan using orders7 on orders (actual time=0.870..0.889 rows=31 loops=1)
   Index Cond: ((customer_id = ANY ('{42,24,22,44,142,124,122,144}'::text[])) AND (quantity = 1))
   Heap Fetches: 0
   Storage Index Read Requests: 1
   Storage Index Read Execution Time: 0.736 ms
   Metric rocksdb_number_db_seek: 8.000
   Metric rocksdb_number_db_next: 41.000
   Metric rocksdb_number_db_seek_found: 8.000
   Metric rocksdb_number_db_next_found: 41.000
   Metric rocksdb_iter_bytes_read: 5393.000
   Metric docdb_keys_found: 31.000
   Metric ql_read_latency: sum: 279.000, count: 3.000

I have 8 rocksdb_number_db_seek to find the 8 customers, and 41 rocksdb_number_db_next to read the index entries that return the 31 rows of the result.
