Franck Pachot for YugabyteDB

The cost of additional secondary indexes in PostgreSQL & YugabyteDB

As more query use cases come to your application, you add more indexes to the tables. That's the agility of SQL databases: you can add new access patterns and optimize for them. But indexes have an overhead on inserts, deletes and updates, so you need to keep their number low. To the question How Many Indexes Are Too Many?, Brent Ozar answers: 5 indexes per table, with around 5 columns (or fewer) on each.

I don't like magic numbers. I prefer to understand what exactly this overhead is, and to test it, because it can depend on your database and its version. I suspect some databases do not have the same overhead for unique and non-unique secondary indexes. This is just a simple test, to give an idea and show how it can be measured.

10000 rows into an empty table

I'll start with a very small dataset: inserting 10000 rows into an empty table and comparing with 0 to 10 secondary indexes.

I create the following table with i indexes (which I run in a for i in {0..10} loop):

create table demo (n bigint);
create index nonconcurrently on demo(n)
\watch count=$i

I use nonconcurrently to make index creation faster in YugabyteDB (the default is online creation with backfill, which is not needed when there are no concurrent sessions).

I'll run this with non-unique indexes, but also with unique indexes. The \watch command with count= is a PostgreSQL 16 feature to repeat the command a specified number of times.
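For the unique-index runs, only the DDL in the \watch buffer changes. This is a sketch of what I have in mind (the index names are auto-generated; for plain PostgreSQL, drop the YugabyteDB-specific nonconcurrently keyword):

create unique index nonconcurrently on demo(n)
\watch count=$i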

For each, I run the following DML:

insert into demo select generate_series(1,10000);
update demo set n=-n;
delete from demo;
vacuum demo;

I gather the execution time with pg_stat_statements:

insert into stats
select split_part(query,' ',1) command, count as indexes, calls, rows, mean_time milliseconds
from pg_stat_statements , (
select count(*) from pg_indexes where tablename='demo'
) indexes where query like '% demo%' order by 1
;

For PostgreSQL 13 and later, change mean_time to mean_exec_time.
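Only the last column of the select list changes, so it would read:

select split_part(query,' ',1) command, count as indexes, calls, rows, mean_exec_time milliseconds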

I create the stats table to store stats from pg_stat_statements and reset the stats before the loop:

create table if not exists stats (dml text, indexes int, calls int, rows int, milliseconds float, primary key(dml, indexes) );
select pg_stat_statements_reset();

I display the result with:

select *,
 format('%4s %%',round(100*milliseconds/min(milliseconds)over(partition by dml order by indexes))) "pct"
 from stats order by dml, indexes;

The graphs in this blog post are built from this output, showing the execution time in milliseconds on the Y-axis and the number of indexes on the X-axis.

I run this for PostgreSQL and YugabyteDB, with non-unique and unique indexes, and the following paragraphs show those 4 combinations.

PostgreSQL non-unique indexes

With such a small dataset, all rows fit into the shared buffers, which is where B-Tree indexes are efficient. The most expensive operation in PostgreSQL is the update, as it copies the whole row, adds new index entries, and marks the old version. The insert does nearly the same, except that there is no old version. A delete in PostgreSQL just marks the old version.

However, all of those have a hidden cost: vacuum has some cleanup to do later.
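This deferred work can be observed after the update and delete, before vacuum runs. A quick check I find useful (not part of the measured timings, and the counters may lag slightly as they are updated asynchronously):

-- dead tuples left behind by the update and delete, waiting for vacuum
select relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup
  from pg_stat_user_tables where relname = 'demo';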

[Chart: execution time in milliseconds vs. number of non-unique indexes, PostgreSQL]

PostgreSQL unique indexes

Unique indexes in PostgreSQL are very similar, with all operations apparently a little more expensive, for which I have no explanation (comments welcome):
[Chart: execution time in milliseconds vs. number of unique indexes, PostgreSQL]

YugabyteDB non-unique indexes

YugabyteDB distributes data, so each operation takes a little longer. The performance benefit is that it is scalable: the throughput can increase without increasing this latency. What matters here is the difference between those operations.

First, the delete takes longer because it inserts a tombstone for the deleted key. However, there is no vacuum needed with YugabyteDB, which is a real advantage. All DML leaves the rows in an optimal state. The MVCC garbage collection is done at a lower level, in the distributed storage, with SST file compaction.

The insert is the fastest operation:
[Chart: execution time in milliseconds vs. number of non-unique indexes, YugabyteDB]

I like this as it matches the most common OLTP requirements: fast insert throughput, no unpredictable vacuum cost, deletes may happen, and updates more rarely.

YugabyteDB unique indexes

It follows the same pattern with unique indexes:
[Chart: execution time in milliseconds vs. number of unique indexes, YugabyteDB]

You may wonder why I tested unique and non-unique indexes. Of course, I had something in mind that I wanted to verify. However, on this small dataset, where everything stays in memory, the difference is not visible. I'll test later on a larger table, but first let's look at the cost in terms of LSM-Tree seek() and next() operations rather than the time. This gives a better understanding.

I load my YBWR script to get those statistics:

-- load YBWR to gather statistics
\! curl -s https://raw.githubusercontent.com/FranckPachot/ybdemo/main/docker/yb-lab/client/ybwr.sql | grep -v '\watch' > ybwr.sql
\i ybwr.sql

I create the same table with a unique index and a non-unique index:

create table demo (n bigint) split into 1 tablets;
create        index demo_n on demo(n asc);
create unique index demo_u on demo(n asc);

I insert ten thousand rows and display the statistics per tablet:

execute snap_reset;
insert into demo select generate_series(1,10000);
execute snap_table;

Here is the result. The L marks the tablet leaders. Here I have only one tablet per table and index. 10000 rows are inserted into the table (which is actually the primary key index that holds all column values), and 10000 index entries are inserted into each index. The difference comes with the reads, which are visible as seek operations:

yugabyte=# insert into demo select generate_series(1,10000);
INSERT 0 10000
yugabyte=# execute snap_table;
 rocksdb_seek | rocksdb_next | rocksdb_insert |        dbname / relname / tserver / tabletid / leader
--------------+--------------+----------------+--------------------------------------------------------------
              |              |          10000 | yugabyte demo hash_split: [0x0000, 0xFFFF]   10.0.0.61
              |              |          10000 | yugabyte demo hash_split: [0x0000, 0xFFFF]   10.0.0.62
        30000 |              |          10000 | yugabyte demo hash_split: [0x0000, 0xFFFF] L 10.0.0.63
              |              |          10000 | yugabyte demo_n d57765ebe1ed4e909975b4af052ccd3f   10.0.0.61
              |              |          10000 | yugabyte demo_n d57765ebe1ed4e909975b4af052ccd3f   10.0.0.63
        10000 |              |          10000 | yugabyte demo_n d57765ebe1ed4e909975b4af052ccd3f L 10.0.0.62
              |              |          10000 | yugabyte demo_u 505284b3d2484b35a2be2ec7be0ecf0b   10.0.0.62
              |              |          10000 | yugabyte demo_u 505284b3d2484b35a2be2ec7be0ecf0b   10.0.0.63
        30000 |              |          10000 | yugabyte demo_u 505284b3d2484b35a2be2ec7be0ecf0b L 10.0.0.61
(9 rows)

The non-unique index seeks once per row inserted. The unique index, like the primary key, sees more seek operations because it has to check for an existing key in order to raise a "duplicate key" error.
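This check is also visible from SQL. On this table, a duplicate value is rejected by the unique index, while the non-unique index alone would have accepted it (a quick illustration, not part of the measurements above):

insert into demo values (1);
-- a second insert of the same value fails on demo_u with a
-- "duplicate key value violates unique constraint" error,
-- while demo_n alone would have accepted the duplicate
insert into demo values (1);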

I've also checked the read/write operations by enabling set yb_debug_log_docdb_requests=on and inserting the value 42 into this table (the statements are sketched after this list), where:

  • the table primary key is identified as "000033f1000030008000000000004000"
  • the non-unique index as "000033f1000030008000000000004003"
  • the unique index as "000033f1000030008000000000004004"
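Enabling the logging and performing the insert looks like this (a minimal sketch; the DocDB requests then show up in the database logs):

set yb_debug_log_docdb_requests = on;
insert into demo values (42);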

Adding the row to the table is an insert:

PGSQL_INSERT table_id: "000033f1000030008000000000004000" schema_version: 4 hash_code: 61921 
ybctid_column_value { value { binary_value: "47F1E1539DA9AC4E80EA4B3B8CC51E2BEDE5A9B100002121" } } 
column_values { column_id: 1 expr { value { int64_value: 42 } } 

This shows that the key is the internally generated YBCTID, as I didn't define a SQL primary key, and that the column value 42 is stored with it.

Adding the index entry into the unique index is similar:

PGSQL_INSERT table_id: "000033f1000030008000000000004004" schema_version: 0 
range_column_values { value { int64_value: 42 } } 
range_column_values { value {                 } } 
column_values { column_id: 2 expr { value { binary_value: "47F1E1539DA9AC4E80EA4B3B8CC51E2BEDE5A9B100002121" } } } column_refs { } ysql_catalog_version: 7 partition_key: "49800000000000002A2421" }

This adds an index entry with the value 42 as the key and the YBCTID as the value, to reference the table row.

The non-unique secondary index is simple, being an UPSERT operation as it doesn't need to check for the existence of a previous value:

PGSQL_UPSERT table_id: "000033f1000030008000000000004003" schema_version: 0 
range_column_values { value { int64_value: 42 } } 
range_column_values { value { binary_value: "47F1E1539DA9AC4E80EA4B3B8CC51E2BEDE5A9B100002121" } } 

The difference here is that the YBCTID is part of the key, to make each entry unique within a non-unique index.

If I had declared a primary key on my table, it would have been used in place of the YBCTID.
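For example, with an explicit key the same schema could look like this (a sketch with hypothetical names, not what was traced above):

-- the entries of demo_pk_n would then reference the primary key (id)
-- instead of a generated YBCTID
create table demo_pk (id bigint primary key, n bigint);
create index demo_pk_n on demo_pk (n asc);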

To Summarize

Don't draw excessive conclusions from this simple test. Response time depends on various factors. However, remember the recommendation in YugabyteDB: if you don't need to declare an index as unique, perhaps because uniqueness is already enforced by another index, it may be more efficient to declare it as a non-unique index. This differs from other databases that utilize B-Tree indexes.
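As an example of what I mean (a hypothetical schema, not from the test above): if uniqueness of a column is already enforced by one index, an additional covering index on the same column can stay non-unique and skip the duplicate-key check on every write:

create table users (id bigint primary key, email text not null, name text);
-- uniqueness is enforced once, by this index
create unique index users_email_uk on users (email);
-- this covering index serves Index Only Scans on (email) -> name
-- and does not need to repeat the unique check
create index users_email_name on users (email) include (name);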

When it comes to the number of indexes, there are no universal rules since it varies depending on the queries. Similar to monolithic databases, each new index introduces overhead to DML operations. However, in YugabyteDB, these operations are scalable due to distribution across multiple nodes. On the flip side, reading too many rows from remote nodes can become costly. Index Scans, and even better, Index Only Scans, are crucial for maintaining high performance and ensuring scalability.
