DEV Community

Andrew Atkinson
Andrew Atkinson

Posted on

 

How Long Does It Take To Create An Index?

A recent tweet asked the following question.

Does anybody know how long it should take to create an index of two integer columns with approximately 110 Million records in the DB? #postgres #postgresql #NotAnDBA

Let's put together an experiment. Create a table and populate 110,000,000 million rows with 2 columns, each with a random integer value. Collect the duration of the CREATE INDEX by enabling \timing, before adding a single column index on one of the columns.

-- create table "t", with columns "a", and "b"
CREATE TABLE t (
    a smallint,
    b smallint
);
Enter fullscreen mode Exit fullscreen mode
-- Populate 110,000,000 million records
INSERT INTO t(a,b)
SELECT
  (RANDOM() * 1000)::INT,
  (RANDOM() * 1000)::INT
FROM GENERATE_SERIES(1, 110000000) seq;
Enter fullscreen mode Exit fullscreen mode
-- Confirm 110,000,000 records
SELECT COUNT(*) FROM t;
   count
-----------
 110000000
Enter fullscreen mode Exit fullscreen mode

Enable timing to collect the duration of running the CREATE INDEX statement.

-- turn timing on
\timing
Timing is on.

-- Create index "t_a_idx" on table "t" on the "a" column
CREATE INDEX t_a_idx ON t (a);

Time: 52348.022 ms (00:52.348)
Enter fullscreen mode Exit fullscreen mode

Answer

In this test, it took about 52 seconds to create the index.

Test Environment Details

  • Tested on a M1 MacBook Air, with 16GB RAM, and no other significant system load.
  • The table has no sequences, constraints, or other indexes
  • Vacuum was running and was cancelled when the CREATE INDEX started per the postgresql.log.
  • No other queries were running on the table.

Besides the single data point, this post is intended more as a demonstration of how to conduct this sort of experiment on your own.

The best way to answer questions like this, due to the number of contributing factors that can change the result, is to create a test on your own server, or an experimentation server where production can be simulated as accurately as possible.

Although it was not used in this example, when other queries are running on the table, remember to use the CONCURRENTLY keyword when creating indexes.

Top comments (0)

Timeless DEV post...

Git Concepts I Wish I Knew Years Ago

The most used technology by developers is not Javascript.

It's not Python or HTML.

It hardly even gets mentioned in interviews or listed as a pre-requisite for jobs.

I'm talking about Git and version control of course.

One does not simply learn git