Karishma Shukla

Posted on Aug 7, 2023 • Edited on Nov 14

How to Improve Performance of Your Database by Indexing Large Tables

#programming #database #postgres #performance

What is Database Indexing?

Database indexing is a technique that makes searching and retrieving data from a database faster. It is like the index of a large book: instead of reading every page, you look up where the information lives and go straight to it.

Indexing speeds up SELECT queries and WHERE clauses. On the other hand, it slows down INSERT, UPDATE, and DELETE statements, because every index on the table has to be updated along with the data.

Fig: Database Index Data Structure

Why Indexing?

Imagine you have a database of books, and you want to find all the books that have the word "programming" in the title. Without an index, the database would have to scan every row in the table to find the books that match the search criteria. This could take a long time, especially if there are a lot of books in the table.

However, if you create an index on the title column, the database can find matching rows without scanning the whole table. The index is a separate data structure that stores the values of the title column in sorted order, so looking up a given title becomes a quick search rather than a full scan. Note that a standard (B-tree) index helps when matching the whole title; finding a word such as "programming" anywhere inside the title generally needs a full-text or trigram index instead.
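As a minimal sketch of the idea above, assume a hypothetical books table with a title column (the table, the index names, and the sample title are illustrative and not part of the original example). A plain B-tree index serves exact-title lookups, while a search for a word anywhere in the title could rely on a trigram index from the pg_trgm extension:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE books (
    id    BIGSERIAL PRIMARY KEY,
    title TEXT NOT NULL
);

-- B-tree index: keeps titles in sorted order, useful for equality lookups.
CREATE INDEX idx_books_title ON books (title);

SELECT * FROM books WHERE title = 'Programming Pearls';

-- A leading-wildcard pattern cannot be served by the B-tree index above.
-- A trigram GIN index (pg_trgm extension) is one option for that case.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_books_title_trgm ON books USING gin (title gin_trgm_ops);

SELECT * FROM books WHERE title ILIKE '%programming%';
```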

Indexing A Table With 50 Million Rows

For this example, we will create a database pg-million in PostgreSQL containing a table customers with the columns first_name, last_name, mobile_no, and country.

Insert 50 million rows of random data

CREATE TABLE customers(first_name VARCHAR(50), last_name VARCHAR(50), mobile_no INTEGER, country VARCHAR(50));

INSERT INTO customers (first_name, last_name, mobile_no, country)
SELECT substr(md5(random()::text), 1, 10),
       substr(md5(random()::text), 1, 10),
       (random() * 70 + 10)::integer,
       (CASE WHEN random() < 0.5 THEN 'India' ELSE 'United Kingdom' END)
FROM generate_series(1, 50000000);
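After a bulk load like this, it can help to confirm how the rows are distributed and to refresh the planner statistics before measuring anything. The following is a small sketch and not part of the original steps:

```sql
-- Check how many rows landed in each country (expect roughly a 50/50 split).
SELECT country, count(*) AS row_count
FROM customers
GROUP BY country;

-- Refresh table statistics so the query planner has up-to-date estimates.
ANALYZE customers;
```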

Create an index on `country` column

We create an index on the country column to have a well-organized list that lets us quickly locate all the customers from a particular country without searching through the entire table.

CREATE INDEX idx_partial_country ON customers (country) WHERE country IN ('India', 'United Kingdom');

Time to create index: 2m 2s

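For this example, we are using a partial index. A partial index is built only over the rows that satisfy a condition, which reduces the index size and can improve query performance for those specific values. Note that in this dataset every row is either 'India' or 'United Kingdom', so the condition above matches all rows and the index is about as large as a regular one; the size benefit appears when the condition excludes a large share of the table.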
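To see how much a partial index saves, one option is to compare index sizes directly. The sketch below assumes the idx_partial_country index from above already exists; idx_customers_uk is a hypothetical second index, introduced here only for comparison, whose condition covers roughly half of the rows:

```sql
-- Hypothetical narrower partial index: only rows for one country are indexed.
CREATE INDEX idx_customers_uk ON customers (country)
WHERE country = 'United Kingdom';

-- Compare the on-disk size of both indexes.
SELECT pg_size_pretty(pg_relation_size('idx_partial_country')) AS both_countries,
       pg_size_pretty(pg_relation_size('idx_customers_uk'))    AS uk_only;
```

PostgreSQL considers a partial index only when the query's WHERE clause implies the index condition, so a query filtering on country = 'India' could not use idx_customers_uk.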

Note: The syntax for creating indexes and types of indexes differs among different databases. You should use appropriate syntax and index type depending on your database and use-case.

Measuring Query Execution Time Before and After Indexing

Consider the following query:

SELECT * FROM customers WHERE country='United Kingdom';

Query Execution Time without index: 41836.270 ms

Query Execution Time with index: 24254.644 ms

Improvement in query execution time: ~42% ((41836.270 − 24254.644) / 41836.270 ≈ 0.42)

(For better understanding, you can find all the code here.)
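The post does not say how these timings were captured. One common way in PostgreSQL is EXPLAIN ANALYZE, which runs the query and reports the chosen plan together with the actual execution time. A sketch:

```sql
-- Runs the query and prints the plan plus an "Execution Time" line.
-- A "Seq Scan" node means the whole table was read; an "Index Scan" or
-- "Bitmap Index Scan" on idx_partial_country means the index was used.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM customers WHERE country = 'United Kingdom';
```

Timings depend heavily on hardware and on how much of the table is already cached, so running the query several times before and after creating the index gives a fairer comparison.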

How Well Are The Indexes Performing?

It is important to gain insights into index effectiveness. A few helpful metrics include:

Index Usage Statistics: Monitor the usage of indexes to understand which indexes are actively contributing to query performance. (Ex: Track the size of indexes, as larger indexes may impact disk space and I/O performance)
Query Performance Metrics: Monitor query execution times and response times for queries that involve indexed columns. (Ex: A sudden increase in query execution time may indicate index-related issues.)
Index Maintenance Metrics: Regularly assess the health of indexes and their impact on database operations. (Ex: Track index bloat, which occurs when indexes become inefficient due to excessive insertions, updates, or deletions.)
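In PostgreSQL, several of these metrics can be read from the pg_stat_user_indexes statistics view. A minimal sketch for the customers table from this post (the column aliases are only illustrative):

```sql
-- Per-index usage counters and on-disk size for the customers table.
SELECT indexrelname  AS index_name,
       idx_scan      AS times_used,
       idx_tup_read  AS index_entries_read,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'customers'
ORDER BY idx_scan DESC;
```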

When To Use Indexing?

Frequent Search Queries: Use indexing when you frequently search for specific data in a large dataset. It helps to find the desired information quickly.
Performance Improvement: Indexing can improve the speed of data retrieval operations, especially for complex queries, by avoiding scanning the entire dataset.
Large Data Volumes: Indexing is used when dealing with sizable amounts of data, as it helps maintain efficient query performance even as the dataset grows.

When To Not Use Indexing?

Frequent Write Operations: Avoid excessive indexing if your database experiences frequent insert, update, or delete operations, as indexes can slow down these write operations and consume additional storage space. Indexes should not be used on the columns that are frequently manipulated.
Small Datasets: For relatively small datasets, indexing may not provide significant performance gains and can introduce unnecessary overhead. In such cases, the benefits may not outweigh the costs.
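One way to spot indexes that add write overhead without helping reads is to look for indexes that are never scanned. This is a sketch that relies on PostgreSQL's cumulative statistics, so the counters only reflect activity since they were last reset:

```sql
-- Indexes with zero recorded scans, largest first.
-- Treat these as candidates for review, not automatic removal: indexes that
-- back primary keys or unique constraints can show zero scans and still be needed.
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```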

Conclusion

If you are looking for ways to improve the performance of your database, then database indexing is a good place to start. By creating indexes on the columns that are frequently used in queries, you can significantly improve the performance of your database and make your queries faster. However, it is important to weigh the benefits and drawbacks of indexing before making a decision.

Find me on GitHub, Twitter

Top comments (14)

Richard • Aug 7 '23

Amazing post!
Now I will try this with 500 million rows instead 😎

Karishma Shukla • Aug 7 '23

🙌

Karthik • Aug 8 '23

This is awesome. Your content is great.

Karishma Shukla • Aug 9 '23

Thanks Karthik

Olaoye kunle • Aug 10 '23

Wow. This is really intuitive. Thanks

Karishma Shukla • Aug 10 '23

Thank you

gaurbprajapati • Aug 14 '23

good content

Karishma Shukla • Aug 21 '23

Thanks 🙌

franciscomedinav • Aug 16 '23

Great post

Karishma Shukla • Aug 21 '23

Thank you

Roman Lomovtsev • Aug 16 '23

Thank you for your post.

On my machine the numbers: 6.348s and 1.945s

But to check this kind of performance I used the aggregate function COUNT (DBeaver automatically limits the rows returned):
SELECT count(*) FROM customers WHERE country='United Kingdom';

sientatrip • Aug 19 '23 • Edited

Query Patterns: Understand which queries are frequently executed on the large tables. Analyze the SELECT, JOIN, and WHERE clauses to determine the most common access patterns.

Choose Appropriate Index Columns: Select columns that are frequently used in WHERE clauses and JOIN conditions. Focus on columns with high selectivity (many unique values), as indexing on these columns will yield better results.

Understand Index Types: Different database systems offer various index types, such as B-tree, Bitmap, or Hash indexes. Understand the strengths and limitations of each type and choose the most suitable for your scenario....

Soham • Aug 10 '23 • Edited

Hmm I wonder how this would work with larger data, like the content of a post or a blog perhaps... maybe we would need to use some natural language processing to extract keywords and then indexing based on that? I'm not sure though just a thought...

Felix Turner • Aug 23 '23

Database indexing accelerates queries, yet impacts write operations. Balance its benefits for better database performance. Consider data volume and query frequency.
