DEV Community

Karishma Shukla

How to Improve Performance of Your Database by Indexing Large Tables

What is Database Indexing?

Database indexing is a technique that makes searching and retrieving data from a database faster. It is like the index at the back of a large book: instead of reading every page, you look up a term and jump straight to the right place.

Indexing speeds up SELECT queries and WHERE clauses. On the other hand, it slows down INSERT, UPDATE, and DELETE statements, because every index on the table has to be updated along with the data.

Index Data Structure Components

Fig: Database Index Data Structure

Why Indexing?

Imagine you have a database of books, and you want to find all the books that have the word "programming" in the title. Without an index, the database would have to scan every row in the table to find the books that match the search criteria. This could take a long time, especially if there are a lot of books in the table.

However, if you create an index on the title column, the database can find the matching rows without reading the whole table. The index is a separate data structure that stores the values of the title column in sorted order, each with a pointer to its row. A sorted index like this helps most when the search matches the whole title or the beginning of it; finding a word anywhere inside the title typically needs a specialized index, such as a full-text index.
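The difference between scanning and using an index can be observed directly in a query plan. Below is a minimal sketch, assuming a hypothetical `books` table and using Python's built-in `sqlite3` module as a small stand-in for PostgreSQL (in PostgreSQL the equivalent check would be done with `EXPLAIN`). The table, the index name `idx_books_title`, and the sample titles are all made up for illustration.

```python
import sqlite3

# Hypothetical books table in an in-memory SQLite database (stand-in for PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO books (title) VALUES (?)",
    [(f"Book {i}",) for i in range(10_000)],
)


def query_plan(sql):
    """Return the planner's description of how it would execute the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # the last column holds the detail text


lookup = "SELECT * FROM books WHERE title = 'Book 4242'"

plan_without_index = query_plan(lookup)  # planner has to scan the table
conn.execute("CREATE INDEX idx_books_title ON books (title)")
plan_with_index = query_plan(lookup)     # planner can search the index instead

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The exact wording of the plan text varies between SQLite versions, so the sketch only relies on whether the index name appears in it.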

Indexing A Table With 50 Million Rows

For this example, we will create a database pg-million in PostgreSQL containing table customers with columns: first_name, last_name, mobile_no, country.

Insert 50 million rows of random data

CREATE TABLE customers (
    first_name VARCHAR(50),
    last_name  VARCHAR(50),
    mobile_no  INTEGER,
    country    VARCHAR(50)
);

INSERT INTO customers (first_name, last_name, mobile_no, country)
SELECT substr(md5(random()::text), 1, 10),
       substr(md5(random()::text), 1, 10),
       (random() * 70 + 10)::integer,
       (CASE WHEN random() < 0.5 THEN 'India' ELSE 'United Kingdom' END)
FROM generate_series(1, 50000000);

Create an index on country column

We create an index on the country column so that the database has a well-organized list for quickly locating all the customers from a particular country without searching through the entire table.

CREATE INDEX idx_partial_country ON customers (country) WHERE country IN ('India', 'United Kingdom');

Time to create index: 2m 2s

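For this example, we are using a partial index. A partial index is created with a condition that selects which rows are indexed. This allows the database to index only the relevant rows, reducing the index size and improving query performance for those specific values. Note that in this generated dataset every row has country 'India' or 'United Kingdom', so the condition matches all rows and the index is as large as a regular one; a partial index pays off when its condition excludes a large share of the table.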
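To see a partial index that really covers only part of a table, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for PostgreSQL (SQLite also supports partial indexes with a `WHERE` clause). It is a variation on the article's example, not the same setup: the data is hypothetical, only 10% of rows are from the United Kingdom, and the index name `idx_uk_last_name` is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(first_name TEXT, last_name TEXT, mobile_no INTEGER, country TEXT)"
)
# Hypothetical data: every 10th customer is from the United Kingdom, the rest from India.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (f"first{i}", f"last{i}", i, "United Kingdom" if i % 10 == 0 else "India")
        for i in range(10_000)
    ],
)

# Partial index: only rows with country = 'United Kingdom' are indexed.
conn.execute(
    "CREATE INDEX idx_uk_last_name ON customers (last_name) "
    "WHERE country = 'United Kingdom'"
)


def query_plan(sql):
    """Return the planner's description of how it would execute the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# The query condition matches the index condition, so the index is usable.
uk_plan = query_plan(
    "SELECT * FROM customers "
    "WHERE country = 'United Kingdom' AND last_name = 'last40'"
)
# Rows from India are not in the index, so this query cannot use it.
india_plan = query_plan(
    "SELECT * FROM customers WHERE country = 'India' AND last_name = 'last41'"
)

print("UK query:   ", uk_plan)
print("India query:", india_plan)
```

The design choice here is to index the column being searched (`last_name`) and use the condition only to limit which rows enter the index, which keeps the index small while still serving the queries it was built for.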

Note: The syntax for creating indexes and types of indexes differs among different databases. You should use appropriate syntax and index type depending on your database and use-case.

Measuring Query Execution Time Before and After Indexing

Consider the following query:

SELECT * FROM customers WHERE country='United Kingdom';

Query Execution Time without index: 41836.270 ms

Query Execution Time with index: 24254.644 ms

Improvement in query execution time: ~42%
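The improvement figure can be reproduced from the two timings reported above. This is a small sketch in Python; the function name `improvement_percent` is just an illustrative helper.

```python
def improvement_percent(before_ms, after_ms):
    """Relative reduction in execution time, as a percentage of the original time."""
    return (before_ms - after_ms) / before_ms * 100


# Timings reported in the article (milliseconds).
without_index_ms = 41836.270
with_index_ms = 24254.644

improvement = improvement_percent(without_index_ms, with_index_ms)
print(f"{improvement:.2f}%")  # prints 42.02%
```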

(For a better understanding, you can find all the code here.)

How Well Are The Indexes Performing?

It is important to gain insights into index effectiveness. A few helpful metrics include:

  • Index Usage Statistics: Monitor how often each index is actually used to understand which indexes contribute to query performance and which only add overhead. (Ex: Also track the size of indexes, as larger indexes may impact disk space and I/O performance.)

  • Query Performance Metrics: Monitor query execution times and response times for queries that involve indexed columns. (Ex: A sudden increase in query execution time may indicate index-related issues.)

  • Index Maintenance Metrics: Regularly assess the health of indexes and their impact on database operations. (Ex: Track index bloat, which occurs when indexes become inefficient due to excessive insertions, updates, or deletions.)
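As a rough illustration of the first metric, the sketch below checks which indexes a known set of queries would use. It is an assumption-laden stand-in: it uses Python's built-in `sqlite3` module and inspects query plans, whereas a PostgreSQL server keeps its own runtime statistics about index scans. The table, the index names, and the `workload` list are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(first_name TEXT, last_name TEXT, mobile_no INTEGER, country TEXT)"
)
conn.execute("CREATE INDEX idx_last_name ON customers (last_name)")
conn.execute("CREATE INDEX idx_mobile_no ON customers (mobile_no)")

# Hypothetical workload: the queries the application runs most often.
workload = [
    "SELECT * FROM customers WHERE last_name = 'last40'",
    "SELECT count(*) FROM customers WHERE last_name = 'last41'",
]


def index_usage(conn, queries):
    """Count, per index, how many of the given queries the planner would run with it."""
    names = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
    ]
    usage = {name: 0 for name in names}
    for sql in queries:
        plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))
        for name in names:
            if name in plan:
                usage[name] += 1
    return usage


usage = index_usage(conn, workload)
unused = sorted(name for name, hits in usage.items() if hits == 0)

print("usage: ", usage)
print("unused:", unused)
```

An index that never shows up for the real workload is a candidate for removal, since it still costs storage and slows down writes.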

When To Use Indexing?

  • Frequent Search Queries: Use indexing when you frequently search for specific data in a large dataset. It helps to find the desired information quickly.

  • Performance Improvement: Indexing can improve the speed of data retrieval operations, especially for complex queries, by avoiding scanning the entire dataset.

  • Large Data Volumes: Indexing is used when dealing with sizable amounts of data, as it helps maintain efficient query performance even as the dataset grows.

When To Not Use Indexing?

  • Frequent Write Operations: Avoid excessive indexing if your database experiences frequent insert, update, or delete operations, as indexes can slow down these write operations and consume additional storage space. In particular, avoid indexing columns whose values are updated frequently.

  • Small Datasets: For relatively small datasets, indexing may not provide significant performance gains and can introduce unnecessary overhead. In such cases, the benefits may not outweigh the costs.
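The storage cost mentioned above is easy to observe. The following is a minimal sketch, again using Python's built-in `sqlite3` module as a stand-in for PostgreSQL and a hypothetical `customers` table: it compares the number of database pages before and after adding an index. The file name `demo.db` and the index name are illustrative.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(os.path.join(tmp, "demo.db"))
    conn.execute(
        "CREATE TABLE customers "
        "(first_name TEXT, last_name TEXT, mobile_no INTEGER, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?, ?)",
        [(f"first{i}", f"last{i}", i, "India") for i in range(20_000)],
    )
    conn.commit()

    # Number of pages the database occupies with the table only.
    pages_before = conn.execute("PRAGMA page_count").fetchone()[0]

    conn.execute("CREATE INDEX idx_last_name ON customers (last_name)")
    conn.commit()

    # The index is stored separately from the table, so the database grows.
    pages_after = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()

print("pages without index:", pages_before)
print("pages with index:   ", pages_after)
```

Every additional index adds pages like this, and each of those pages also has to be kept up to date on every write, which is where the slowdown for inserts and updates comes from.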

Conclusion

If you are looking for ways to improve the performance of your database, then database indexing is a good place to start. By creating indexes on the columns that are frequently used in queries, you can significantly improve the performance of your database and make your queries faster. However, it is important to weigh the benefits and drawbacks of indexing before making a decision.



Top comments (14)

Richard

Amazing post!
Now I will try this with 500 million rows instead 😎

Karishma Shukla

🙌

Karthik

This is awesome. Your content is great.

Karishma Shukla

Thanks Karthik

Olaoye kunle

Wow. This is really intuitive. Thanks

Karishma Shukla

Thank you

gaurbprajapati

good content

Karishma Shukla

Thanks 🙌

franciscomedinav

Great post

Karishma Shukla

Thank you

Roman Lomovtsev

Thank you for your post.

On my machine the numbers: 6.348s and 1.945s

But for checking this kind of performance I used the aggregate function COUNT, because DBeaver automatically limits the number of rows it returns:
SELECT count(*) FROM customers WHERE country='United Kingdom';

sientatrip • Edited

Query Patterns: Understand which queries are frequently executed on the large tables. Analyze the SELECT, JOIN, and WHERE clauses to determine the most common access patterns.

Choose Appropriate Index Columns: Select columns that are frequently used in WHERE clauses and JOIN conditions. Focus on columns with high selectivity (many unique values), as indexing on these columns will yield better results.

Understand Index Types: Different database systems offer various index types, such as B-tree, Bitmap, or Hash indexes. Understand the strengths and limitations of each type and choose the most suitable for your scenario....

Soham • Edited

Hmm I wonder how this would work with larger data, like the content of a post or a blog perhaps... maybe we would need to use some natural language processing to extract keywords and then indexing based on that? I'm not sure though just a thought...

Felix Turner

Database indexing accelerates queries, yet impacts write operations. Balance its benefits for better database performance. Consider data volume and query frequency.