Frastyawan Nym

Posted on Apr 6, 2022

Database Indexes, please be careful when using it!

#database #sql

Lets do disclaimer before we start 😉.

This is my current understanding of Database Index.
If there is any mistake that write that lead to misunderstanding, please let me know and I will update this post ASAP.

Let's begin 🚀

What is database index?

Database index is a built-in tools that will help us for searching data inside database. Index will quickly locate record of our queries.

The problem

To help to understand, I will make up some use case. Example, we have a users table that have id, full_name & gender

That users table will have 1000 rows. 2 of them is male and the rest (998) of them is female.
Say that we need to find our male user, we will create query something like this SELECT * FROM users WHERE gender = 'male'.
After we executed that query, we will have result of 2 users.

Behind the scene, database engine will search all of 1000 rows to find that 2 rows.
Yes, that sound inefficient. But that is how database engine works. 😵

What is the solution?

The solution is to use Database Index. 💡
With index, database engine will make some of algorithm that map our table based on the index we chose.

Okay, so we create an index on gender column inside users table.
Back again to our query, now if we execute SELECT * FROM users WHERE gender = 'male', we will get 2 users again as result (of course!).

The difference is, right now database engine will search on 2 rows to find that 2 rows.
Now that sound efficient❗

But wait...

After we know that, maybe there is a thinking like this:

"let's create index for all of our column, so we can get quick result every time we search something 🤔"

Thats right, you will get a quick query every search.
But please don't over-used it. Even though Index have upside, Index also have downside.

If we create an index, behind the scene database engine will write additional data to our database.

Lets back again to our users table. The calculation is like this:

2 index (1 for ID, 1 for gender) * 1000 rows = 2000
# PK is always get index

Imagine if we put index to all of our column

4 index (ID, gender, full_name, created_at) * 1000 rows = 4000
# Thats only 4 columns.
# The more column we create, the more usage it will take when CREATE, UPDATE, DELETE, etc.

the conclusion is: please use it wisely. 😇

Bonus: How to create index

PostgreSql

CREATE TABLE "customers" (
  "id" SERIAL PRIMARY KEY,
  "gender" varchar,
  "full_name" varchar,
  "created_at" timestamp
);

CREATE INDEX ON "customers" ("gender");

MySql

CREATE TABLE `customers` (
  `id` int PRIMARY KEY AUTO_INCREMENT,
  `gender` varchar(255),
  `full_name` varchar(255),
  `created_at` timestamp
);

CREATE INDEX `customers_index_0` ON `customers` (`gender`);

Top comments (2)

Vincent Milum Jr • Apr 10 '22

If you have low cardinality of your data, in PostgreSQL, you can use a partial index instead.

In your example, you have 2 items on one side, and 998 on the other. This is the perfect scenario for a partial index. This has the ability to store an index for the 2 items, while ignoring the other 998 rows. So instead of creating 1000 index entries, only 2 are created, keeping storage usage minimal, AND still retaining the performance benefits!

postgresql.org/docs/current/indexe...

Frastyawan Nym • Apr 12 '22

Thanks for the addition Vincent!

I didn't know before that we could create conditions on index

DEV Community

Database Indexes, please be careful when using it!

What is database index?

The problem

What is the solution?

But wait...

Bonus: How to create index

Top comments (2)

Read next

Hands-on with Apache Iceberg & Dremio on Your Laptop within 10 Minutes

Code Review Assistant using Open Source AI Challenge with pgai and Ollama

Best Redis Caching Strategy in Laravel: A Guide to Fast and Efficient Caching

All About Parquet Part 02 - Parquet's Columnar Storage Model