Imagine, I give you a phone book and ask you to find me the phone number “Harry Potter” from it. What would you do? 🤔
Search every page of the phone diary for Harry’s number. ( I wish my phone book had Harry Potter's number but anyways!💔)
Well, what if I give you a phone book with names that are alphabetically arranged?
Switch to the page with names starting with the letter ‘H’. You can complete the task with lightning speed ⚡.
The concept of indexing is pretty straightforward. Indexes are data structures that help in searching. To understand indexing, let’s see what a database engine does without indexing!! 😱
The world with no indexes!
To find a record that satisfies a particular query a database engine does a full table scan, which means it checks the entire row one by one. The full table scan is called a sequential scan.
However, databases have a way of making this sequential search faster even without indexing. Like, Postgres uses parallel sequential scan, in which several workers together scan different range of rows in a table so that the full table scan is completed faster. 🤝
A full scan on a table with billions of records will be a nightmare!!
Enters Indexing
Indexes are created on a column/columns. An index is a data structure that directly gives the database engine the address of the rows that could satisfy the condition of the query, saving the time to scan all the rows of the table. With the address of the rows, the database engine then fetches the information needed from the disk by performing an I/O.
Remember ☠️ → The index just gives the memory address based on the key on which it is created. For example in a student database index is created on the column Id.
SELECT name FROM Student
WHERE Id=5;
In this particular query, the address of the row with an Id equal to 5 will be searched using the index, efficiently, but we will need to jump back to the disk to fetch the name from the row, using the address. This means the index is a different data structure and the table is a different data structure stored at different places. This can take I/O operations and hence can take time. 😰
To know more about disk and RAM storage → Read More
Index Scan and Bitmap Index Scan
If the database uses an index for a query it uses an index scan to search for a record.
For every row address found from the index, the engine jumps to the disk for the row information to the database. This can take a lot of I/O’s.
To make this easy to understand, let’s say you went to a supermarket with your father 🛍️🥖. Your father stands near the gate and orders you to find the item and bring it to him one by one. This takes a lot of time. ⏱️
So, instead, you make a list of items and bring them all at once. 🤓 Bitmap Index Scan works similarly.
Using a simple index scan it finds the addresses it needs to fetch, and then jumps to the disk to get the data in bulk. This
takes a lot less I/O’s.
If you like the article do give a 💜
In Part 2 of Indexing 101
- Non-key columns
- Composite Index
- How are index stored
- Does the database always use index? 🤔
Along with query analytics
Stay Tuned 🥺
Top comments (2)
👏👏👏
Some comments may only be visible to logged-in visitors. Sign in to view all comments.