In this article, we’ll look at the topic of indexes in databases.
Have you ever tried to use an encyclopaedia? One of those big, thick books that has a lot of information on a range of topics?
The concept of an SQL index is similar to that of using an encyclopaedia.
Let’s say you wanted to find information for a specific topic: Labrador dogs. You could do this in a few ways:
- Start at page 1 and look through each page until you found Labrador dogs
- Flip to a random page and see if you can find Labrador dogs, and repeat until you find it.
This would be very time consuming and not efficient at all.
Luckily, encyclopaedias use a few different methods for making it easier to find information. The topics are often sorted alphabetically.
There is often an index at the back of the book as well, which includes all of the topics in the book. If we look for Labrador dogs in the index, we could see:
Labrador dogs… pg 100, 125, 217
This tells us that Labrador dogs are mentioned on these three pages. We can simply go to those pages to find out more about Labrador dogs.
This is a similar method to how indexes in databases work. They take up a bit more space, but they can help you find what you are looking for. They can also work best when you know the records you are looking for.
There are many types of indexes in most databases, but the most common is what’s called a b-tree index.
B-tree stands for “balanced tree”, and it’s called that because it accesses the elements of the index with the same number of steps. We’ll explain what this means soon.
Let’s explain how this works using our example from earlier. We want to find the mention of Labrador dogs on page 125.
A b-tree index would work like this:
- Determine the record (or the page, in this case) we are looking for. In this case, it is 125 (the page number).
- Look at the first level of the index to find the range of values that includes the number 125. The ranges could be broken up into groups of 100: 1-100, 101-200, 201-300, 301-400.
- Move to the second level of the index that was identified in the previous step. In this case it would be the range of 101-200, because 125 sits in that range.
- Find the range of values in the second level that covers 125. This could be in groups of 20: 101-120, 121-140, 141-160, 161-180, 181-200. In this case it would be the range 121-140.
- Find the record that has an ID of 125 and return that record.
There is a repeating sequence of steps to “look in a range” and “move down a level”. This can happen up to five times in most cases.
The result is that the correct record is found, much faster than scanning every record.
So how can you create a B-tree index? It depends on the database system, but the syntax for Oracle looks like this:
CREATE INDEX index_name ON table_name (columns);
You’ll need to specify a few things:
- index_name: the name you want to give to the index.
- table_name: the name of the table you’re creating the index on.
- columns: the column or columns you want to include in the index.
If we had a table that represented the example above, our SQL statement would look like this:
CREATE INDEX idx_topic_pageno ON topic(page_no);
In this example, the new index is called idx_topic_pageno, and it is created on the topic table on a column called page_no.
Another index type that is common in database systems is a bitmap index.
This is represented in the system as a two-dimensional table, which is often called a “map” of values.
For each row, there is a ROWID, which is a unique identifier for the row. For each column, there is a separate value of a specific column.
At the intersection of rows and columns is a single bit. This bit indicates whether that ROWID has the value mentioned in the column.
For example, we have a table that stores clothes and each row has a size. a bitmap index could look like this:
Why does this matter? Why is it useful?
It’s a different way of accessing the data more efficiently. If there is a small range of values (such as S/M/L shown above), then it’s often better to use this kind of index.
When you create an index, you don’t need to specify how each value maps to each row. This is all done automatically by the database system.
To create a bitmap index (in Oracle, anyway), the syntax is:
CREATE BITMAP INDEX index_name ON table_name (columns);
The only difference between the syntax for this bitmap index and a b-tree index is the addition of the word BITMAP. This is the syntax for Oracle - other databases might be slightly different.
You can also specify the index_name (the name of the index), the table_name (the table you’re creating the index on) and the columns (the columns that the index is being created on).
If we use the example above to write a statement to create this index, it could look like this:
CREATE BITMAP INDEX idx_product_size ON product (product_size);
This would create an index called idx_product_size on the product_size column on the product table.
There are many different types of indexes in Oracle and other database systems. I’ll briefly explain some of them here.
Function-based index: a type of index that is created using a function or expression, to improve the efficiency of queries with functions in them.
Reverse Key Index: a different way of storing and sorting data in a b-tree index.
Bitmap Join index: a bitmap index on the join of two tables.
In my guide to Oracle indexes, I cover some more specific Oracle indexes and how they are displayed in the explain plan statements.
Often, the first thing a developer will do when looking to improve a slow-running query is to create an index. I’ve been guilty of this in the past as well.
However, once you understand how indexes work, then you’ll know when the right time to create an index is, and how it can help you. Understanding your data and table structure is also important.
So, you should consider a few things when creating indexes:
- Creating indexes on columns in WHERE clauses or JOINs is a good idea.
- If a column has a large percentage of unique records, a b-tree index is usually more suitable.
- If a column has a low percentage of unique records, a bitmap index is usually better.
- Consider creating an index on foreign key columns (which are often used in joins anyway), as they are not created automatically (in Oracle anyway).
- Consider creating a function-based index if the column in your WHERE clause uses a function or expression in the criteria.
There are advantages and disadvantages to creating indexes. We’ve mentioned the advantages in this article. The disadvantages are:
- Indexes take up more space on the database. This can be quite significant, depending on your table, and if you’re restricted by space, it’s something to consider.
- Indexes can slow INSERT, UPDATE, and DELETE queries, because the index and the table need to be updated whenever these statements are run.
So, overall, indexes can be helpful and are a great way to improve the performance of your queries. There are a few different types, which are suitable in different situations, and they are pretty easy to create. Understanding the pros and cons of each, and your data structure, is helpful when deciding when and if to create indexes.