For ages, indexes have been the cornerstone of database performance. Indexes are one of the main measures to speed up reading – SELECT – operations and in this blog, we‘re telling you all about them. Have a read!
Ever since open source databases have been a thing, indexes have been that something that improves their performance. In different database management systems, different indexes have different usage types – some work together with certain types of indexes, some cover queries only partly, but all of them one thing in common – they speed up SELECT
operations.
Types of Database Indexes
As you may know, database indexes have a couple of types and each of them accomplish different purposes. These are the types of indexes available in MySQL alone (for the full list of possible index types, refer to the documentation):
- B-Tree indexes – these are the indexes that are used together with search queries that search for exact matches of data.
- R-Tree – or spatial – indexes can be used to index geospatial (geographical) data.
- Covering indexes that cover all of the columns that a query is using allowing a database to read the index instead of the data itself which is a faster process.
- Partial indexes that cover a part of the column that a specific query is using (such indexes are frequently used to save space on the disk.)
- Fulltext indexes that can be used to help developers run fulltext search operations on data sets.
- Indexes speed up reading (
SELECT
) operations, but slow down all of the other operations (INSERT
,DELETE
, andUPDATE
queries) in return.
When and How to Index?
The most frequent question regarding indexes is the time that’s best to index data. And the answer to this question is pretty straightforward, really – you should index data as soon as you feel your SELECT
queries grinding to a halt or as soon as you have a lot more data than usual (100,000 rows and above is a good starting point.)
To index or not to index also heavily depends on your disk – indexes take up space, so if you’re indexing millions of rows, expect the indexing process to take a toll on the disk space and to take longer than usual.
In this case, around 200,000 rows would be the perfect time to think about indexing – the data has space to grow, but at the same time the row count isn’t insignificant.
In most cases to add an index, you will make use of the CREATE INDEX
or the ALTER TABLE
query – they aren’t symlinks, but whether you run a query like CREATE INDEX idx_name(column_name)
or ALTER TABLE demo_table
ADD INDEX idx_name(column_name);
shouldn’t make much of a difference, just bear in mind that ALTER TABLE
queries work a little differently than ADD INDEX
queries do: if your table is bigger, they make a copy of the table on the disk, copy all of the data to it, make the requested changes, then swap the original table with the table that the DBMS created – that will take up time as well.
What Type of Index to Choose?
After you’ve decided what and when you will index, it’s time to decide what type of an index you should choose for your specific use case. In many cases, you will most likely be using B-Tree indexes: they are the most frequently used type of an index.
In other cases, though, your decision may be influenced by:
- The space available on the disk – if you’re low on disk space, use prefix indexes. They will let you index a prefix of the column.
- The type of data that your database holds – if you’re holding geographical data, use spatial indexes.
The specific scenario that you want your databases to adhere to:
- If you want to use fancy queries with wildcard support, searching in the boolean mode, etc. you would need to use full-text indexes.
- If you want to save disk space, you will use prefix indexes.
- If you’re working with the types of storage engines that store data in the memory and want to quickly read the data, consider using hash indexes.
- If you’re working with geographical data, you may need to use spatial – R-Tree – indexes.
Different types of indexes are defined in different ways – some of them can be seen below, but for most of us, ordinary – B-Tree – indexes will do the trick.
Columns can also have more than one type of index on them at once to satisfy multiple types of queries at the same time.
DbVisualizer and Indexes
Not only will DbVisualizer tell you what table holds what kind of an index, it will tell you many more details related to the index itself including:
- The name of the index and the column it’s on
- Whether the index is made in an ascending or descending order
- All about the index cardinality (how many unique values does the index hold)
Basically, you will be able to observe everything that’s necessary about the index itself – most of the details presented in this tab will be necessary for those who know a little bit about databases themselves or those who are solving issues relating to query performance when indexes are in use.
Query Performance and Indexes
As we’ve already noted, indexes are only necessary to improve the performance of reading data – they improve the performance of SELECT
queries. How many times the query performance will be improved, though, is directly dependant on your queries themselves – to maximize the performance of your queries when they’re using indexes, aim to make your queries as simple as possible, that is:
- Return as few rows as possible – switch
SELECT *
toSELECT column
. - Avoid
JOINing
data where possible. - Avoid
OR
queries where possible –OR
means “also think about…” which slows down performance. - Subqueries also slow queries down due to the resources, so be vary of that. If necessary, feel free to use them, but when using them, make sure that the subquery is using indexes as well.
Finally, bear in mind that not all queries will use indexes you’ve defined in the first place – make use of the EXPLAIN
keyword in database management systems to guide you to the light: as a rule of thumb, define the column that is indexed straight after the WHERE
part of the query to ensure that your query is actually using the index and you should be good to go.
Also bear in mind that indexes will slow down INSERTs
, UPDATEs
, and DELETEs
due to the fact that data needs to be updated together with the index when it’s in use: if you’re using an index, you should allocate more time for such operations as well.
Finally, keep in mind that database maintenance has an impact on your query performance as well – we recommend using SQL clients such as DbVisualizer to gain a deeper insight into your queries and your database instance itself: your database will thank you after a while.
Summary
In this blog, we’ve walked you through the importance of using indexes in database management systems. Most of the examples provided were centered around MySQL and its counterparts such as Percona Server and MariaDB, but the concept remains the same – indexes speed up SELECT
operations at the same time slowing down UPDATEs
, DELETEs
, and INSERTs
.
Indexes have multiple types each being able to solve a different problem, so before using them, dig into these as well, do know the risks involved, and you should be good to go.
When improving query performance with indexes, keep in mind that SQL clients like DbVisualizer can also help you solve problems related to maintenance and upkeep of your databases – grab a free trial and try it today, come back to the blog to learn more about databases in a while, and until next time.
About the author
Lukas Vileikis is an ethical hacker and a frequent conference speaker. He runs one of the biggest & fastest data breach search engines in the world - BreachDirectory.com, frequently speaks at conferences and blogs in multiple places including his blog over at lukasvileikis.com.
Top comments (4)
How does:
work? It certainly returns less data but it shouldn't return fewer rows. I expect the where clause determines the returned rows.
You should select as few rows as possible - e.g. if you have a table with 1,000,000 rows and your column only has 2,000 and you're selecting data only from that column, your database has less work to do. :)
Great quality article.
I did not knew about DbVizualizer, it seems RDBMS agnostic ?
R-Tree indexes thanks for this one by the way.
DbVisualizer support multiple database management systems. All DBMS supported by DbVisualizer can be found here: dbvis.com/supported-databases/
Hope that helps!