What is an Index in MySQL

#mysql #webdev #database #beginners

In this article I explain MySQL index like to a five year old.

I recently was invited for an oral interview after having impressed in algorithmic questions test.

I thought this was going to be easy-peasy, until some questions were posed to me and I started to adjust in my seat, wishing the heavens could open and hand me an answer booklet.

Some of the Questions asked was to name types of indexes in MySQL. Oh Gosh! My head went blank. I may have fumbled with explaining what an index is, but was totally empty about giving examples. At that moment I didn't even know that a primary key is an index.

Hence in this article, I explain what Indexes are. In the next article I will give examples of indexes and how to create, update and drop them.

Definition of the problem

Let's call our sample database table sales:

Year	Country	Product	profit
2000	Finland	Computer	1500
2000	Finland	Phone	100
2001	Finland	Phone	10
2000	India	Calculator	75
2000	India	Calculator	75
2000	India	Computer	1200
2000	USA	Calculator	75
2000	USA	Computer	1500
2001	USA	Calculator	50
2001	USA	Computer	1500
2001	USA	Computer	1200
2001	USA	TV	150
2001	USA	TV	100

Take a look at this query:

SELECT * FROM sales WHERE profit > 1000;

We need to find sales that made a profit of more than 1000.
MySQL will begin with the first row and scan through the whole table, to find profits with a value of more than 1000.

We will get our results almost instantly, if our sales table will remain to be a small dataset as shown above.
In real World, sales occur everyday, every hour across the globe.
An enterprise will definitely make more sales as time goes by. Therefore we can justifiably say that our sales table will grow if it were implemented in real world application.

Still with our SQL query, if we are to continue querying our sales table which is growing to have huge sums of dataset, finding our results will become slower. Because MySQL still has to scan through the entire table.

This is where indexes come to the rescue; to optimize queries.

We need to find rows with specific column values faster. This is what indexes do.

What is an index

An index is a data structure that provides a fast lookup capability for rows of a table, typically by forming a tree structure (B-tree) representing all the values of a particular column or set of columns.

Basically, Indexes are used to find rows with specific column value(s) faster.
They are created from the column(s).
Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the slower the finding.

Indexing will involve storing copies of the values from a column in a B-tree data structure.

The B-tree data structure(index) allows to quickly find a specific value, a set of values, or a range of values, corresponding to operators such as =, >, ≤, BETWEEN, IN, and so on, in a WHERE clause.

Think of B-tree data structure as an alphabetically sorted list, that lets you to easily lookup names that have been sorted in alphabetical order. The find is less painful that way.

If the table has an index for the column(s) in question, MySQL can quickly determine the position to seek to without having to look through all the data. This is much faster than reading every row sequentially.

What does MYSQL use indexes for?

Some cases that MySQL uses an index is:

To find the rows matching a WHERE clause faster.
To retrieve rows from other tables when performing joins.

Other cases:

To eliminate rows from consideration; If MYSQL has to chose from multiple indexes in a column, normally the index that finds the least number of rows, is used. Hence more rows that would have been retrieved by other indexes, are not considered.
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
To find the MIN() or MAX() value for a specific indexed column.
In some cases, a query can be optimized to retrieve values without consulting the data rows. (An index that provides all the necessary results for a query is called a covering index.) If a query uses from a table only columns that are included in some index, the selected values can be retrieved from the index tree for greater speed.

Conclusion

Indexes are very powerful when it comes to greatly improving the performance of MySQL queries.

Indexes are less important for queries on small tables, or big tables where report queries process most or all of the rows. When a query needs to access most of the rows, reading sequentially is faster than working through an index. Sequential reads minimize disk seeks, even if not all the rows are needed for the query.

Thanks for reading. Hopefully I've made you smarter✨.

Are you on twitter? Let's connect