Index Overview

#database #sql #index #rdbms

What was the first thing you used to do when you got the first set of text books every year in high school?

Well I used to have a look at the content page to get a glimpse of the topics scheduled to be covered that year. Think what if we don’t have such a reference page, and then we need to go through the mundane task of going page by page scanning through the contents of the book. Time-consuming isn’t it?

Similarly to return the query results faster we use indexes.

What is an index?

Index can be created either on a table or a view, and consists of a set of pages(each 8KB in size) and organized in a Binary Tree Structure. This B-tree is hierarchical with the root node at the top and the leaf node at the bottom.

How indexing actually works?

Suppose we’ve an index on the emp_id column of a table and we’ve issued a query related to that column, then the query engine follows a top to bottom approach in reaching to the appropriate value. Suppose we’re looking for emp_id 96, and then it would initially go to the Root level (here 1-200) to check which page it should traverse next and then would reach to the page having values of (1-100) followed by down the order page having values of (51-100) in the Intermediate level. Following this process, the query engine finally reaches the Leaf Node for the value of 96. Here there will be two cases depending on whether it is a Clustered Index or a Non-Clustered Index. Both of these types are elaborated below.

What is life of a table without Index?

Can a table exist without any index? Well yes it surely can. There are non-indexed tables as well, those are called heaps. Think of a situation when all your clothes are piled up and you want that favorite t-shirt for a date night, what do you do? You tend to go through each piece of clothing in the pile until you find the tee. But in exactly opposite circumstances, suppose you had all your clothes well organized in different sections in a wardrobe. So to find the t-shirt, you just had to search it in the section where all your tees were placed.
SQL Server will do a table scan to search data in a heap. That means it will read all the rows of this table until it finds one with the emp_id it is looking for (say emp_id =50). This kind of scan can be both resource and time consuming, hence better to have indexes for queries which are frequently used.

Different Type of Indexes

Clustered Index: This index stores the actual data rows at the leaf Node level. It can store the indexed values both in ascending or descending order. Hence there can only one Clustered index per table/view.

Non-Clustered Index: Quite opposite to the clustered index, the non-clustered index contains row locations that point to the rows where data is present. (Think of them as pointers in C++, which you read about probably a millennium ago) In a non-clustered table or heap, the row location points to the location of the clustered index which in turn has stored the data rows, where as in a clustered index it refers to the actual data row. This is why a Clustered index always returns values faster than a non-clustered index.

Composite index: Index can be also created based on multiple columns of a table, known as Composite index. Both Clustered and non-clustered index can be composite indexes. If a table has no column with unique values that’s when two or more columns are combined to create a unique clustered index combination.

Unique: By declaring an index as unique, SQL Server ensures that there is uniqueness in each value in the indexed column.

Creating, Updating and Deleting Indexes:

Clustered Index:
CREATE CLUSTERED INDEX ix_parts_id
ON Production.parts (part_id);

Non-Clustered Index:
Create INDEX IX_tblEmployeePayHistory_Rate
on [AdventureWorks].[HumanResources].EmployeePayHistory (Rate ASC);

Composite Index:
Create INDEX IX_tblEmployeePayHistory_Rate
on [AdventureWorks].[HumanResources].EmployeePayHistory (Rate ASC, PayFrequency desc);

Dropping index:
DROP INDEX IX_ProductVendor_BusinessEntityID

ON Purchasing.ProductVendor;

Note: Dropping an index doesn’t delete any data. If a clustered index is dropped, then that corresponding Table/View gets converted to a heap. Albeit, when a clustered index is dropped, it might take some time as all the other non-clustered indexes which have been using the clustered index as a reference need to replace with row pointers to the heap.

Constraints and Indexes:

The second most debatable topic on this planet is whether primary key constraint and the clustered index is the same thing. (Obviously the first being - If Keanu Reeves is a vampire???)
Constraint is a condition maintained by the database to preserve integrity, and Index is a data-structure used to retrieve data while querying. Hence clearly both are not the same - Myth Debunked.
By default the SQL Server creates a clustered index whenever we create a table with a primary key constraint.

Demerits of Indexing:

Space Consumption:The clustered index won’t require any space as it is the way data is physically stored in the table. But every non-clustered index created needs additional storage space.

Data Modifications:When there are any data manipulations performed like Insert, Update or Delete, all the indexes also need to be updated. So if there are too many indexes, it might be a challenge during data modifications.

LOB Column Indexes: Although one might argue that we can create Index on almost all of the commonly used datatypes, yet there are exceptions of those columns configured with large object (LOB) data types, such as image, text, and varchar(max). Also creating indexes on XML datatype is a tedious process.

Thanks For your Read! It was my first Post here, hope it was of value to you! :)