What was the first thing you used to do when you got the first set of text books every year in high school?
Well I used to have a look at the content page to get a glimpse of the topics scheduled to be covered that year. Think what if we don’t have such a reference page, and then we need to go through the mundane task of going page by page scanning through the contents of the book. Time-consuming isn’t it?
Similarly to return the query results faster we use indexes.
Index can be created either on a table or a view, and consists of a set of pages(each 8KB in size) and organized in a Binary Tree Structure. This B-tree is hierarchical with the root node at the top and the leaf node at the bottom.
Suppose we’ve an index on the emp_id column of a table and we’ve issued a query related to that column, then the query engine follows a top to bottom approach in reaching to the appropriate value. Suppose we’re looking for emp_id 96, and then it would initially go to the Root level (here 1-200) to check which page it should traverse next and then would reach to the page having values of (1-100) followed by down the order page having values of (51-100) in the Intermediate level. Following this process, the query engine finally reaches the Leaf Node for the value of 96. Here there will be two cases depending on whether it is a Clustered Index or a Non-Clustered Index. Both of these types are elaborated below.
Can a table exist without any index? Well yes it surely can. There are non-indexed tables as well, those are called heaps. Think of a situation when all your clothes are piled up and you want that favorite t-shirt for a date night, what do you do? You tend to go through each piece of clothing in the pile until you find the tee. But in exactly opposite circumstances, suppose you had all your clothes well organized in different sections in a wardrobe. So to find the t-shirt, you just had to search it in the section where all your tees were placed.
SQL Server will do a table scan to search data in a heap. That means it will read all the rows of this table until it finds one with the emp_id it is looking for (say emp_id =50). This kind of scan can be both resource and time consuming, hence better to have indexes for queries which are frequently used.
Clustered Index: This index stores the actual data rows at the leaf Node level. It can store the indexed values both in ascending or descending order. Hence there can only one Clustered index per table/view.
Non-Clustered Index: Quite opposite to the clustered index, the non-clustered index contains row locations that point to the rows where data is present. (Think of them as pointers in C++, which you read about near a millennium ago) In a clustered table or heap, the row location points to the location of the clustered index which in turn has stored the data rows, where as in a heap it refers to the actual data row. This is why a Clustered index always returns values faster than a non-clustered index.
Composite index: Index can be also created based on multiple columns of a table, known as Composite index. Both Clustered and non-clustered index can be composite indexes. If a table has no column with unique values that’s when two or more columns are combined to create a unique clustered index combination.
Unique: By declaring an index as unique, SQL Server ensures that there is uniqueness in each value in the indexed column.
CREATE CLUSTERED INDEX ix_parts_id
ON Production.parts (part_id);
Create INDEX IX_tblEmployeePayHistory_Rate
on [AdventureWorks].[HumanResources].EmployeePayHistory (Rate ASC);
Create INDEX IX_tblEmployeePayHistory_Rate
on [AdventureWorks].[HumanResources].EmployeePayHistory (Rate ASC, PayFrequency desc);
DROP INDEX IX_ProductVendor_BusinessEntityID
Note: Dropping an index doesn’t delete any data. If a clustered index is dropped, then that corresponding Table/View gets converted to a heap. Albeit, when a clustered index is dropped, it might take some time as all the other non-clustered indexes which have been using the clustered index as a reference need to replace with row pointers to the heap.
The second most debatable topic on this planet is whether primary key constraint and the clustered index is the same thing. (Obviously the first being - If Keanu Reeves is a vampire???)
Constraint is a condition maintained by the database to preserve integrity, and Index is a data-structure used to retrieve data while querying. Hence clearly both are not the same - Myth Debunked.
By default the SQL Server creates a clustered index whenever we create a table with a primary key constraint.
Space Consumption:The clustered index won’t require any space as it is the way data is physically stored in the table. But every non-clustered index created needs additional storage space.
Data Modifications:When there are any data manipulations performed like Insert, Update or Delete, all the indexes also need to be updated. So if there are too many indexes, it might be a challenge during data modifications.
LOB Column Indexes: Although one might argue that we can create Index on almost all of the commonly used datatypes, yet there are exceptions of those columns configured with large object (LOB) data types, such as image, text, and varchar(max). Also creating indexes on XML datatype is a tedious process.
Thanks For your Read! It was my first Post here, hope it was of value to you! :)