- Index does not contain all columns from a table - an index contains as many columns from a table as you define. If you put all columns into an index then it will contain all columns.
- Index data is in memory - index data is stored on disk & read from it the same as the data table. A database reads the data from disk in pages/blocks and can cache it in memory, both data blocks & index blocks. The cache size is limited by available memory and rarely used blocks are evicted from it. An index block is a subject to eviction likewise a data block.
- Index is fast because it's in memory - index is fast because it's utilizing a B-Tree data structure with logarithmic complexity. B-Tree is optimized for storing on disk and allows quick access even when an index isn't cached in memory.
- If you have an index on columns used for search you are good - an optimizer decides whether to use an index or not based on its selectivity. If selectivity is low then the optimizer might decide that it's faster to perform a full scan, i.e. read the whole table sequentially.
-
Order of columns in
where
makes a difference - a database doesn't care if the order of columns inwhere
matches the order of columns in a composite index. The optimizer is smart enough to figure out that the index can be used. - Order of columns in a composite index doesn't matter - it does matter because only leftmost columns can be used for index search. The index can't be used if you search by a rightmost column in a composite index.
- If an index works on QA environment it will work in PROD - usually QA & PROD environments differ a lot by the size of datasets & columns cardinality. As a result an index that has good selectivity on QA might have awful selectivity on PROD. The selectivity might also degrade over time so you have to monitor slow queries & index usage.
I've interview few dozens of software engineers and these are the most common misconceptions they had. Many candidates weren't even able to explain what the index is, besides "it's used to speed up queries". Those who mentioned the B-Tree data structure still would often get the details wrong.
What is your experience interviewing devs? Do they generally have good understanding of database fundamentals?
Top comments (0)