- An index is a collection of pairs of key and location. The key is the word by which we are looking. In the case of a book, the location is the page number. In the case of a database, it is the physical row identifier. Looking for a record in a table by physical row identifier has constant complexity, that is, it does not depend on the number of rows in the table.
- Keys are sorted, so we don't have to read all keys to find the right one. Indeed, searching in an index has logarithmic complexity. If looking for a record in an index of 1000 records takes 100 ms, it may take 200 ms in an index of million of rows and 300 ms in an index of billion of rows. (Here I'm talking about B-Tree indexes. There are other types of indexes, but they are less relevant for application development).
If we are looking for customers by name, we can perform the following physical operations:
- Seek the first entry in
IX_CUSTOMERS_LAST_NAME
whereLAST_NAME=@LastName
. This operation is named an index seek. - Read the index from this entry to the last where the
LAST_NAME=@LastName
is stilltrue
. This will cost to readPAGES(IX_CUSTOMERS_LAST_NAME)*Prob[LAST_NAME=@LastName]
pages from disk. This operation (always coupled with an index seek) is called an index range scan. - Each index entry found by the previous steps gives us the physical location of the a record in the
CUSTOMERS
table. We still have to fetch this record from the table. This will implyRECORDS(CUSTOMERS)*Prob[LAST_NAME=@LastName]
page fetches. This operation is called a table seek.