dashboard image

Objectives :

4. Indexing

dashboard image
The Problem

As some point your application will persist data in a database. Typical relational database organize data storage in colums and rows. As illustrated below, the database store multiple tables, including the table "Book".

dashboard image

If you search for the book "Clean Architecture", the database will scan each row from "001" to "004". Here in the example we only have 4 rows, so the search will be faster. But imagine if we have hundreds thousands, or even millions of rows. The search could take longer. In the following example, we have to scan 100980001 rows if we search for "Fundamentals of Software Architecture".

dashboard image

One option to speed up the search is by using indexes.

How Database indexing works ?

The information stored in a database are in reality store on a disk. Indexing creates a data structure, commonly used and known data structure is the "b-tree". For example when we create an index on column "title", the database engine creates a structure with 2 columns, one with the indexed column and a pointer. The other columns are not included in that data structure (book_id and isbn).

For example, if we create an index on the column "title", the following data structure will be created. When there is a query on the title "Clean code", the database engine, look for the pointer to find the corresponding data on disk ("_103"), without the need to scan all the rows.

dashboard image

How to create a database index ?

To create an index, database adminstrator run a sample sql query like this one.

"CREATE UNIQUE INDEX title_idx ON book(title)"

dashboard image

Best uses cases to use indexing

Frequent search queries : If you regularly search for specific data in a large dataset, indexing prevent scanning the entire dataset.

Improving performance : If you often perform complex queries in large dataset, indexing optimize the performance.

Large volume of data: If you have a large volume of data, using indexing will drastically improve your queries.

Limits of using indexing

Frequent INSERT, UPDATE, DELETE queries: When you perform these operations with indexing, the index information have to be updated at the same time, resulting to a decrease in performance. As much as possible, prevent indexing columns that are frequently updated.

Small dataset: Indexing small dataset, do not make significant differnce on your performance queries, and might just add more complexities.