Indexing in DBMS

Indexing in DBMS is a very important aspect because of the benefits that it has. In this article, we will discuss what is indexing in DBMS. We will also cover various types of indexing in DBMS, and the need for indexing in DBMS.

What is Indexing in DBMS?

Relational databases are databases that are stored in the form of tables i.e. rows and columns. However, physically on the disk, if we have to search for some data, we have to make a sequential search (linear search) by traversing multiple rows. So, it is a very time-inefficient way of searching the data.

So, we want to make the searching in DBMS, a fast operation. In order to make the search queries efficient i.e. in order to reduce the search time, we need to do some modifications to the way we store the data.

So, searching can be made efficient by modifying the way we store the data. This is where Indexing comes into the picture.

Indexing in DBMS (Database Management Systems) is a technique used for optimizing search queries. This means that it is a technique to store the data in such a way that searching the data becomes quick and efficient.

This is done by creating some indices upon the data and accessing the data not sequentially, but rather directly by using these indexes.

Real-Life Analogy of Indexing in DBMS

The concept of indexing in DBMS is very similar to indexing in textbooks. Consider there is a textbook named “Learn Data Structures and Algorithms”. Now, in this textbook, there will be many Data structures and Algorithms like the stack, queue, linked list, array, DFS, BFS, knapsack, etc. So, let us say that we want to learn the knapsack algorithm in Dynamic Programming. For that, if there are no indices mentioned in the book, we will have to go through every page of the book till we reach our desired topic. However, if the indices are mentioned in the book, we can directly go to the page where the knapsack algorithm is being discussed.

So, indexing reduces search time. This is true not just for books, but also in DBMS. Now, let us see in particular, what indexing is in DBMS.

Structure of an index in DBMS?

Indexing is used to search faster within our database. In indexing, we use some data structures to optimize the search queries by changing the way the data is stored on the disk.

To understand indexing in DBMS, let us understand what happens at the memory level. So, first of all, an index table is created. This index table can have many columns but the main 2 attributes of this index table are “Search Key” and “Data Reference”.

This is shown below.

The Search Key attribute is actually the copy of the Primary Key (or Candidate Key) of our database relation.

The Data Reference (or Block Pointer) contains pointers to the address of the disk block where the actual data corresponding to the current Search Key is stored.

So, we can understand that the index table is a table of these key-value pairs, the key being the Search Key (Candidate key or Primary Key) and the value being the Data Reference (Block Pointer).

So, now that we have a basic idea about indexing in DBMS, let us now study the attributes of indexing in DBMS.

Attributes of Indexing in DBMS

The following are the attributes of indexing in DBMS.

1. Access Types: As the name suggests, here we are referring to the type of access i.e. value-based access or range-based search, etc.

2. Access Time: This is also very simple. This means the time taken to access (find) the particular data or data element from the entire data stored on the disk.

3. Insertion Time: For inserting the data into the disk (memory), it is necessary to find the appropriate empty space so that there is sufficient space to insert the data and then insert the data. So, the time taken to find the space to insert the data and then insert the data is called insertion time.

4. Deletion Time: Deletion operation involves first finding the data that we need to delete. Then, we delete the data, and changes are also made to the index structure. So, the time taken for finding the data (access time) plus the time taken to delete the data and modify the index structure is called deletion time.

5. Space Overhead: We have discussed above that in indexing, an index table is formed with at least 2 attributes and those are the key and the block pointers. So, this index table will itself acquire some space and that space will be acquired in the memory (or disk) in which the data itself is stored. This extra space that we need to store the index table or index structure is called space overhead.

Now that we have a fair idea about what indexing is and what are the attributes of indexing in DBMS, let us understand the types of indexes. So, the types of indexing in DBMS depend on the mechanism followed for file organization. Let us understand this in detail.

Types of Indexing in DBMS

There are basically 2 ways for file organization.

Sequential File Organization:

In this type of file organization, Ordered File Indexing is used. The ordered file indexing means that the indices are based on the sorted ordering of the data. We can simply say that the Keys will be sorted.

So, in Ordered Indexing, there are 2 types. The ordered indexing can either be sparse or dense. Let us understand them in detail.

Dense Indexing in DBMS

  1. In this type of ordered indexing, we have an index for every key value. This is shown below.

  2. As shown in the image above, for every key value 10, 20, 30, and so on, we have an index record in the index table. Since there are a lot of index records created because of this, hence the name of this type of indexing is dense indexing.

  3. Each record of this index table contains the search key and the block pointer i.e. a pointer or reference to the starting of that record in the memory.

Sparse Indexing in DBMS

  1. So, we have understood the logic behind naming the previous indexing technique as Dense Indexing. So, if this is sparse indexing, this means that fewer index records must be created for the data in this type of indexing. So, yes, this is the case.

  2. In Sparse indexing in DBMS, there are few index records. Each block here points to an item. This is as shown below.

  3. When we want to locate a record, we first arrive at a starting point in the memory. This start point is the address of some block pointer in the index records which is just smaller than the record we are searching for.

  4. Then, we sequentially traverse the records till we find the desired record.

So, these were the 2 types of ordered indexing in DBMS. Now, we can also have unordered indexing also known as Hash File Organization. So, let us now study this in detail.

Hash File Organization (Unordered Indexing in DBMS):

In hash file organization, there is no sorted ordering of the keys. The indices are based on hash values (or buckets) given by a hash function. There are 3 types of Unordered indexing. They are:

Clustered Indexing in DBMS

  1. Cluster indexing is a storage method used when more than two records are kept in the same file. Because several entries linked to the same subject are stored in one location, employing cluster indexing can lower the cost of searching.

  2. It also allows for the frequent joining of multiple tables (records).

  3. An ordered data file is used to define the clustering index. A non-key field is used to arrange the data file.

  4. In some instances, non-primary key columns that might not be distinct for each record are used to generate the index.

  5. In these situations, we will combine two or more columns to obtain the unique values and make an index out of them in order to identify the entries more quickly.

    An example of Clustered Indexing is shown above.

Non-Clustered Indexing

  1. Non-clustered indexing simply provides us with a list of virtual pointers or references to the real storage location of the data. The order of the index is not followed while storing data physically. Data is instead found in leaf nodes.

  2. The contents page of a book, as an example. Each entry includes the page or location of the data’s storage. The material on each page of the book is not arranged, however, the contents page provides an ordered list of the data points that are there.

  3. Because the data is not physically arranged in a way that allows for sparse ordering, we can only have dense ordering in the non-clustered index.

  4. As a result of the additional effort needed to extract the data by further following the pointer, it takes longer than the clustered index.

  5. Data is directly present in front of a clustered index in this scenario.

    An example of non-clustered indexing is shown above.

Now, there is one more type left called multi-level indexing. Let us study it in detail.

Multi-Level Indexing

  1. Indicators expand together with the database’s overall size. A single-level index may grow too large to store with many disc visits while it is kept in the main memory.

  2. The main block is divided into several smaller blocks by the multilevel indexing so that the same can be stored in a single block.

  3. The inner blocks of the outer blocks are further subdivided and point to the data blocks. With fewer overheads, this can be conveniently stored in the main memory.

  4. This is useful in the case of very large databases.

So, we have now studied all the types of indexing in DBMS. We hope that you have understood everything in detail and have got the concepts cleared.

Leave a Reply

Your email address will not be published. Required fields are marked *