What is Concurrency Control in DBMS?

In this article, we are going to brief you on what actually is concurrency control in DBMS and why it plays a major role in handling a database management system. By the end of this, you will be having clarity related to concurrency control problems and concurrency control techniques in DBMS.

To have prerequisite knowledge of serializability, generalisation of transitive dependence can be an added bonus in this article on concurrency control in DBMS.

What is Concurrency Control?

Concurrently control is required in the first place due to the transactions that take place without any rules or protocols set in place for them leading to risk in terms of the integrity of the data.

A Transaction is an atomic work done in a database where a transaction requires an appropriate waiting time to be executed. The anomaly arises when there are numerous transactions lined up and a single transaction leads to delayed processing. To curb this issue, concurrency was introduced to reduce the cumulative waiting time and increase the throughput by means of making transactions occur concurrently under the hood.

When multiple transactions take place simultaneously, there is always a risk of data being exploited and resulting in unnecessary errors. Hence, the principles of atomicity, consistency, integrity and serializability are always obeyed as part of concurrency control in DBMS.

Concurrency Control in Depth

Looking at the real-world example to understand concurrency control in DBMS with more depth, let’s suppose a courier delivery service has to export five units and one ship can take one unit at a time. Therefore, to avoid delay in the export of couriers, other ships need to be arranged to take the other four units concurrently.

If it were to wait for one ship to reach before others could depart, it would have been costly in time. Hence, concurrency is useful in the case of multiple transactions executing with better throughput.

Now that we have some clarity on the topic, we will move further to study concurrency control problems and concurrency control techniques in DBMS.

Concurrency Control Problems

The problems that arise with the application of concurrency control in DBMS are as follows:

Dirty Read Problem:

Prevalent issue when data is read by a transaction that is already updated in another transaction and left uncommitted at the point of instance causing an issue in concurrency.

With the help of the following table, it might be easy for you to understand the working of the dirty read problem.

Time Transaction 1 Transaction 2
1 READ (G) β€”
2 G = G + 500 β€”
3 WRITE G β€”
4 β€” READ G
5 β€” COMMIT
6 ERROR ROLLBACK β€”

As the following table states that the value of G was updated in 3rd second and read by the second transaction in 4th second and committed due to which there was an erroneous rollback in the first transaction that made the value to initial again. This is known as the Dirty Read Problem.

Unrepeatable Read Problem:

In this problem, there is more than one value of the same data read in the same transaction. Let us have look at the table to understand this problem in concurrency control in DBMS.

Time Transaction A Transaction B
1 READ G (500) β€”
2 β€” READ G
3 G = G + 100 β€”
4 WRITE G β€”
5 β€”- READ G

The value of G was read in transaction B before it was overwritten to be 500+100 i.e 600 in transaction A and reread in transaction B where the value of G turns out to be 600 leading to the ambiguity of values in the same transaction.

Phantom Read Problem:

Data is read in the same transaction but when read later, it encounters an error due to deletion leading up to its usage in Transaction A.

Time Transaction A Transaction B
1 READ G β€”
2 β€” READ G
3 DELETE G β€”
4 β€” READ G

Since the data was deleted in the 3rd second in Transaction A, it is not going to be accessed by Transaction B in the 4th second. Therefore, it will throw an error, what is famously known as the Phantom Read Problem.

Lost Update Problem:

Similar to its counterparts discussed the difference is that two different transactions will have their updates made over creating a problem in concurrency control in DBMS.

Time Transaction A Transaction B
1 READ G (500) β€”
2 G = G + 100 β€”
3 WRITE G β€”
4 β€” G = G + 200
5 β€” WRITE G
6 READ G β€”

In Transaction A, G was read and updated whilst the same was followed up in transaction B. When Transaction A reads the value for the second time in the 6th second, it was found that the value set up in Transaction A was altered creating a problem for concurrency control in DBMS.

Incorrect Summary Problem:

As the name suggests, the incorrect summation of data in respective transactions with the wrong aggregation is termed the Incorrect Summary Problem. The following data explains:-

Time Transaction A Transaction B
1 READ G (500)
2 SUM = 0
3 SUM = SUM + G
4 READ F (200)
5 F = F+100
6 READ F
7 SUM = SUM + F

In the given table, SUM adds up the value of G and F which is overwritten from 200 to 300. In the aftermath of the 5th second, when F is read and summed up, it becomes 800 instead of 700 making a point that Transaction B is affecting the result of Transaction A.

Concurrency Control Techniques in DBMS:

The concurrency control techniques in DBMS that are used to avoid problems related to concurrency control in DBMS in order to maintain consistency and serializability, also known as Protocols are as follow:-

Lock-Based Protocol:-

In order to avoid issues related to inconsistency, one of the foremost requirements is to achieve isolation between all the transactions and to achieve that, locking is done on transactions on account of any read/write operation.

The two variant locks used in the lock-based protocol are

  1. Shared Lock
  2. Exclusive Lock

Shared Lock:
It locks the write operations but enables the read operation to take place hence they go by the name, read-only locks. They are denoted by β€˜S’ in a transaction.

Exclusive Lock:
For certain data, It locks both the read and write operations in a transaction and is denoted by β€˜E’.

Four types of lock-based protocols are:-

1. Simplistic Lock Protocol:-
It locks all the operations in the process, the moment when data is about to be updated and unlocks the operations afterwards.

2. Pre-claiming lock protocol:-
Before enabling locks, all the operations are analyzed and the ones that fall in the checklist of the problem-causing operations are locked only in case all the locks are available and the transaction is performed effectively else rollback is performed.

3. Two-Phase Locking:-
This technique is performed in three stages.

  • Asks for the availability of locks
  • On acquiring all locks, transaction releases first lock
  • Rest of the locks are released one by one after each operation.

4. Strict Two-Phase Locking:-
Slight modification in Two-phase Locking where the locks are not released after each operation but once all the operations are done executing for good and the commit is triggered, the collective release of locks is performed.

Time-Based Protocol:-

All the transactions are tagged with a timestamp which denotes the time when the first and latest read and write operations were performed on them. Timestamp Ordering Protocol is put in place where the timestamp of operations is responsible for ensuring serializability. It holds three timestamps, one for operation and the other two for Read and Writing time. (R & W)

Let’s suppose A is our transaction and G is data and we want to perform a write operation on data.

If Timestamp(A) < R(G) which indicates the Timestamp on the operation is lesser than the time when the data was read last, in such a scenario, rollback will be performed as a result of the data that was read later than the timestamp of the transaction.

Also If Timestamp(A) < W(G) which indicates the Timestamp on the operation is lesser than the time when the data was written last, in such a scenario, rollback will be performed as a result of the data that was written later than the timestamp of the transaction.

On the contrary, when we want to perform a read operation on data then,

If Timestamp(A) < W(G) which indicates the Timestamp on the transaction is lesser than the time when the data was written last, in such a scenario, rollback will be performed as a result of the data that was written later than the timestamp of the transaction.

And if Timestamp(A) >= W(G) then the operation will be executed because the Timestamp of the transaction is greater than the last time the data was written which means that the serializability will be maintained.

Validation-Based Protocol

This protocol is divided into three phases which are as follows:-

1. Read Phase:- All the data modifications inside the transaction are stored in a local buffer and reused when needed in later operations.

2. Validation Phase:- Validation is performed to ensure that the actual values can replace what is already pre-existent in the buffer.

  • Validation Test: Tests are performed on transactions in concurrency control in DBMS on the basis of their execution time such that Timestamp(A) < Timestamp(B). The set of rules are:-
  1. Start(B) > End(A) – When transaction A is done executing, the start of transaction B is done. In this manner, the serializability is not affected.
  2. Validate(B) > End(A) > Start(B) – It states that the validation is performed after the end of transaction A if A ends executing after the onset of Transaction B, as one of the effective concurrency control techniques in DBMS.

3. Write Phase:- If the validation phase is performed well and good, the values are copied otherwise a rollback is performed.

Conclusion

We got to know what is concurrency control in DBMS and in the presence of what problems, we resort to using concurrency control techniques in DBMS in the first place.

The problems requiring concurrency control in DBMS were Dirty Read Problem, Unrepeatable-Read Problem, Phantom-Read Probel and Loss-Update Problem.

The protocols that we follow such as lock-based protocol, time-based protocol and validation-based protocol can prove to be handy concurrency control techniques in DBMS.

FAQs Related to Concurrency Control in DBMS:

1. What is concurrency control in DBMS?
Ans. To increase throughput and remove waiting time and errors, the measures or techniques applied are known as concurrency control in DBMS.

2. What are protocols in DBMS?
Ans. Protocols can be defined as rules or more precisely, the concurrency control techniques in DBMS to avoid unnecessary errors in transactions.

3. What is serializability relative to concurrency control in DBMS?
Ans. The order of correct execution of operations stands for serializability.

Leave a Reply

Your email address will not be published. Required fields are marked *