In this article, we will learn about the handling of deadlock in distributed systems. If you’re new to this, don't worry, we will be covering what are deadlocks and how to handle them in a detailed manner along with diagrams.
What is deadlock?
Deadlock is known as a state of database management system having more than two transactions when each of the transactions is waiting for the data that is in return locked by other transactions by the database system. The deadlock can be illustrated by a cycle as shown in the diagram below in the form of a wait-for-graph.
This is a graph that is directed and has vertices that are used for denoting a transaction whereas the edges are used for denoting the wait time of the graph for all the data present in the database.
In order to understand this better, let's take an example. From the diagram shown below, the wait-for-graph is illustrated where the transaction T1 is waiting for the data X which in return is locked by transaction T3. but the transaction T3 is waiting for the Y which in return is locked by T2 but the transaction T2 is waiting for Z which in return is locked by transaction T1. therefore, if we look at it, a waiting cycle is formed which leads to waiting time, and hence none of the transactions can proceed with the execution.
How is deadlock handled in a distributed system?
In a distributed database system, the transaction process is also distributed which means that the same transaction might be under process for more than a single site. The two important deadlock handling concerns that there is distributed system and not a centralized system are the location of the transaction and the control of the transaction.
When these concerns are addressed or resolved properly then the deadlock can be handled by using any deadlock prevention technique, deadlock avoidance as well as deadlock detection and its removal. Keys are used to avoid or overcome the deadlock scenario.
Transaction location of the distributed system
We all know that transactions are processed in multiple sites in a distributed database system and these data items are then used in various sites. The amount of data that is processed is not distributed uniformly across the distributed system in those sites.
This means that the time period required for processing also varies and hence the same transaction might be available or active for some sites but might be inactive for other sites. In case when two transactions that are conflicting are located on the same site, there is a possibility of one of the sites being in an inactive state.
This condition, however, does not arise in a centralized system. This issue is known as a transaction location issue in the decentralized database system.
In order to resolve this issue, a daisy chain model is used. We know that, in this model, when a transaction is executed, it carries certain details with it when moved from one site to another.
A few of the details are, the required table list, required site list, required list of visited tables as well as the sites, the list of tables and sites that are to be visited, the list of locks that are acquired along with their types. Once the transaction is either terminated by commit or aborted, it should inform all the relevant sites regarding it.
Transactional control is used for designing as well as controlling the sites that are required for processing the transaction in the distributed system. There are various options available to choose as to where to process the transaction and to design the control center for instance:
For the center of control, any one server can be chosen
The center of control that is chosen can travel from one server to another.
However, the responsibility for controlling can be shared between several servers in the network
Distributed deadlock prevention
This is similar to deadlock prevention in a distributed system, wherein, before starting to execute, the transaction should acquire all the locks in order to prevent deadlock in the distributed database system.
In this, the site where the transaction starts is known as the controlling site. This controlling site is used for sending the message to the sites where the data is located for locking the data items. It then waits for confirmation from the sites. When the site confirms that all the data items are locked the transaction starts executing.
In case of any site or communication link failure, the transaction is in wait mode until all the links is being repaired.
Having said that, even though the implementation is simple, there are some drawbacks related to this approach:
There can be a communication delay due to the pre-acquisition of locks which increases the time that is required to fulfill the transaction.
In case of site or communication link failure, the transaction will have to wait for a longer time in order for that site to recover. In the running sites, however, the data items are locked. This will then prevent the other transactions from executing.
The controlling sites cannot communicate with other sites in case of failure. These controlling sites will then keep all the data items locked in the locked state thereby resulting in the blocking of the transaction.
Distributed deadlock avoidance
In a centralized system, distributed deadlock avoidance is used for handling the deadlock even before it occurs. The translation location as well as control issues of the transaction are to be looked into and resolved. Since the translation has distributed nature, the following issues might occur:
The conflict between two transactions can happen on the same site
The conflict between two transactions can happen on a different site
When the conflict occurs, a transaction can either be aborted or can be allowed to wait according to the distributed wound-wait or wait-die algorithm.
In order to understand this better, let’s take an example. Let’s assume T1 and T2 to be two transactions that arrive at site P and try locking the data item which is locked already by T2 on the same site. This results in conflict at site P. following are the algorithms used:
Distributed wait-wait- here, if the transaction T1 is greater than T2 then the transaction T2 needs to be aborted. If at site P, the transaction T2 is active, the site P aborts the transaction T2 and rolls it back for broadcasting this message to other relevant sites. In case the transaction T2 has left site P but is active on another site let’s say Q then, in that case, all the T2 broadcast at site P will be aborted. The other site J then aborts and rolls back the translation T2 and sends this message to all the sites out there.
In case T1 is lesser than T2 or younger then the transaction T1 is allowed to wait. The transaction T1 can resume its execution only after site P receives the message that the translation T2 has completed its processing.
Distributed wound-die- In this case, when T1 is greater than or older than T2 then the transaction T1 is allowed to wait. The transaction T1 can resume its execution only after site P receives the message that the transaction T2 has either committed or aborted successfully on all the sites. On the other way round, if the transaction T1 is lesser or younger than the transaction T2 then in this case, the T1 is aborted.
At site P, the concurrency control is used for sending the message to all the sites where the transaction T1 has taken place in order to abort it from the respective sites. From the controlling site, a notification is sent to the user from all the sites where the translation T1 has successfully aborted.
Distributed deadlock detection
The deadlocks are allowed to be invoked and removed if found similar to that of centralized deadlock detection. In this, no check is performed by the system when a lock request is placed by the transaction. The global wait-for-graphs are created for the implementation of this approach. In global wait-for-graphs, the deadlock is indicated if a cycle exists in the system.
It is, however, difficult to spot the deadlock since, in the network, the transaction waits for the resources. The timers can be used by a deadlock detection system wherein each transaction is associated with a timer, that is set to a definite period and the transaction is expected to be completed within that duration. The timer goes off if the transaction doesn’t get completed within the given duration that in turn indicates a deadlock.
Another tool that is used for handling the deadlock is a deadlock detector. There is one deadlock detector in a centralized system whereas there can be more than one deadlock detector in a distributed system. The deadlock detector is used for finding the deadlocks under its control for respective sites.
Following are the three alternatives for the detection of deadlock in a distributed system:
Centralized deadlock detector: in this, a single site is designated as the central detector of deadlock
Hierarchical deadlock detector: in this, a number of deadlock detectors are arranged hierarchically
Distributed deadlock detector: in this, all the sites involved in removing and detecting the deadlock
This was all about Deadlock Handling in Distributed Database. I hope you liked this article, do let us know if there is any feedback.
Top comments (0)