Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

A deadlock is a cycle of transactions waiting for locks to be released. A necessary condition for a deadlock happen is "Hold-and-Wait" situation. In other words, a transaction must wait for a lock while it is holding another lock. By definition of the record-level transaction in AsterixDB, a record is the only one resource to be locked during a transaction life-cycle. Based on the record-level transaction and given that there is no lock upgrade on any resource in any situation in AsterixDB, it is possible for a transaction to avoid the "Hold-and-Wait" situation. Therefore, deadlock-free locking protocol is achievable in AsterixDB

In traditional transaction processing systems, a transaction can be processed by multiple threads. These threads processing a transaction marks their operations using the transaction id. For example, when a lock is requested by each of these threads, the thread provides the transaction id as well as the resource id to be locked. Based on a wait-for-graph, (where a vertex represents a transaction id and a directed edge from a vertex, T1, to another vertex, T2, is added if T1 is waiting for a resource held by T2), a lock manager can detect a deadlock if the wait-for-graph includes a cycle in it.

A transactor is a thread who works for processing transactions. In a typical transaction processing system, it is a natural choice to make a transactor serve for a single transaction until the transaction is completed (instead of making it serve for multiple transactions in an interleaved manner). 

In contrast, considering record-level transactions in AsterixDB, it is natural choice to make a transactor serve for multiple record-level transactions in an interleaved manner due to the following behavior in AsterixDB.

Records to be inserted into a primary index and zero or more secondary indexes are conveyed in a frame. The frame reaches first to the primary index operator, where records, i.e., <pk,record> pairs in the frame are inserted into the primary index. Subsequently, the frame flows to a first secondary index operator and <sk,pk> pairs from the frame are inserted into the first secondary index. Then, it goes to next secondary index operator, and so on. After the frame goes through all index operators, the frame reaches a commit operator. The commit operator creates a commit log for each transaction/record in the frame. After the commit logs are flushed to disk, locks held by the corresponding transactions are released. 

(Based on this behavior, to make a transactor serve for a transaction (in a non-interleaved manner) requires to deliver only a single record in a frame. However, this is not as efficient as the batch way described above. )

Now, the design choice (making a transactor serve for multiple record-level transactions in an interleaved manner) becomes an issue in terms of deadlock detection.

For example, suppose the following case.
A transactor a1 waits for a lock on a record r1 for transaction t1 while holding a lock on a record r2 for transaction t2.
A transactor a2 waits for a lock on a record r2 for transaction t3 while holding a lock on a record r1 for transaction t4.

If transaction ids are used in a wait-for-graph as shown in "In Theory" figure, there is no deadlock. However, if transactor ids are used in a wait-for-graph as shown in "In Reality" figure, there is a deadlock in the graph. This is due to the fact that if a transactor waits, all transactions served by the transactor also waits. Unless a system can provide as many threads as the number of records to be operated in the system at any point in time, this situation is not easily avoidable. 

In order to overcome this issue, we came up with a simple approach to achieve dead-lock free locking protocol in AsterixDB. The intuition is as follows:

As long as a transactor can avoid "Hold-and-Wait" situation, there will be no deadlock. Therefore, a transactor may either hold locks without waiting, or wait without any locks. 

The above strategy can be implemented in the following manner in AsterixDB considering the behavior of push-based records operation through a job pipeline described above.

 

1. for each entry in a frame
2. returnValue = tryLock() for an entry
3. if returnValue == false
    3-1. flush all entries for which locks are already acquired to the next operator
           this will make all those entries reach the commit operator such that corresponding commit logs will be created.
    3-2. create a WAIT log and wait until logFlusher thread has flushed the log and gives notification
           this notification guarantees that all locks acquired by this transactor were released. 
    3-3. acquire a lock for the failed entry using lock() instead of tryLock()
           this lock() call will not cause a deadlock since the transactor doesn't hold any lock at this moment.
4. create an update log and insert the entry