...
To determine the state of the transaction that created a particular row, a special structure (TxLog) is introduced. TxLog is a table (persistent if persistence is enabled) that maps MVCC versions to transaction states.
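The mapping described above can be pictured with a minimal in-memory sketch. It is only an illustration, not the Ignite implementation: the class `TxLog` here, the `TxState` names, and the use of a plain integer as the MVCC version are all assumptions made for the example.

```python
from enum import Enum


class TxState(Enum):
    # Hypothetical state names; the page does not list the actual states.
    NA = "na"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


class TxLog:
    """In-memory stand-in for the TxLog table: MVCC version -> transaction state."""

    def __init__(self):
        self._states = {}

    def put(self, mvcc_ver, state):
        # Record or overwrite the state of the transaction with this version.
        self._states[mvcc_ver] = state

    def get(self, mvcc_ver):
        # Versions without a record are reported as NA (an assumption of this sketch).
        return self._states.get(mvcc_ver, TxState.NA)

    def remove(self, mvcc_ver):
        # Drop the record once no row refers to this version any more.
        self._states.pop(mvcc_ver, None)
```

A reader that finds a row with `mvccVer = 7` would call `log.get(7)` to learn whether the creating transaction has committed.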
TxLog is also used to keep all the data consistent on cluster crash and recovery:
...
    |          key part           |         |      |
    |-----------------------------| lockVer | link |
    | cache_id | hash | mvccVer   |         |      |
...
mvccVer - MVCC version of transaction which has created the row
lockVer - MVCC version of transaction which holds a lock on the row
...
    |    key part    |
    |----------------|
    | link | mvccVer |
...
link - link to the data
mvccVer - XID of the transaction that created the row
...
    | payload size | next_link | mvccVer | newMvccVer | cache_id | key_bytes | value_bytes | row_version | expire_time |
mvccVer - TX id which created this row.
newMvccVer - TX id which updated this row, or NA if this is the last row version (needed to decide whether the row is visible to the current reader).
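How `mvccVer` and `newMvccVer` could feed a visibility decision is sketched below. This is a simplified, textbook-style snapshot rule written for illustration, not the rule Ignite actually applies: the `Snapshot` shape, the integer versions, the `committed` set standing in for a TxLog lookup, and the function names are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Snapshot:
    version: int              # read version handed to the reader (assumed integer)
    active: FrozenSet[int]    # transactions still in flight when the snapshot was taken


@dataclass(frozen=True)
class RowVersion:
    mvcc_ver: int                 # TX id which created this row
    new_mvcc_ver: Optional[int]   # TX id which updated this row; None stands for NA


def tx_visible(tx_ver, snapshot, committed):
    # A transaction's changes are visible if it committed, is not newer than
    # the snapshot, and was not active when the snapshot was taken.
    return (
        tx_ver in committed
        and tx_ver <= snapshot.version
        and tx_ver not in snapshot.active
    )


def row_visible(row, snapshot, committed):
    # The creator must be visible to the reader...
    if not tx_visible(row.mvcc_ver, snapshot, committed):
        return False
    # ...and the row must not have been replaced by a visible update.
    if row.new_mvcc_ver is None:
        return True
    return not tx_visible(row.new_mvcc_ver, snapshot, committed)
```

Under this rule a row with `newMvccVer` set stays visible to readers whose snapshot predates the update, which is why both fields are stored with the row.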
...
The near Tx node has to notify the Version Coordinator about the final Tx state to make the changes visible to subsequent reads.
When the MVCC coordinator node fails, a new one is elected among the live nodes, usually the oldest one.
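The "oldest live node" rule can be sketched as follows. The `Node` type and its `order` field (a join order in which a smaller value means an older node) are assumptions made for this example; the page only says that the oldest node is usually chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    order: int          # join order; smaller means older (assumption)
    alive: bool = True


def elect_coordinator(nodes):
    """Return the oldest live node, or None if no node is alive."""
    live = [n for n in nodes if n.alive]
    if not live:
        return None
    return min(live, key=lambda n: n.order)
```

Because every node can evaluate the same rule over the same topology, no extra voting round is needed in this simplified picture.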
The main goal of the MVCC coordinator failover is to restore an internal state of the previous coordinator in the new one. The internal state of MVCC coordinator consists of two main parts:
Due to the design of Ignite partition map exchange, all write transactions must be finished before the topology version changes. Therefore there is no need to restore the active transactions list on the new coordinator: all old transactions are either committed or rolled back during the topology change.
The only thing we have to do is recover the active queries list. We need this list to avoid cleaning up old versions while old queries are still running over that data, because doing so could lead to inconsistent query results. When all old queries are done, we can safely resume cleaning up old versions.
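Why the active queries list gates cleanup can be shown with a small sketch. It assumes integer versions and a single "watermark" derived from the oldest active query snapshot; the function names are hypothetical and the real cleanup logic may differ.

```python
def cleanup_version(active_query_versions, latest_committed_version):
    """Version up to which superseded row versions may be removed."""
    # With queries in flight, nothing newer than the oldest query's
    # snapshot may be used as the cleanup bound.
    if active_query_versions:
        return min(active_query_versions)
    # With no active queries, everything up to the latest commit is fair game.
    return latest_committed_version


def can_remove(new_mvcc_ver, watermark):
    # A row version is removable only if it was superseded (new_mvcc_ver is set)
    # and the superseding update is already visible to every active query.
    return new_mvcc_ver is not None and new_mvcc_ver <= watermark
```

If the new coordinator started with an empty list, `cleanup_version` would jump to the latest committed version and old queries could lose rows they still need, which is the inconsistency the recovery step prevents.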
To restore active queries at the new coordinator the MvccQueryTracker object was introduced. Each tracker is associated with a single query. The purpose of the tracker is:
Active queries list recovery on the new coordinator looks as follows:
Each read operation outside an active transaction, or in the scope of an optimistic transaction, gets a Query Snapshot or uses a previously received one (for an optimistic Tx the snapshot is considered its read version). Note: optimistic transactions cannot be used in the scope of DML operations.
...
After all rows are processed, corresponding TxLog records can be deleted as well.
Related documents:
2017.mvcc.vldb.pdf
concurrency-distributed-databases.pdf
p209-yu.pdf
rethink-mvcc.pdf
Related threads:
Suggestion to improve deadlock detection