The main idea is that every node should store not only the current (last) entry value, but also some number of previous values in order to allow consistent distributed reads. To do this, we need to introduce a separate node role, the transaction version coordinator, which will be responsible for assigning monotonically growing transaction versions as well as tracking the versions of in-progress transactions and in-progress reads. The last committed transaction ID and the IDs of pending transactions define the versions that should be visible to any subsequent read. The IDs of pending reads define which value versions are no longer needed and can be discarded.

Version Coordinator(s)

In the initial version of distributed MVCC we will use a single version coordinator that will define the global transaction order for all transactions in the cluster. The coordinator may be a dedicated node in the cluster. Upon version coordinator failure, a new coordinator should be elected in such a way that the versions it assigns are guaranteed to be greater than all previous transaction versions. This can be easily implemented by defining a transaction version as a tuple (TV, LV), where TV is the cluster topology version, which is set to 1 on the first node and is incremented on each topology change, and LV is the coordinator local version, which is set to 1 when the coordinator is elected and is incremented on each transaction version request. Because a coordinator failure is itself a topology change, the new coordinator starts with a larger TV, so when versions are compared lexicographically this satisfies the requirement above.
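The (TV, LV) ordering can be illustrated with a minimal Python sketch. The names TxVersion and VersionGenerator are hypothetical and are not part of any existing API; the sketch also assumes that the first version issued by a newly elected coordinator carries LV = 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class TxVersion:
    """Transaction version (TV, LV); field order gives lexicographic comparison."""
    topology_version: int  # TV: bumped on every cluster topology change
    local_version: int     # LV: per-coordinator counter


class VersionGenerator:
    """Hypothetical per-coordinator generator, created when a coordinator is elected."""

    def __init__(self, topology_version: int):
        self.topology_version = topology_version
        self._next_local = 1  # assumption: LV starts at 1 on election

    def next_version(self) -> TxVersion:
        version = TxVersion(self.topology_version, self._next_local)
        self._next_local += 1
        return version


# The old coordinator was elected at topology version 3.
old_coordinator = VersionGenerator(topology_version=3)
a = old_coordinator.next_version()
b = old_coordinator.next_version()

# The coordinator fails; the topology changes and a new coordinator is elected.
new_coordinator = VersionGenerator(topology_version=4)
c = new_coordinator.next_version()
```

Even though the new coordinator restarts LV from 1, its versions compare greater than every version issued earlier because TV is compared first.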

When a transaction write version is requested, the version coordinator generates a new version and adds it to the set of pending transactions. When the transaction is committed or rolled back, the version coordinator removes this version from the pending set. When a transactional read is requested, the version coordinator captures the current state of pending transactions in order to determine which versions must not be evicted until the read is finished. When a read is finished, an acknowledgement is sent to the coordinator. The coordinator periodically broadcasts the last safe-to-keep version to the whole cluster so that nodes can discard temporarily saved versions.
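The bookkeeping described above might look roughly like the following Python sketch. It is an illustration under several assumptions rather than the proposed implementation: versions are plain integers instead of (TV, LV) tuples, all names are hypothetical, and the "last safe-to-keep version" is interpreted as the largest version W such that every version up to W is visible to every active read, so that a node could discard, per key, all values older than the newest value with version not greater than W.

```python
class VersionCoordinator:
    """Hypothetical single-process model of the version coordinator."""

    def __init__(self):
        self._counter = 0          # last assigned write version
        self._pending = set()      # versions of in-progress transactions
        self._active_reads = {}    # read_id -> (read_version, pending snapshot)
        self._next_read_id = 0

    def begin_write(self) -> int:
        """Assign a new write version and mark it as pending."""
        self._counter += 1
        self._pending.add(self._counter)
        return self._counter

    def finish_write(self, version: int) -> None:
        """Called on commit or rollback."""
        self._pending.discard(version)

    def begin_read(self):
        """Capture the pending set so the read ignores in-progress writes."""
        self._next_read_id += 1
        snapshot = frozenset(self._pending)
        self._active_reads[self._next_read_id] = (self._counter, snapshot)
        return self._next_read_id, self._counter, snapshot

    def finish_read(self, read_id: int) -> None:
        """Called when the read acknowledgement arrives."""
        self._active_reads.pop(read_id, None)

    @staticmethod
    def _floor(read_version: int, pending) -> int:
        # Every version <= floor is visible to a read with this snapshot.
        return min(pending) - 1 if pending else read_version

    def safe_version(self) -> int:
        """Value the coordinator would periodically broadcast (assumed semantics)."""
        floors = [self._floor(rv, snap) for rv, snap in self._active_reads.values()]
        # A read starting right now would be bounded by the current state.
        floors.append(self._floor(self._counter, self._pending))
        return min(floors)


coordinator = VersionCoordinator()
v1 = coordinator.begin_write()
v2 = coordinator.begin_write()
coordinator.finish_write(v1)                 # v1 committed, v2 still pending

read_id, read_version, snapshot = coordinator.begin_read()
coordinator.finish_write(v2)                 # v2 commits after the read started
safe_during_read = coordinator.safe_version()

coordinator.finish_read(read_id)
safe_after_read = coordinator.safe_version()
```

While the read is active, the broadcast value stays at version 1 because the read must not observe version 2; once the read is acknowledged, the value advances to 2.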

Internal Data Structures Changes

Transactional Protocol Changes

...
