...

other fields are obvious.

During scans the 'ver' field is checked. If the row version is visible (the row was added by the current or a committed tx), the 'xid_max' field of the referenced data row is checked: the row is considered visible if it is the last version of the row ('xid_max' is NA).

Locks

During DML or SELECT FOR UPDATE the tx acquires locks one by one.

...

Since all changes are written right after the lock is acquired, all participants are ready to commit or roll back changes after each successful data update, so the prepare stage can be omitted.

The following rules are used for tx recovery on participant failure:

  1. On TX coordinator failure, the oldest node among the tx participants is elected as TX coordinator and checks the local states of the other tx participants.
  2. If at least one participant has the tx in ACTIVE or ABORTED state, the whole transaction is marked as ABORTED and a tx rolled back message is sent to the MVCC coordinator.
  3. If at least one partition from the cacheId to partitions mapping is lost, the whole transaction is marked as ABORTED and a tx rolled back message is sent to the MVCC coordinator.
  4. If all participants have the tx in LOCALLY_COMMITTED state and there are no lost partitions, the whole transaction is marked as COMMITTED and a tx committed message is sent to the MVCC coordinator.
  5. A record with lost partitions in its cacheId to partitions mapping cannot be deleted from TxLog (caches with lost partitions cannot be cleaned up).
  6. A rejoining node compares its local TxLog with the MVCC coordinator's one.
  7. All partitions from ACTIVE txs have to be forcibly rebalanced.
  8. Each LOCALLY_COMMITTED tx has to be compared to the corresponding record in the MVCC coordinator's TxLog.
  9. If there is no matching record, partitions from such txs have to be forcibly rebalanced.
  10. If there is a matching record, the tx gets its new state from the record.
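
The recovery rules above can be sketched as two small decision functions. This is only an illustration in Python, not the actual implementation: the names TxState, resolve_tx and on_rejoin are hypothetical, and the behaviour for state combinations the rules do not mention (raising an error, leaving other states untouched) is an assumption.

```python
from enum import Enum
from typing import List, Optional, Tuple


class TxState(Enum):
    ACTIVE = "ACTIVE"
    LOCALLY_COMMITTED = "LOCALLY_COMMITTED"
    ABORTED = "ABORTED"
    COMMITTED = "COMMITTED"


def resolve_tx(participant_states: List[TxState], has_lost_partitions: bool) -> TxState:
    """Rules 2-4: final state chosen by the newly elected TX coordinator."""
    if not participant_states:
        # Assumption: a tx without participants is not covered by the rules.
        raise ValueError("no participants to inspect")
    # Rule 2: any ACTIVE or ABORTED participant aborts the whole tx.
    if any(s in (TxState.ACTIVE, TxState.ABORTED) for s in participant_states):
        return TxState.ABORTED
    # Rule 3: any lost partition aborts the whole tx.
    if has_lost_partitions:
        return TxState.ABORTED
    # Rule 4: everyone is LOCALLY_COMMITTED and nothing is lost.
    if all(s is TxState.LOCALLY_COMMITTED for s in participant_states):
        return TxState.COMMITTED
    # Assumption: other combinations are outside the stated rules.
    raise ValueError("state combination not covered by the rules")


def on_rejoin(local: TxState, coordinator_record: Optional[TxState]) -> Tuple[TxState, bool]:
    """Rules 7-10: returns (new local state, whether to force a rebalance)."""
    if local is TxState.ACTIVE:
        return local, True                    # rule 7
    if local is TxState.LOCALLY_COMMITTED:
        if coordinator_record is None:
            return local, True                # rule 9: no matching record
        return coordinator_record, False      # rule 10: take the recorded state
    return local, False                       # assumption: other states are kept
```

For example, resolve_tx([TxState.LOCALLY_COMMITTED, TxState.ACTIVE], False) yields ABORTED, because one participant has not reached LOCALLY_COMMITTED.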

Read (getAll and SQL changes)

Each read operation outside an active transaction creates a special read-only transaction and uses its tx snapshot for version filtering.

Each read operation within an active READ_COMMITTED transaction creates a special read-only transaction and uses its tx snapshot for version filtering.

Each read operation within an active REPEATABLE_READ transaction uses that transaction's snapshot for version filtering.
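
The choice of snapshot for a read can be sketched as follows. This is a minimal Python illustration: Snapshot, Tx, request_snapshot and snapshot_for_read are hypothetical names, and request_snapshot is only a local stand-in for the special read-only transaction obtaining a fresh snapshot.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

_next_id = itertools.count(1)


@dataclass(frozen=True)
class Snapshot:
    id: int


def request_snapshot() -> Snapshot:
    """Hypothetical stand-in for a read-only tx obtaining a fresh snapshot."""
    return Snapshot(next(_next_id))


@dataclass
class Tx:
    isolation: str       # "READ_COMMITTED" or "REPEATABLE_READ"
    snapshot: Snapshot   # snapshot taken by the enclosing transaction


def snapshot_for_read(tx: Optional[Tx]) -> Snapshot:
    """Pick the snapshot used for version filtering of a single read."""
    if tx is None or tx.isolation == "READ_COMMITTED":
        # No tx, or READ_COMMITTED: every read gets its own fresh snapshot.
        return request_snapshot()
    if tx.isolation == "REPEATABLE_READ":
        # REPEATABLE_READ: all reads share the transaction's snapshot.
        return tx.snapshot
    raise ValueError(f"isolation level not covered here: {tx.isolation}")
```

Two reads inside the same REPEATABLE_READ transaction therefore filter versions against the same snapshot, while two reads inside a READ_COMMITTED transaction may see different committed data.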

During a get operation the first item that passes the MVCC filter is returned.

During secondary index scans the 'ver' field of the tree item is checked. If the row version is visible (the row was added by the current or a committed tx), the 'xid_max' field of the referenced data row is checked: the row is considered visible if it is the last version of the row, that is, 'xid_max' is NA, ACTIVE, or higher than the assigned one.
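
The secondary index check can be sketched as a predicate. This Python sketch rests on simplifying assumptions that are not stated in the text: versions are plain integers, the snapshot carries the version assigned to the reading tx plus the set of versions that were still active when it was taken, and every non-active version below the assigned one counts as committed (aborted txs are ignored). Snapshot and row_visible are hypothetical names, and None stands for NA.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Snapshot:
    version: int             # version assigned to the reading tx
    active: FrozenSet[int]   # versions of txs still active at snapshot time


def row_visible(ver: int, xid_max: Optional[int], snap: Snapshot) -> bool:
    """Visibility of a row version reached through a secondary index item."""
    # Step 1: 'ver' must belong to the current tx or to a committed tx.
    created_by_current = ver == snap.version
    created_by_committed = ver < snap.version and ver not in snap.active
    if not (created_by_current or created_by_committed):
        return False
    # Step 2: the row must still be the last version for this snapshot.
    if xid_max is None:              # NA: no newer version exists
        return True
    if xid_max in snap.active:       # superseding tx is still ACTIVE
        return True
    return xid_max > snap.version    # superseded after the assigned version
```

With Snapshot(version=10, active=frozenset({7})), a row with ver=5 and xid_max=8 is filtered out, because the tx that replaced it is committed and precedes the assigned version.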

During primary index scans the 'ver' field of the tree item is checked. The first item that passes the MVCC filter is returned, and all subsequent versions of the same row are skipped.
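
A primary index scan of this kind can be sketched as a generator. This Python sketch assumes that tree items are (key, ver, value) tuples ordered by key with the versions of one key stored next to each other, newest first; that ordering, the tuple layout and the names scan_primary and passes_filter are assumptions made for illustration.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple

Item = Tuple[Any, int, Any]  # (key, ver, value)


def scan_primary(items: Iterable[Item],
                 passes_filter: Callable[[int], bool]) -> Iterator[Tuple[Any, Any]]:
    """Yield, per key, the first version that passes the MVCC filter."""
    returned_key = object()  # sentinel that equals no real key
    for key, ver, value in items:
        if key == returned_key:
            continue             # a version of this row was already returned
        if passes_filter(ver):
            yield key, value
            returned_key = key   # skip the remaining versions of this row
```

For example, with versions 9, 5 and 2 stored for one key and a filter that accepts versions up to 8, only the value of version 5 is returned for that key.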

Cleanup of old versions