...

  1. When a tx is started, a new version is assigned and the MVCC coordinator adds a local TxLog record with the XID and the ACTIVE flag.
  2. The first change request to a data node within the transaction produces a local TxLog record with the XID and the ACTIVE flag on that data node.
  3. At the commit stage, each tx node adds a local TxLog record with the XID and the LOCALLY_COMMITTED and PREPARED flags, and sends an acknowledgement to the TX coordinator.
  4. The TX coordinator sends a tx committed message to the MVCC coordinator node.
  5. The MVCC coordinator adds a TxLog record with the XID and the COMMITTED flag; all the changes become visible.
  6. The MVCC coordinator sends a commit acknowledged message to the participants; all tx data nodes asynchronously mark the tx as COMMITTED, and all resources are released.
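
The successful commit flow can be traced with a small in-memory model. This is a sketch under stated assumptions, not the actual implementation: the class and method names (DataNode, MvccCoordinator, commit, write, prepare) are hypothetical, each TxLog is a plain dict from XID to state, LOCALLY_COMMITTED and PREPARED are collapsed into a single PREPARED state, and the asynchronous commit acknowledgement is delivered synchronously.

```python
from enum import Enum


class TxState(Enum):
    ACTIVE = "ACTIVE"
    PREPARED = "PREPARED"
    COMMITTED = "COMMITTED"


class DataNode:
    """Hypothetical tx data node with its own local TxLog (XID -> state)."""

    def __init__(self, name):
        self.name = name
        self.tx_log = {}
        self.locks = set()  # (xid, key) pairs held on this node

    def write(self, xid, key):
        # Step 2: only the first change within the tx creates the ACTIVE record.
        self.tx_log.setdefault(xid, TxState.ACTIVE)
        self.locks.add((xid, key))

    def prepare(self, xid):
        # Step 3: record the tx as locally prepared and acknowledge it.
        self.tx_log[xid] = TxState.PREPARED
        return True

    def on_commit_acknowledged(self, xid):
        # Step 6: asynchronous in the protocol, synchronous here for brevity.
        self.tx_log[xid] = TxState.COMMITTED
        self.locks = {lock for lock in self.locks if lock[0] != xid}


class MvccCoordinator:
    """Hypothetical MVCC coordinator: assigns versions and keeps its own TxLog."""

    def __init__(self):
        self.next_xid = 1
        self.tx_log = {}

    def start_tx(self):
        # Step 1: assign a new version (XID) and record it as ACTIVE.
        xid = self.next_xid
        self.next_xid += 1
        self.tx_log[xid] = TxState.ACTIVE
        return xid

    def on_tx_committed(self, xid, participants):
        # Step 5: once COMMITTED is recorded here, the changes become visible.
        self.tx_log[xid] = TxState.COMMITTED
        # Step 6: send a commit acknowledged message to every participant.
        for node in participants:
            node.on_commit_acknowledged(xid)


def commit(mvcc, xid, participants):
    """TX coordinator side of a successful commit (steps 3 to 6)."""
    acks = [node.prepare(xid) for node in participants]
    if all(acks):
        # Step 4: every participant acknowledged, notify the MVCC coordinator.
        mvcc.on_tx_committed(xid, participants)
        return True
    return False  # failure handling: see "An error during commit"


if __name__ == "__main__":
    mvcc = MvccCoordinator()
    a, b = DataNode("a"), DataNode("b")
    xid = mvcc.start_tx()
    a.write(xid, "k1")
    b.write(xid, "k2")
    print(commit(mvcc, xid, [a, b]), mvcc.tx_log[xid], a.tx_log[xid])
```

The visibility point is the COMMITTED record at the MVCC coordinator (step 5); the data nodes catch up afterwards, which is why readers may still see PREPARED records locally.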

Note: since the commit acknowledgement is processed asynchronously, a tx that is not active in the tx snapshot but is in the PREPARED state in the local TxLog (encountered during a read operation) is considered COMMITTED and is marked as such.
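
The read-time rule in the note can be expressed as a single lookup function. This is a minimal sketch, assuming a snapshot is represented by an upper bound `snapshot_xmax` (XIDs at or above it started after the snapshot) plus the set of XIDs active when it was taken; `resolve_state` and these parameter names are hypothetical, not part of the described protocol.

```python
from enum import Enum


class TxState(Enum):
    ACTIVE = "ACTIVE"
    PREPARED = "PREPARED"
    COMMITTED = "COMMITTED"
    ABORTED = "ABORTED"


def resolve_state(xid, local_tx_log, snapshot_xmax, snapshot_active):
    """Resolve the state of `xid` as seen by a reader on one data node.

    local_tx_log    -- dict XID -> TxState kept by this node (hypothetical shape)
    snapshot_xmax   -- assumed upper bound: XIDs >= it started after the snapshot
    snapshot_active -- set of XIDs that were active when the snapshot was taken
    """
    state = local_tx_log.get(xid)
    finished_before_snapshot = xid < snapshot_xmax and xid not in snapshot_active
    if state is TxState.PREPARED and finished_before_snapshot:
        # The commit acknowledgement has not arrived yet, but the snapshot says
        # the tx is no longer active: treat it as committed and record that.
        local_tx_log[xid] = TxState.COMMITTED
        return TxState.COMMITTED
    return state
```

The upper-bound check matters: a PREPARED tx that started after the snapshot is also absent from the active set, but it must not be resolved as COMMITTED on that basis.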

An error during commit

  1. When a tx is started, a new version is assigned and the MVCC coordinator adds a local TxLog record with the XID and the ACTIVE flag.
  2. The first change request to a data node within the transaction produces a local TxLog record with the XID and the ACTIVE flag on that data node.
  3. At the commit stage, each tx node adds a local TxLog record with the XID and the LOCALLY_COMMITTED and PREPARED flags, and sends an acknowledgement to the TX coordinator.
  4. If at least one participant does not confirm the commit, the TX coordinator sends a rollback message to each participant.
  5. Each tx node adds a local TxLog record with the XID and the ABORTED flag and sends an acknowledgement to the TX coordinator; all locks are released.
  6. The TX coordinator sends a tx rolled back message to the MVCC coordinator node.
  7. The MVCC coordinator adds a TxLog record with the XID and the ABORTED flag.
  8. The MVCC coordinator sends a rollback acknowledged message to the participants; all resources are released.
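
The failure branch can be sketched as the TX coordinator collecting every acknowledgement and switching to rollback when any is missing. This assumes a participant "does not confirm" by returning False from a hypothetical `prepare` call (timeouts are not modeled); `DataNode`, `confirms` and `finish_commit` are illustrative names, and the rollback acknowledged message (step 8) is folded into the lock release.

```python
from enum import Enum


class TxState(Enum):
    ACTIVE = "ACTIVE"
    PREPARED = "PREPARED"
    COMMITTED = "COMMITTED"
    ABORTED = "ABORTED"


class DataNode:
    """Hypothetical tx data node; `confirms` simulates whether it can prepare."""

    def __init__(self, name, confirms=True):
        self.name = name
        self.confirms = confirms
        self.tx_log = {}
        self.locked_by = set()  # XIDs currently holding locks on this node

    def write(self, xid):
        self.tx_log.setdefault(xid, TxState.ACTIVE)
        self.locked_by.add(xid)

    def prepare(self, xid):
        # Step 3: a node that cannot confirm simply does not acknowledge.
        if not self.confirms:
            return False
        self.tx_log[xid] = TxState.PREPARED
        return True

    def rollback(self, xid):
        # Step 5: record ABORTED and release the locks.
        self.tx_log[xid] = TxState.ABORTED
        self.locked_by.discard(xid)


def finish_commit(xid, participants, mvcc_tx_log):
    """TX coordinator: commit if everyone confirms, otherwise roll back."""
    acks = [node.prepare(xid) for node in participants]  # ask every node
    if all(acks):
        # Abbreviated happy path; the full sequence is the commit flow above.
        mvcc_tx_log[xid] = TxState.COMMITTED
        return TxState.COMMITTED
    for node in participants:
        node.rollback(xid)  # step 4: rollback message to each participant
    mvcc_tx_log[xid] = TxState.ABORTED  # steps 6-7: MVCC coordinator records ABORTED
    return TxState.ABORTED
```

Note that the node that failed to confirm also receives the rollback message, so no participant is left holding locks for the aborted tx.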

Rollback

  1. When a tx is started, a new version is assigned and the MVCC coordinator adds a local TxLog record with the XID and the ACTIVE flag.
  2. The first change request to a data node within the transaction produces a local TxLog record with the XID and the ACTIVE flag on that data node.
  3. At the rollback stage, each tx node adds a local TxLog record with the XID and the ABORTED flag and sends an acknowledgement to the TX coordinator.
  4. The TX coordinator sends a tx rolled back message to the MVCC coordinator node.
  5. The MVCC coordinator adds a TxLog record with the XID and the ABORTED flag.
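
Taken together, the commit, error-during-commit and rollback flows imply a small state machine for one XID in a data node's TxLog. The transition table below is read off those flows and is an assumption, not a documented rule set (the MVCC coordinator's own TxLog skips PREPARED and goes straight from ACTIVE to COMMITTED or ABORTED); `ALLOWED` and `record` are hypothetical names.

```python
from enum import Enum


class TxState(Enum):
    ACTIVE = "ACTIVE"
    PREPARED = "PREPARED"
    COMMITTED = "COMMITTED"
    ABORTED = "ABORTED"


# Transitions of one XID in a data node's TxLog, as implied by the commit,
# error-during-commit and rollback flows. COMMITTED and ABORTED are final.
ALLOWED = {
    None: {TxState.ACTIVE},  # step 2: the first record is always ACTIVE
    TxState.ACTIVE: {TxState.PREPARED, TxState.ABORTED},
    TxState.PREPARED: {TxState.COMMITTED, TxState.ABORTED},
    TxState.COMMITTED: set(),
    TxState.ABORTED: set(),
}


def record(tx_log, xid, new_state):
    """Store a new state for `xid`, rejecting transitions the flows never produce."""
    current = tx_log.get(xid)
    if new_state not in ALLOWED[current]:
        old = current.name if current is not None else "none"
        raise ValueError(f"illegal transition {old} -> {new_state.name} for XID {xid}")
    tx_log[xid] = new_state
```

The plain rollback flow is the ACTIVE to ABORTED edge; the error-during-commit flow is PREPARED to ABORTED.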

If the MVCC coordinator fails, the newly assigned coordinator collects all TxLog subsets and, for each LOCALLY_COMMITTED tx, checks that its participants (at least one node from each primary-to-backup partition mapping) are alive. If they are, it marks the TX as COMMITTED and sends the participants a commit acknowledged message (the nodes asynchronously mark the tx as COMMITTED); otherwise, the TX stays LOCALLY_COMMITTED.
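
The liveness check in the paragraph above can be sketched as follows. It assumes the "primary to backup partition mapping" of a tx is available as a dict from partition id to the list of owner node ids, and that the current topology is a set of live node ids; both shapes and the function names are hypothetical, and sending the commit acknowledged message is left out.

```python
def participants_alive(partition_owners, alive_nodes):
    """True if every partition touched by the tx still has at least one live owner.

    partition_owners -- dict partition id -> list of owner node ids (hypothetical)
    alive_nodes      -- set of node ids currently in the topology
    """
    return all(
        any(node in alive_nodes for node in owners)
        for owners in partition_owners.values()
    )


def recover_locally_committed(tx_partitions, alive_nodes):
    """New state of each LOCALLY_COMMITTED tx after MVCC coordinator failover."""
    return {
        xid: "COMMITTED" if participants_alive(owners, alive_nodes) else "LOCALLY_COMMITTED"
        for xid, owners in tx_partitions.items()
    }
```

For example, with live nodes b and c, a tx touching partition 0 (a, b) and partition 1 (c, d) becomes COMMITTED, while a tx that also touches partition 2 owned only by e and f stays LOCALLY_COMMITTED.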

...

  2. The MVCC coordinator sends a rollback acknowledged message to the participants; all resources are released.

Recovery Protocol Changes

There are several participant roles:

  • MVCC coordinator
  • TX coordinator
  • Primary data node
  • Backup data node

Each participant may have several roles at the same time.

The steps to recover each type of participant are as follows:

On MVCC coordinator failure:

  1. A new coordinator is elected (the oldest server node, possibly with some additional filters).
  2. During exchange, each node sends its TxLog.
  3. The new coordinator merges all the TxLog chunks and checks all local states of each TX. If data nodes have conflicting states, the following rules are used:
    1. If at least one node has the TX in the ABORTED state, a tx rollback message is sent to all data nodes and the whole TX is marked as ABORTED.
    2. If at least one node has the TX in the COMMITTED state, the whole TX is marked as COMMITTED and a commit acknowledged message is sent to all data nodes.
    3. If all data nodes have the TX in the PREPARED state, the whole TX is marked as COMMITTED.

...

LOCALLY_COMMITTED txs remain in the active state for all other participants and retain all acquired locks; if there is no quorum (one of the members has failed permanently), such a tx has to be resolved manually.

Recovery Protocol Changes

Since all changes are written right after a lock is acquired, all participants are ready to commit or roll back their changes after each successful data update, so the prepare stage can be omitted.

The following rules are used for tx recovery on participant failure:

...

    1. and a commit acknowledged message is sent to all data nodes.
    2. A TX cannot be in the COMMITTED state on one node and in the ABORTED state on another. If a node cannot mark a PREPARED tx as COMMITTED, that node has to be forcibly stopped.
  1. After the merge is done, the new coordinator resumes processing version requests.
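
The merge performed by a new MVCC coordinator can be sketched in two parts: grouping the received TxLog chunks by XID, then applying the conflict rules to each tx. This is an illustration under assumptions: states are plain strings, `group_by_tx` and `resolve_conflict` are hypothetical names, and the message paired with the all-PREPARED rule is assumed by analogy with the COMMITTED rule. Combinations the rules do not cover (for example, a node still ACTIVE) are returned as unresolved.

```python
def group_by_tx(tx_log_chunks):
    """Merge per-node TxLog chunks (node -> {XID: state}) into XID -> {node: state}."""
    by_tx = {}
    for node, chunk in tx_log_chunks.items():
        for xid, state in chunk.items():
            by_tx.setdefault(xid, {})[node] = state
    return by_tx


def resolve_conflict(states_by_node):
    """Apply the conflict rules to one TX; returns (final state, message to data nodes)."""
    states = set(states_by_node.values())
    if {"COMMITTED", "ABORTED"} <= states:
        # Invariant violation: this combination must never occur.
        raise RuntimeError("TX is COMMITTED on one node and ABORTED on another")
    if "ABORTED" in states:
        return "ABORTED", "tx rollback"
    if "COMMITTED" in states:
        return "COMMITTED", "commit acknowledged"
    if states == {"PREPARED"}:
        # Message assumed to match the COMMITTED rule.
        return "COMMITTED", "commit acknowledged"
    return None, None  # not covered by the rules listed here
```

Checking the COMMITTED/ABORTED invariant first keeps the ABORTED rule from silently hiding a node that has already committed.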

On TX coordinator failure:

  1. A new coordinator is elected (the oldest server tx data node becomes the TX coordinator).
    1. If the oldest server tx data node has already finished the tx and released its resources, nothing happens. This means that the MVCC coordinator has either started acknowledging or failed, and the tx will be recovered during MVCC coordinator recovery.
  2. The new coordinator checks the other nodes:
    1. If at least one node has already finished the tx and released its resources, the new coordinator does nothing (the MVCC coordinator has either started acknowledging or failed, and the tx will be recovered during MVCC coordinator recovery).
  3. If the new coordinator has the transaction in the ACTIVE state:
    1. A tx rollback message is sent to all tx data nodes.
    2. A tx rolled back message is sent to the MVCC coordinator node.
    3. The MVCC coordinator sends a rollback acknowledged message to the participants; all resources are released.
  4. If the new coordinator has the transaction in the COMMITTED state:
    1. A tx committed message is sent to the MVCC coordinator node.
    2. The MVCC coordinator sends a commit acknowledged message to the participants; all tx data nodes mark the tx as COMMITTED, and all resources are released.
  5. If the new coordinator has the transaction in the ABORTED state:
    1. A tx rollback message is sent to all tx data nodes.
    2. A tx rolled back message is sent to the MVCC coordinator node.
    3. The MVCC coordinator sends a rollback acknowledged message to the participants; all resources are released.
  6. If the new coordinator has the transaction in the PREPARED state:
    1. The new coordinator checks all tx data nodes.
    2. If all participants have the tx in the PREPARED state, or at least one tx data node has the tx in the COMMITTED state:
      1. A tx committed message is sent to the MVCC coordinator node.
      2. The MVCC coordinator sends a commit acknowledged message to the participants; all tx data nodes mark the tx as COMMITTED, and all resources are released.
    3. If at least one data node has the tx in the ABORTED state:
      1. A tx rollback message is sent to all tx data nodes.
      2. A tx rolled back message is sent to the MVCC coordinator node.
      3. The MVCC coordinator sends a rollback acknowledged message to the participants; all resources are released.
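
The decision the newly elected TX coordinator makes can be condensed into one function. This is a sketch with assumptions: states are strings, the hypothetical marker `FINISHED` stands for "already finished the tx and released resources", "rollback" means the rollback message sequence of steps 3 and 5, and "commit" means the tx committed message sequence of step 4. A PREPARED coordinator that sees other nodes still ACTIVE is not covered by the steps and is reported as unresolved.

```python
FINISHED = "FINISHED"  # hypothetical marker: tx finished, resources released


def recover_on_tx_coordinator_failure(coordinator_state, other_states):
    """Return the action of the newly elected TX coordinator.

    coordinator_state -- TxLog state of the tx on the new coordinator
    other_states      -- states of the tx on the other tx data nodes
    """
    if coordinator_state == FINISHED or FINISHED in other_states:
        # Steps 1.1 and 2.1: left to MVCC coordinator recovery.
        return "nothing"
    if coordinator_state in ("ACTIVE", "ABORTED"):
        return "rollback"  # steps 3 and 5
    if coordinator_state == "COMMITTED":
        return "commit"  # step 4
    if coordinator_state == "PREPARED":
        # Step 6: look at every tx data node, including the new coordinator.
        all_states = [coordinator_state, *other_states]
        if all(s == "PREPARED" for s in all_states) or "COMMITTED" in all_states:
            return "commit"
        if "ABORTED" in all_states:
            return "rollback"
    return "unresolved"
```

Steps 3 and 5 lead to the same action, which is why ACTIVE and ABORTED share one branch.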

On primary data node failure

On loss of partition

On tx datanode rejoin

...

Read (getAll and SQL)

Each read operation outside an active transaction creates a special read-only transaction and uses its tx snapshot for version filtering.
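
Snapshot-based version filtering for a getAll-style read can be sketched as below. The model is an assumption for illustration: a snapshot is an upper bound `xmax` plus the set of XIDs active when it was taken, each row version carries the XID of the tx that wrote it, and the `committed` set stands in for the TxLog lookup (including PREPARED records resolved as COMMITTED). `Snapshot`, `RowVersion`, `is_visible` and `read_all` are hypothetical names; deletions are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    xmax: int          # assumed: XIDs >= xmax started after the snapshot
    active: frozenset  # XIDs active when the snapshot was taken


@dataclass(frozen=True)
class RowVersion:
    key: str
    value: str
    xid: int           # version (XID) of the tx that wrote this row version


def is_visible(version, snapshot, committed):
    """A version is visible if its tx finished before the snapshot and committed."""
    return (
        version.xid < snapshot.xmax
        and version.xid not in snapshot.active
        and version.xid in committed
    )


def read_all(versions, snapshot, committed):
    """Return the newest visible value for each key."""
    result = {}
    for v in sorted(versions, key=lambda v: v.xid):  # older versions first
        if is_visible(v, snapshot, committed):
            result[v.key] = v.value  # a newer visible version overwrites an older one
    return result
```

Because the read-only tx only needs a snapshot and never writes, it can filter versions without acquiring locks.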

...