...

  1. Initial state:
    1. Ignite nodes are started from a snapshot or another consistent state (after a graceful cluster stop / deactivation).
    2. var color = WHITE.
    3. An empty collection committingTxs (Set<GridCacheVersion>) whose goal is to track COMMITTING+ transactions that are no longer part of IgniteTxManager#activeTx. It automatically shrinks after a transaction commits.
  2. After some time, Ignite nodes might have a non-empty committingTxs collection.
  3. An Ignite node initiates a global snapshot by starting DistributedProcess (over discovery IO).
  4. Every node starts a local snapshot process after receiving a mark message (either by discovery, or by communication with a transaction message); see the sketch after this list:
    1. Atomically: set color = RED and disable auto-shrink of committingTxs.
    2. Write a snapshot record to WAL (commits LocalState).
    3. Collect active transactions - the union of IgniteTxManager#activeTx and committingTxs.
    4. While receiving Finish messages from other nodes, the node fills ChannelState: the exclude and (sent - received) collections.
    5. After all transactions have finished, it writes a WAL record with ChannelState.
  5. The new color is sent with transaction Finish messages.
  6. The committing node adds an additional color field to FinishMessage that shows whether the transaction should be included in the snapshot or not.
  7. Other nodes, on receiving a marker with the new color, start a local snapshot (see the steps under point 4).
  8. The node notifies the initiator about finishing the local procedure (via the DistributedProcess protocol).
  9. For all nodes color = RED. The next snapshot iteration will start by changing color to WHITE.
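A minimal sketch of step 4 on a single node, assuming hypothetical placeholder types (ConsistentCutMarker, WalWriter, onMarkReceived and the nested interfaces are illustrative); only GridCacheVersion, IgniteTxManager#activeTx, ChannelState, color and committingTxs come from the steps above:

import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ConsistentCutMarker {
    enum Color { WHITE, RED }

    // Placeholders for Ignite-internal types, declared here to keep the sketch self-contained.
    interface GridCacheVersion { }
    interface IgniteInternalTx { GridCacheVersion xidVersion(); }
    interface IgniteTxManager { Iterable<IgniteInternalTx> activeTransactions(); }
    interface WalWriter { void log(Object record); }

    private volatile Color color = Color.WHITE;
    private volatile boolean autoShrink = true;

    /** COMMITTING+ transactions that are no longer part of IgniteTxManager#activeTx. */
    private final Set<GridCacheVersion> committingTxs = ConcurrentHashMap.newKeySet();

    /** Step 4: local snapshot started on receiving a mark message. */
    synchronized Set<GridCacheVersion> onMarkReceived(IgniteTxManager tm, WalWriter wal) {
        if (color == Color.RED)
            return Set.of(); // A local snapshot is already in progress for this cut.

        // 4.1: atomically flip the color and disable auto-shrink of committingTxs.
        color = Color.RED;
        autoShrink = false;

        // 4.2: commit LocalState by writing a snapshot record to WAL.
        wal.log("SnapshotRecord(LocalState)");

        // 4.3: active transactions = IgniteTxManager#activeTx + committingTxs.
        Set<GridCacheVersion> active = new HashSet<>(committingTxs);
        for (IgniteInternalTx tx : tm.activeTransactions())
            active.add(tx.xidVersion());

        // 4.4-4.5 (not shown): fill ChannelState from incoming Finish messages and
        // write a WAL record with ChannelState once every tx in 'active' has finished.
        return active;
    }
}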

Use node-local GridCacheVersion as mark

To avoid using a mark message (the color field), we can try to rely on a fixed GridCacheVersion. The algorithm is as follows (a sketch of Phase 1 appears after the list):

  1. Initial state:
    1. Ignite nodes are started from a snapshot or another consistent state (after a graceful cluster stop / deactivation).
  2. An Ignite node initiates a global snapshot by starting DistributedProcess (over discovery IO).
  3. Every node (incl. client and non-baseline nodes) starts a local snapshot process after receiving a message from DistributedProcess.
  4. Phase 1: 
    1. Write a WAL record - commit LocalState.
    2. Fix snpVersion = GridCacheVersion#next(topVer).
    3. Collect all active transactions originated by the near node whose nearXidVersion is less than snpVersion.
    4. Note: keep collecting transactions that are less than snpVersion and for which the local node is near (to exclude them later).
    5. After all collected transactions have finished, notify the other Ignite nodes with snpVersion (via the DistributedProcess protocol).
  5. After all nodes have finished the first phase, every node has received a Map<UUID, GridCacheVersion> from the other nodes.
  6. Phase 2: only server baseline nodes continue working here:
    1. Collect all active transactions, find their near node (by GridCacheVersion#nodeOrderId), and filter them against the known GridCacheVersions.
    2. Await completion of all such transactions.
    3. Write a WAL record with the received map.
  7. Phase 3: 
    1. Stop collecting near transactions that are less than the local snpVersion and send them to the other nodes.
    2. On receiving such a map, write a new WAL record that contains the additional skip collection.
  8. After Phase 3 completes, the snapshot process is finished.
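A minimal sketch of Phase 1 under similar assumptions: GridCacheVersion, nearXidVersion and snpVersion are taken from the steps above, while VersionBasedCut, IgniteInternalTx#local() and collectNearTxsBeforeCut are hypothetical names:

import java.util.HashSet;
import java.util.Set;

class VersionBasedCut {
    // Placeholder for the Ignite-internal version type; versions are assumed comparable.
    interface GridCacheVersion extends Comparable<GridCacheVersion> { }

    interface IgniteInternalTx {
        GridCacheVersion nearXidVersion();
        boolean local(); // true if this node is the near (originating) node of the tx
    }

    /** Phase 1, steps 3-4: collect near transactions older than the fixed snpVersion. */
    Set<IgniteInternalTx> collectNearTxsBeforeCut(GridCacheVersion snpVersion,
                                                  Iterable<IgniteInternalTx> activeTxs) {
        Set<IgniteInternalTx> toAwait = new HashSet<>();

        for (IgniteInternalTx tx : activeTxs) {
            // Include only transactions originated by this (near) node whose
            // nearXidVersion is less than the fixed snpVersion.
            if (tx.local() && tx.nearXidVersion().compareTo(snpVersion) < 0)
                toAwait.add(tx);
        }

        // Step 5 (not shown): once every tx in 'toAwait' has finished, notify the other
        // nodes with snpVersion via the DistributedProcess protocol.
        return toAwait;
    }
}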

Restoring process:

  1. Find the WAL records for Phases 2 and 3 - the map of GridCacheVersions of transactions to filter, and the additional transaction xids to exclude (from Phase 3).
  2. Apply all records with these filters until the record from Phase 2 (see the sketch below).
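A minimal sketch of the restore-time filter, assuming a hypothetical RestoreFilter type and nodeId() accessor; only the Map<UUID, GridCacheVersion> from Phase 2 and the skip collection from Phase 3 come from the description above:

import java.util.Map;
import java.util.Set;
import java.util.UUID;

class RestoreFilter {
    // Placeholder for the Ignite-internal version type; nodeId() is a hypothetical accessor
    // for the near node that produced the version.
    interface GridCacheVersion extends Comparable<GridCacheVersion> {
        UUID nodeId();
    }

    private final Map<UUID, GridCacheVersion> snpVersions; // from the Phase 2 WAL record
    private final Set<GridCacheVersion> skipXids;          // from the Phase 3 WAL record

    RestoreFilter(Map<UUID, GridCacheVersion> snpVersions, Set<GridCacheVersion> skipXids) {
        this.snpVersions = snpVersions;
        this.skipXids = skipXids;
    }

    /** Returns true if a logged transaction belongs to the snapshot and should be applied. */
    boolean apply(GridCacheVersion nearXidVer) {
        if (skipXids.contains(nearXidVer))
            return false; // Explicitly excluded by the Phase 3 record.

        GridCacheVersion snpVer = snpVersions.get(nearXidVer.nodeId());

        // Apply only transactions started on their near node before its fixed snpVersion.
        return snpVer != null && nearXidVer.compareTo(snpVer) < 0;
    }
}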


Disadvantages:

  1. Increments of GridCacheVersion are CAS operations performed from different threads, but the version is assigned to a transaction in a non-atomic way. There is no guarantee that a transaction created after snpVersion is fixed will get a version greater than snpVersion (illustrated after this list). Ignite should track such transactions:
    1. With fair locking while creating and assigning a version to a transaction - possible performance degradation.
    2. With an additional filter after preparing a snapshot (4.d).
  2. Client and non-baseline nodes have work to do: collecting transactions, awaiting their completion, and sending a response. This can be unreliable, as client nodes may be short-lived:
    1. Special cases must also be handled where a transaction commits after the client node has gone and there is no info about its actual version.
  3. There is no safe previous record to restore from if several incremental snapshots have been created; the whole history has to be filtered.
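A minimal illustration of the race in disadvantage 1, with a plain AtomicLong standing in for GridCacheVersion generation; all names here (txThread, fixSnapshotVersion, activeTxs) are hypothetical:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class VersionAssignmentRace {
    static final AtomicLong versionCounter = new AtomicLong();
    static final Map<Long, String> activeTxs = new ConcurrentHashMap<>();

    /** A transaction thread: the version is reserved and assigned in two separate steps. */
    static void txThread() {
        long ver = versionCounter.incrementAndGet(); // step 1: atomic CAS increment
        // <-- fixSnapshotVersion() may run here: snpVersion ends up greater than 'ver',
        //     although this transaction is not yet visible among the active ones.
        activeTxs.put(ver, "tx");                    // step 2: non-atomic registration/assignment
    }

    /** The snapshot thread: fixes snpVersion and sees only currently registered transactions. */
    static long fixSnapshotVersion() {
        long snpVersion = versionCounter.incrementAndGet();
        // A transaction stuck between step 1 and step 2 has a version < snpVersion but is not
        // yet registered, so it must be tracked separately (step 4.d) and excluded later.
        return snpVersion;
    }
}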

Consistent and inconsistent Cuts

...