...
- Initial state:
  - Ignite nodes started from a snapshot, or another consistent state (after a graceful cluster stop / deactivation).
  - `color = WHITE`.
  - An empty collection `committingTxs` (`Set<GridCacheVersion>`) whose goal is to track COMMITTING+ transactions that aren't part of `IgniteTxManager#activeTx`. It automatically shrinks after a transaction commits.
- After some time, Ignite nodes might have a non-empty `committingTxs`.
- An Ignite node initiates a global snapshot by starting a DistributedProcess (over discovery IO).
- Every node starts a local snapshot process after receiving a mark message (either by discovery, or by communication within a transaction message):
  - Atomically: set `color = RED` and disable auto-shrink of `committingTxs`.
  - Write a snapshot record to WAL (commits LocalState).
  - Collect active transactions - the concatenation of `IgniteTxManager#activeTx` and `committingTxs`.
  - While receiving Finish messages from other nodes, the node fills ChannelState: the exclude and (sent - received) collections.
  - After all transactions have finished, it writes a WAL record with ChannelState.
- The new color is sent with transaction Finish messages:
  - The committing node adds an additional color field to FinishMessage that shows whether to include the transaction in the snapshot or not.
  - Other nodes, on receiving a marker with the new color, start a local snapshot (see the steps from point 3).
- Notify the node-initiator about finishing the local procedure (with the DistributedProcess protocol).
- For all nodes `color = RED`. The next snapshot iteration will start by changing the color back to WHITE.
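The color-marker steps above can be sketched as follows. This is a simplified, hypothetical outline, not Ignite's actual implementation: the `Node`, `Color`, and `onMarker` names are illustrative, WAL and messaging are stubbed out, and transactions are reduced to `Long` ids standing in for `GridCacheVersion`.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the marker-driven local snapshot step.
public class ColorMarkerSketch {
    enum Color { WHITE, RED }

    static class Node {
        volatile Color color = Color.WHITE;
        volatile boolean autoShrink = true;                 // committingTxs auto-shrink flag
        final Set<Long> committingTxs = ConcurrentHashMap.newKeySet(); // COMMITTING+ txs not in activeTx
        final Set<Long> activeTxs = new HashSet<>();        // stands in for IgniteTxManager#activeTx

        // Invoked on receiving a mark message (by discovery, or a Finish message with the new color).
        synchronized Set<Long> onMarker() {
            if (color == Color.RED)
                return null;              // local snapshot already in progress for this iteration

            color = Color.RED;            // atomically flip the color...
            autoShrink = false;           // ...and stop shrinking committingTxs

            writeWalSnapshotRecord();     // commit LocalState

            // Collect active transactions: concat of activeTx and committingTxs.
            Set<Long> snapshotTxs = new HashSet<>(activeTxs);
            snapshotTxs.addAll(committingTxs);
            return snapshotTxs;
        }

        void writeWalSnapshotRecord() { /* WAL write omitted in the sketch */ }
    }
}
```

The `synchronized` block mirrors the "atomically" requirement from the step above: the color flip and the auto-shrink switch must be observed together by concurrent committers.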
Use node-local GridCacheVersion as mark
To avoid using a mark message (the color field), we can try to rely on a fixed GridCacheVersion. The algorithm is as follows:
- Initial state:
  - Ignite nodes started from a snapshot, or another consistent state (after a graceful cluster stop / deactivation).
- An Ignite node initiates a global snapshot by starting a DistributedProcess (over discovery IO).
- Every node (incl. client and non-baseline) starts a local snapshot process after receiving a message from the DistributedProcess.
- Phase 1:
  - Write a WAL record - commit LocalState.
  - Fix `snpVersion = GridCacheVersion#next(topVer)`.
  - Collect all active transactions originated by the near node whose nearXidVersion is less than `snpVersion`.
    - Note: keep collecting transactions that are less than `snpVersion` and for which the local node is near (to exclude them later).
  - After all collected transactions finish: notify the Ignite nodes with `snpVersion` (via the DistributedProcess protocol).
- After all nodes have finished the first phase, each of them has received a Map<UUID, GridCacheVersion> from the other nodes.
- Phase 2 (only server baseline nodes continue to work here):
  - Collect all active transactions, find their near node (by GridCacheVersion#nodeOrderId), and filter them with the known GridCacheVersions.
  - Await completion of all such transactions.
  - Write a WAL record with the received map.
- Phase 3:
  - Stop collecting near transactions that are less than the local `snpVersion`, and send them to the other nodes.
  - On receiving such a map, write a new WAL record again that contains the additional skip collection.
- After Phase 3 finishes, the snapshot process is complete.
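The version-based filtering in Phases 1 and 2 can be illustrated with a small sketch. This is an assumption-laden simplification: `Ver` reduces GridCacheVersion to an `(order, nodeOrder)` pair, and the method names are invented for the example; only the comparison idea (nearXidVersion less than `snpVersion`, near node resolved via node order) comes from the text above.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the snpVersion-based transaction filtering.
public class SnpVersionSketch {
    // Simplified stand-in for GridCacheVersion: a global order plus the near node's order id.
    record Ver(long order, int nodeOrder) implements Comparable<Ver> {
        public int compareTo(Ver o) { return Long.compare(order, o.order); }
    }

    // Phase 1 on a near node: keep only its own transactions created before snpVersion.
    static List<Ver> collectNearTxs(Collection<Ver> activeTxs, Ver snpVersion, int localNodeOrder) {
        return activeTxs.stream()
            .filter(v -> v.nodeOrder() == localNodeOrder)   // local node is the near node
            .filter(v -> v.compareTo(snpVersion) < 0)       // nearXidVersion < snpVersion
            .collect(Collectors.toList());
    }

    // Phase 2 on a server baseline node: find each tx's near node by its node order
    // and keep the txs older than that node's announced snpVersion; these must be awaited.
    static List<Ver> awaitList(Collection<Ver> activeTxs, Map<Integer, Ver> snpVersions) {
        return activeTxs.stream()
            .filter(v -> {
                Ver snp = snpVersions.get(v.nodeOrder());
                return snp != null && v.compareTo(snp) < 0;
            })
            .collect(Collectors.toList());
    }
}
```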
Restoring process:
- Find the WAL records from Phases 2 and 3: the map of GridCacheVersions used to filter transactions, and the additional transaction xids to exclude (from Phase 3).
- Apply all records with these filters up to the record from Phase 2.
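The restore-time filtering can be sketched as below. It is a hypothetical simplification of the two steps above: WAL records are reduced to `(nearNodeOrder, order)` pairs, the Phase 2 map is a plain `Map<Integer, Long>`, and `applyUntilCut` is an invented name; the real record layout and replay loop in Ignite differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of WAL replay with the Phase 2 / Phase 3 filters.
public class RestoreFilterSketch {
    // Simplified transaction record: which near node originated it, and its version order.
    record TxRecord(int nearNodeOrder, long order) {}

    static List<TxRecord> applyUntilCut(List<TxRecord> wal,
                                        Map<Integer, Long> phase2Versions, // near node order -> snpVersion order
                                        Set<TxRecord> phase3Skip) {        // extra xids to exclude
        List<TxRecord> applied = new ArrayList<>();
        for (TxRecord rec : wal) {
            Long snpOrder = phase2Versions.get(rec.nearNodeOrder());
            // A tx at or past its near node's snpVersion belongs after the cut.
            boolean afterCut = snpOrder != null && rec.order() >= snpOrder;
            if (!afterCut && !phase3Skip.contains(rec))
                applied.add(rec);
        }
        return applied;
    }
}
```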
Disadvantages:
- Increments of GridCacheVersion are CAS operations from different threads, but the version is assigned to a transaction in a non-atomic way. There is no guarantee that `snpVersion` is greater than the version of a transaction created after `snpVersion` was fixed. Ignite should track such transactions:
  - With fair locking while creating and assigning a version to a transaction - possible performance degradation.
  - With an additional filter after preparing a snapshot (4.d).
- Client and non-baseline nodes have a job to do: collecting transactions, awaiting their completion, sending a response. This could be unreliable, as client nodes can be short-lived:
  - Special cases must also be handled where a transaction commits after the client node has gone and there is no info about its actual version.
- There is no safe previous record to restore from if some incremental snapshots were created; the whole history has to be filtered.
Consistent and inconsistent Cuts
...