...
- Initially, all processes are white, the sent and received collections are empty, and LocalState is empty.
- After the system has been running for some time, every node may have:
- sent and received collections, possibly empty;
- a LocalState, possibly non-empty, that corresponds to sent + received: the state reflects the events that changed it.
- Any process can start a snapshot (moreover, multiple processes may start it simultaneously):
- The node colors itself red.
- It commits its LocalState.
- It commits the sent and received collections for every IN and OUT channel. New collections are created for the next LocalState.
- It prepares a marker message: the marker is red and carries the sent collection as its payload. The goal of the marker is to guarantee the order of messages ($\mathit{received}_{ij}$ must be a subset of $\mathit{sent}_{ji}$).
- It piggybacks the marker on every ordinary message between the distributed processes; if there is no upcoming message to a node, it sends the marker as a standalone message.
- On receiving an ordinary message, a process must check the marker first, before applying the message:
- If the received color differs from the local color, the node must trigger the local snapshot procedure.
- It handles the sent payload of the received marker:
- It calculates ChannelState for the channel the message arrived on as sent - received, where sent is extracted from the marker and received is calculated locally since the local snapshot.
- After receiving marker messages from all IN channels, the node prepares its local snapshot:
- Local snapshot of node $i$: $N_i = \mathit{LocalState}_i + \sum_j \mathit{ChannelState}_{ij}$, where $\mathit{ChannelState}_{ij} = \mathit{sent}_{ji} - \mathit{received}_{ij}$.
- Every such local snapshot is a unit of the global snapshot:
- Note that the snapshot consists of committed LocalStates and the messages between nodes.
- The committed sent and received collections are cleared.
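
The steps above can be modelled as a small single-threaded Java sketch. It is an illustration under stated assumptions, not Ignite code: `ColoredNode`, `Marker`, `Message` and their methods are hypothetical names; messages carry hypothetical `long` ids; sent and received are sets of ids per channel; an id of `-1` marks a standalone marker; and "received" in ChannelState is taken as the set committed at the local snapshot, so a white message that arrives after that point still counts as in flight.

```java
import java.util.*;

/** Hypothetical sketch of the colored snapshot protocol; names are illustrative, not Ignite API. */
final class ColoredNode {
    enum Color { WHITE, RED }

    /** Piggybacked marker: sender color and, once RED, the sender's committed sent set for this channel. */
    record Marker(Color color, Set<Long> committedSent) {}

    /** A message; id -1 denotes a standalone marker without payload (assumption). */
    record Message(int from, long id, Marker marker) {}

    final int id;
    Color color = Color.WHITE;
    long localState;                                              // number of applied messages
    final Map<Integer, Set<Long>> sent = new HashMap<>();         // OUT channel -> ids sent
    final Map<Integer, Set<Long>> received = new HashMap<>();     // IN channel -> ids received
    long committedLocalState;
    Map<Integer, Set<Long>> committedSent = new HashMap<>();
    Map<Integer, Set<Long>> committedReceived = new HashMap<>();
    final Map<Integer, Set<Long>> channelState = new HashMap<>(); // IN channel -> in-flight ids

    ColoredNode(int id) { this.id = id; }

    /** Colors the node RED and commits LocalState plus sent/received; new collections start empty. */
    void startSnapshot() {
        if (color == Color.RED) return;
        color = Color.RED;
        committedLocalState = localState;
        committedSent = copy(sent);
        committedReceived = copy(received);
        sent.clear();
        received.clear();
    }

    /** Sends an ordinary message with the marker piggybacked on it. */
    Message send(int to, long msgId) {
        sent.computeIfAbsent(to, k -> new HashSet<>()).add(msgId);
        return new Message(id, msgId, markerFor(to));
    }

    /** Used when there is no upcoming message on a channel. */
    Message sendMarker(int to) { return new Message(id, -1, markerFor(to)); }

    private Marker markerFor(int to) {
        return color == Color.RED
            ? new Marker(Color.RED, committedSent.getOrDefault(to, Set.of()))
            : new Marker(Color.WHITE, Set.of());
    }

    /** Checks the marker first, then applies the message. */
    void receive(Message msg) {
        Marker m = msg.marker();
        if (m.color() == Color.RED && color == Color.WHITE) startSnapshot();
        if (m.color() == Color.RED && !channelState.containsKey(msg.from())) {
            // ChannelState = sent (from the marker) - received (committed at the local snapshot)
            Set<Long> inFlight = new HashSet<>(m.committedSent());
            inFlight.removeAll(committedReceived.getOrDefault(msg.from(), Set.of()));
            channelState.put(msg.from(), inFlight);
        }
        if (msg.id() >= 0) {
            received.computeIfAbsent(msg.from(), k -> new HashSet<>()).add(msg.id());
            localState++;
        }
    }

    /** The local snapshot is complete once a marker has arrived on every IN channel. */
    boolean snapshotComplete(Set<Integer> inChannels) {
        return channelState.keySet().containsAll(inChannels);
    }

    private static Map<Integer, Set<Long>> copy(Map<Integer, Set<Long>> src) {
        Map<Integer, Set<Long>> dst = new HashMap<>();
        src.forEach((k, v) -> dst.put(k, new HashSet<>(v)));
        return dst;
    }
}
```

For example, if node 1 sends message 10 while WHITE, turns RED, and then sends message 11, and 11 overtakes 10, node 2 snapshots on the red marker before applying 11 and records `{10}` as the state of channel 1→2. Because the marker carries the whole sent set, this sketch does not rely on FIFO channels.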
...
On receiving a message with a new CutVersion, a node sets it and commits its LocalState and ChannelState; this identifies events that arrive in the wrong order.
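
A minimal sketch of this check, assuming CutVersion is a monotonically increasing `long` and that "new" means greater than the locally known version. `CutVersionHolder`, `onMessage` and `commitLocalAndChannelState` are hypothetical names, and the commit is a stand-in that only records the version.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of CutVersion handling on message receipt; not Ignite code. */
final class CutVersionHolder {
    private final AtomicLong cutVersion = new AtomicLong();        // 0 = no cut seen yet
    final List<Long> commits = new CopyOnWriteArrayList<>();       // versions at which state was committed

    /**
     * Returns true if the message carried a newer CutVersion. Only the caller
     * that wins the compare-and-set commits LocalState and ChannelState.
     */
    boolean onMessage(long msgCutVersion) {
        while (true) {
            long cur = cutVersion.get();
            if (msgCutVersion <= cur) return false;                // same or older cut: nothing to do
            if (cutVersion.compareAndSet(cur, msgCutVersion)) {
                commitLocalAndChannelState(msgCutVersion);
                return true;
            }
            // another thread changed the version concurrently: re-read and retry
        }
    }

    long current() { return cutVersion.get(); }

    private void commitLocalAndChannelState(long version) {
        commits.add(version); // stand-in for committing LocalState and ChannelState
    }
}
```

The compare-and-set loop lets concurrent message handlers race safely: only the thread that actually advances the version performs the commit.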
- LocalState maps to the local WAL (all committed transactions are part of LocalState);
- Channel:
- We can piggyback on the Ignite transaction protocol messages (Prepare, Finish) sent with CommunicationSpi.
- If there is no transaction on a channel, we can rely on DiscoverySpi to start the local snapshot on non-participating nodes.
- ChannelState maps to `IgniteTxManager#activeTransactions`:
- The sent collection matches committed transactions for which the local node is near: it sends FinishMessages to other nodes.
- The received collection matches committed transactions for which the local node isn't near: it receives FinishMessages from other nodes.
- `IgniteTxManager#activeTransactions` doesn't track:
- committing transactions (COMMITTING+): they are removed from this collection before they start committing.
- Track them additionally: add each transaction to a separate collection before it starts committing, and remove it after it has committed.
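
A sketch of deriving sent and received from transactions, including the separate collection for COMMITTING+ transactions. `Tx`, `TxChannelView`, `beforeCommit` and `afterCommit` are hypothetical names; the two sets stand in for `IgniteTxManager#activeTransactions` and the separate collection, and a `String` version replaces `GridCacheVersion`.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical transaction handle: version plus whether the local node is near. */
record Tx(String version, boolean localNear) {}

/** Hypothetical sketch of deriving sent/received from transactions; not Ignite code. */
final class TxChannelView {
    // Stand-in for IgniteTxManager#activeTransactions.
    final Set<Tx> activeTxs = ConcurrentHashMap.newKeySet();
    // Separate collection for COMMITTING+ transactions that activeTxs no longer tracks.
    final Set<Tx> committingTxs = ConcurrentHashMap.newKeySet();

    /** Called when a tx starts committing: track it before it leaves activeTxs. */
    void beforeCommit(Tx tx) {
        committingTxs.add(tx);
        activeTxs.remove(tx);
    }

    /** Called once the tx has committed. */
    void afterCommit(Tx tx) {
        committingTxs.remove(tx);
    }

    /** sent: txs for which the local node is near (it sends FinishMessages). */
    Set<Tx> sent() { return select(true); }

    /** received: txs for which the local node isn't near (it receives FinishMessages). */
    Set<Tx> received() { return select(false); }

    private Set<Tx> select(boolean near) {
        Set<Tx> res = new HashSet<>();
        for (Tx tx : activeTxs) if (tx.localNear() == near) res.add(tx);     // scan active first...
        for (Tx tx : committingTxs) if (tx.localNear() == near) res.add(tx); // ...then committing
        return res;
    }
}
```

The ordering in `beforeCommit` (add to committingTxs, then remove from activeTxs), together with scanning activeTxs first, is meant to keep a committing transaction from slipping between the two scans.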
...
- Initial state:
- The Ignite WAL is in a consistent state relative to the previous full or incremental snapshot.
- Every Ignite node has a local ConsistentCut future equal to `null` (the node is WHITE).
- The collection committingTxs (`Set<GridCacheVersion>`) is empty; its goal is to track COMMITTING+ transactions that aren't part of `IgniteTxManager#activeTx`. It shrinks automatically after a transaction is committed.
- An Ignite node initiates a global snapshot by starting a DistributedProcess (over discovery IO):
- Creates a new ConsistentCutMarker.
- Prepares a marker message that contains the marker and transmits this message to other nodes.
- Every node starts the local snapshot process after receiving the marker message (either by discovery, or by communication with a transaction message):
- Atomically: creates a new ConsistentCut future (the node becomes RED), creates committingTxs, and starts signing outgoing messages with the ConsistentCutMarker.
- Writes a snapshot record to the WAL with the received ConsistentCutMarker (commits LocalState).
- Collects active transactions: the concatenation of `IgniteTxManager#activeTx` and committingTxs.
- Prepares 2 empty collections: before the cut [sent - received] and after the cut [exclude].
- While the global Consistent Cut is running, every node signs outgoing transaction messages:
- Prepare messages are signed with the ConsistentCutMarker (to trigger ConsistentCut on the remote node, if it has not started yet).
- Finish messages are signed with the ConsistentCutMarker (to trigger ConsistentCut, as for Prepare) and with the transaction ConsistentCutMarker (to notify nodes which side of the cut this transaction belongs to).
- Finish messages are signed on the node that commits first (the near node for 2PC, the backup or primary for 1PC).
- For every collected active transaction, the node waits for the Finish message, extracts the ConsistentCutMarker, and fills the before and after collections:
- if the received marker is null or differs from the local one, the transaction is on the before side;
- if the received marker equals the local one, the transaction is on the after side.
- After all transactions have finished:
- Writes a WAL record with ChannelState (before, after).
- Stops filling committingTxs.
- Completes the ConsistentCut future and notifies the initiator node that the local procedure has finished (via the DistributedProcess protocol).
- After all nodes have finished ConsistentCut, every node stops signing outgoing transaction messages: the ConsistentCut future becomes null (the node is WHITE again).
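
The node-side lifecycle above can be condensed into one Java sketch. All names (`ConsistentCutNode`, `Marker`, `onMarker`, `signOutgoing`, `onFinish`, `onGlobalFinish`) are hypothetical, not Ignite API. The sketch assumes WAL records are plain strings, transaction versions are `String`s instead of `GridCacheVersion`, the `activeTxs` argument is already the concatenation of `IgniteTxManager#activeTx` and committingTxs, and a node runs at most one cut at a time; the committingTxs bookkeeping and the DistributedProcess messaging with the initiator are left out.

```java
import java.util.*;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of the node-side Consistent Cut; names are illustrative, not Ignite API. */
final class ConsistentCutNode {
    /** Identifies one global cut. */
    record Marker(long id) {}

    /** Per-cut state; while it is set, the node is RED. */
    static final class Cut {
        final Marker marker;
        final CompletableFuture<Void> future = new CompletableFuture<>();
        final Set<String> awaiting;                  // collected active txs without a Finish yet
        final Set<String> before = new TreeSet<>();  // ChannelState: sent - received
        final Set<String> after = new TreeSet<>();   // excluded from this cut

        Cut(Marker marker, Set<String> activeTxs) {
            this.marker = marker;
            this.awaiting = new HashSet<>(activeTxs);
        }
    }

    // null means WHITE; AtomicReference lets signOutgoing() read the state without locking.
    private final AtomicReference<Cut> cut = new AtomicReference<>();
    final List<String> wal = new ArrayList<>();      // stand-in for WAL records

    /** Marker received by discovery or with a signed transaction message. */
    synchronized void onMarker(Marker marker, Set<String> activeTxs) {
        if (cut.get() != null) return;               // already RED: one cut at a time (assumption)
        Cut c = new Cut(marker, activeTxs);
        cut.set(c);                                  // node becomes RED and starts signing
        wal.add("CutStart " + marker.id());          // commits LocalState
        tryFinish(c);
    }

    /** Marker to attach to outgoing Prepare/Finish messages; null while WHITE. */
    Marker signOutgoing() {
        Cut c = cut.get();
        return c == null ? null : c.marker;
    }

    /** Finish message for txVersion with the tx marker set by the node that committed first. */
    synchronized void onFinish(String txVersion, Marker txMarker) {
        Cut c = cut.get();
        if (c == null || !c.awaiting.remove(txVersion)) return; // not a collected tx
        if (txMarker == null || !txMarker.equals(c.marker)) c.before.add(txVersion);
        else c.after.add(txVersion);
        tryFinish(c);
    }

    private void tryFinish(Cut c) {
        if (!c.awaiting.isEmpty() || c.future.isDone()) return;
        wal.add("CutFinish before=" + c.before + " after=" + c.after); // ChannelState record
        c.future.complete(null);                     // the initiator would be notified here
    }

    /** All nodes have finished: stop signing, the node is WHITE again. */
    synchronized void onGlobalFinish(Marker marker) {
        Cut c = cut.get();
        if (c != null && c.marker.equals(marker) && c.future.isDone()) cut.set(null);
    }

    boolean isRed() { return cut.get() != null; }

    CompletableFuture<Void> future() {
        Cut c = cut.get();
        return c == null ? null : c.future;
    }
}
```

In this sketch one `signOutgoing()` value serves both signatures: attached to Prepare and Finish messages it triggers the cut on remote nodes, and read by the node that commits first it becomes the transaction marker, which is `null` when that node committed while still WHITE, so the transaction lands on the before side.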
...