...
- Initial state:
- No concurrent ConsistentCut process is running.
- lastFinishedCutId holds the previous ConsistentCutId, or null.
- User starts a command to create a new incremental snapshot:
- The Ignite node initiates a DistributedProcess with the discovery message SnapshotOperationRequest, which holds the new ConsistentCutId (the goal is to notify every node in the cluster about the running incremental snapshot).
- The DistributedProcess stores the topology version topVer on which ConsistentCut started.
- The creation of a consistent cut can be started by either of two events, whichever happens first:
- Receiving SnapshotOperationRequest#ConsistentCutId via DiscoverySPI (by the DistributedProcess).
- Receiving ConsistentCutAwareMessage#ConsistentCutId via CommunicationSPI (by transaction messages: Prepare, Finish).
- On receiving the ConsistentCutId, the node starts a local ConsistentCut:
- There are 2 roles that a node might play:
- ROLE#1 - wraps outgoing messages - for all Ignite nodes: client, baseline, non-baseline server nodes.
- ROLE#2 - prepares data to be written in WAL - for baseline nodes only.
- Before starting, check:
- Whether ConsistentCut has already started (ConsistentCut != null) or finished (lastFinishedCutId == id) for this id; skip if it has.
- On non-baseline nodes, if ConsistentCut is initiated by CommunicationSPI, compare ConsistentCutAwareMessage#topVer with the local node order:
- The local node order equals the new topVer at the moment the node joined the cluster.
- If the order is higher than the ConsistentCut topVer, the node joined after ConsistentCut started. Skip starting ConsistentCut on this node.
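The pre-start check above can be sketched as a single predicate. This is a minimal illustration, not Ignite code: the class CutStartCheck, the NodeState holder, and the parameter names are all hypothetical and only mirror the conditions listed in the bullets.

```java
import java.util.UUID;

/** Hypothetical sketch of the pre-start check; not part of the Ignite API. */
final class CutStartCheck {
    /** Local node state used by the check (field names are assumptions). */
    static final class NodeState {
        final UUID runningCutId;      // non-null while a ConsistentCut is in progress
        final UUID lastFinishedCutId; // id of the last finished cut, or null
        final boolean baseline;       // true for baseline nodes
        final long nodeOrder;         // topology version at which this node joined

        NodeState(UUID runningCutId, UUID lastFinishedCutId, boolean baseline, long nodeOrder) {
            this.runningCutId = runningCutId;
            this.lastFinishedCutId = lastFinishedCutId;
            this.baseline = baseline;
            this.nodeOrder = nodeOrder;
        }
    }

    /**
     * Returns true if a local ConsistentCut should be started for cutId.
     * viaCommunication is true when the id arrived in a transaction message
     * rather than in the discovery message.
     */
    static boolean shouldStart(NodeState s, UUID cutId, boolean viaCommunication, long cutTopVer) {
        // A cut is already running, or this id has already finished: skip.
        if (s.runningCutId != null || cutId.equals(s.lastFinishedCutId))
            return false;

        // A non-baseline node that joined after the cut started must not take part.
        if (!s.baseline && viaCommunication && s.nodeOrder > cutTopVer)
            return false;

        return true;
    }
}
```

For example, a client node with nodeOrder 7 that receives a transaction message for a cut started on topVer 5 would skip the cut in this sketch.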
- ROLE#1:
- Creates a new ConsistentCut future.
- If the node is non-baseline (client, non-baseline server), completes the future right after creation and notifies the initiator node that the local procedure has finished (by the DistributedProcess protocol).
- While ConsistentCut != null, wraps outgoing messages into ConsistentCutAwareMessage. It contains the ConsistentCutId (to start ConsistentCut on the remote node, if it has not started yet).
- Messages contain an additional field txCutId. It is originally set on the nodes that commit first:
- For 2PC it is the originating node.
- For 1PC it is a backup node.
- If txCutId is null, the transaction started committing Before the Consistent Cut started, otherwise After.
- On receiving a ConsistentCutAwareMessage that makes the transaction committed (FinishRequest for 2PC, PrepareResponse for 1PC), sets tx#txCutId = message#txCutId.
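The ROLE#1 message handling can be illustrated roughly as follows. All types here (CutAwareMessaging, CutAwareMessage, Tx) are hypothetical stand-ins rather than Ignite classes. The sketch also assumes that the node committing first sets txCutId to the id of the running cut, or to null if no cut is running, which is consistent with the Before/After rule above but is not spelled out in the text.

```java
import java.util.UUID;

/** Hypothetical sketch of ROLE#1 message wrapping; not part of the Ignite API. */
final class CutAwareMessaging {
    /** Stand-in for ConsistentCutAwareMessage: the payload plus cut metadata. */
    static final class CutAwareMessage {
        final Object payload;
        final UUID cutId;   // id of the running cut, used to start the cut remotely
        final UUID txCutId; // null means the transaction started committing before the cut

        CutAwareMessage(Object payload, UUID cutId, UUID txCutId) {
            this.payload = payload;
            this.cutId = cutId;
            this.txCutId = txCutId;
        }
    }

    /** Stand-in for a transaction that carries the txCutId mark. */
    static final class Tx {
        UUID txCutId;
    }

    /** Called on the node that commits first (assumption: marks with the running cut id). */
    static void markCommitStart(Tx tx, UUID runningCutId) {
        tx.txCutId = runningCutId;
    }

    /** Wraps an outgoing message only while a cut is running. */
    static Object wrapOutgoing(Object payload, UUID runningCutId, Tx tx) {
        if (runningCutId == null)
            return payload;

        return new CutAwareMessage(payload, runningCutId, tx.txCutId);
    }

    /** Called on receiving the message that makes the transaction committed. */
    static void onCommitMessage(Tx tx, CutAwareMessage msg) {
        tx.txCutId = msg.txCutId;
    }
}
```

The receiving side copies the mark instead of computing its own, so every participant of a transaction agrees on which side of the cut it belongs to.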
- ROLE#2 - for baseline nodes only:
- In the message thread, atomically initializes ConsistentCut:
- Creates a new ConsistentCut future.
- Creates an empty collection removedActiveTxs (unlike IgniteTxManager#activeTx, this collection does not remove transactions).
- In the background thread:
- Writes a ConsistentCutStartRecord to WAL with the received ConsistentCutId.
- Creates a (weakly consistent) copy of IgniteTxManager#activeTx. Sets listeners on those tx#finishFuture.
- As an optimization, it is safe to exclude transactions with tx#status == ACTIVE. Such transactions are guaranteed to belong to the After side.
- Creates a copy of removedActiveTxs (contains transactions that might have been cleaned from IgniteTxManager#activeTx). Sets listeners on those tx#finishFuture.
- Sets removedActiveTxs to null. Transactions concurrently added to removedActiveTxs are not a concern: they just don't land in the "before" or "after" set and will be excluded from recovery.
- In transaction threads, fills removedActiveTxs if ConsistentCut != null and removedActiveTxs != null:
- Every transaction is added to removedActiveTxs right before it is removed from IgniteTxManager#activeTx.
- For every listened transaction, the callback is called when the transaction finishes:
- If the transaction state is UNKNOWN or the status is RECOVERY_FINISH, completes ConsistentCut with an exception.
- If the transaction is mapped to a higher topology version than the ConsistentCut topVer, puts it into after.
- If tx#txCutId equals the local ConsistentCutId, puts the transaction into after; otherwise puts it into before.
- After every listened transaction has finished:
- Writes a ConsistentCutFinishRecord into WAL with the collections (before, after).
- Completes the ConsistentCut future.
- Notifies the initiator node that the local procedure has finished (by the DistributedProcess protocol).
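The finish callback of ROLE#2 can be sketched as a small classifier that sorts finished transactions into the before and after sets and completes the cut future once all of them have reported. The class CutTxClassifier, the FinishedTx record-like holder, and the boolean recoveryFinish flag are assumptions made for illustration. WAL writes and the notification of the initiator are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of the ROLE#2 finish callback; not part of the Ignite API. */
final class CutTxClassifier {
    enum State { COMMITTED, ROLLED_BACK, UNKNOWN }

    /** Snapshot of a finished transaction (field names are assumptions). */
    static final class FinishedTx {
        final String id;
        final State state;
        final boolean recoveryFinish; // models status == RECOVERY_FINISH
        final long topVer;            // topology version the transaction is mapped to
        final UUID txCutId;           // cut id carried by the commit message, or null

        FinishedTx(String id, State state, boolean recoveryFinish, long topVer, UUID txCutId) {
            this.id = id;
            this.state = state;
            this.recoveryFinish = recoveryFinish;
            this.topVer = topVer;
            this.txCutId = txCutId;
        }
    }

    final List<String> before = new ArrayList<>();
    final List<String> after = new ArrayList<>();
    final CompletableFuture<Void> cutFut = new CompletableFuture<>();

    private final UUID cutId;
    private final long cutTopVer;
    private int pending; // number of listened transactions that have not finished yet

    CutTxClassifier(UUID cutId, long cutTopVer, int listenedTxs) {
        this.cutId = cutId;
        this.cutTopVer = cutTopVer;
        this.pending = listenedTxs;

        if (pending == 0)
            cutFut.complete(null);
    }

    /** Callback invoked when a listened transaction finishes. */
    synchronized void onTxFinished(FinishedTx tx) {
        if (cutFut.isDone())
            return;

        // An undecided or recovery-finished transaction makes the cut inconsistent.
        if (tx.state == State.UNKNOWN || tx.recoveryFinish) {
            cutFut.completeExceptionally(new IllegalStateException("Inconsistent tx: " + tx.id));
            return;
        }

        if (tx.topVer > cutTopVer || cutId.equals(tx.txCutId))
            after.add(tx.id);
        else
            before.add(tx.id);

        // A real implementation would write ConsistentCutFinishRecord before completing.
        if (--pending == 0)
            cutFut.complete(null);
    }
}
```

A transaction with a null txCutId and an unchanged topology version lands in before; one carrying the local cut id, or mapped to a later topology version, lands in after.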
- After all nodes have finished ConsistentCut, on every node:
- Updates lastFinishedCutId with the current id.
- The ConsistentCut future becomes null.
- Stops signing outgoing transaction messages.
- The initiator node checks that every node completed correctly.
- If any node completed exceptionally, completes the Incremental Snapshot with an exception.
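The final check on the initiator node amounts to a simple aggregation over per-node results. The sketch below assumes a hypothetical map from node id to the error reported by that node (null if the node finished normally). The real DistributedProcess result format may differ.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of the initiator's final check; not part of the Ignite API. */
final class SnapshotInitiator {
    /**
     * Completes the snapshot future normally only if no node reported an error.
     * nodeResults maps a node id to its error, or to null on success.
     */
    static CompletableFuture<Void> completeSnapshot(Map<String, Throwable> nodeResults) {
        CompletableFuture<Void> fut = new CompletableFuture<>();

        for (Map.Entry<String, Throwable> e : nodeResults.entrySet()) {
            if (e.getValue() != null) {
                fut.completeExceptionally(new IllegalStateException(
                    "ConsistentCut failed on node " + e.getKey(), e.getValue()));

                return fut;
            }
        }

        fut.complete(null);

        return fut;
    }
}
```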
...