...
The picture below illustrates the steps of the algorithm on a single node:
- Initial state:
- No concurrent ConsistentCut process is running.
- lastFinishedCutId holds the previous ConsistentCutId, or null.
- The user starts a command to create a new incremental snapshot:
- The Ignite node initiates a DistributedProcess with the discovery message SnapshotOperationRequest, which holds the new ConsistentCutId (the goal is to notify every node in the cluster about the running incremental snapshot).
- The DistributedProcess stores the topology version topVer on which ConsistentCut started.
- The creation of a consistent cut can be started by either of two events, whichever happens first:
- Receiving SnapshotOperationRequest#ConsistentCutId via DiscoverySPI (through the DistributedProcess).
- Receiving ConsistentCutAwareMessage#ConsistentCutId via CommunicationSPI (through transaction messages: Prepare, Finish).
- On receiving the ConsistentCutId, the node starts a local ConsistentCut:
- There are 2 roles that a node might play:
- ROLE#1 - wraps outgoing messages - for all Ignite nodes: client, baseline, and non-baseline server nodes.
- ROLE#2 - prepares data to be written to WAL - for baseline nodes only.
- Checks before start:
- Whether ConsistentCut has already started (ConsistentCut != null) or finished (lastFinishedCutId == id) for this id; skip if it has.
- If ConsistentCut is initiated by CommunicationSPI, compare ConsistentCutAwareMessage#topVer with the local node order:
- The local node order equals the new topVer at the moment the node joined the cluster.
- If the order is higher than the ConsistentCut topVer, the node joined after ConsistentCut started. Skip starting ConsistentCut on this node.
- After the ConsistentCut future finishes, DistributedProcess automatically notifies the node-initiator.
- ROLE#1:
- Creates a new ConsistentCut future.
- If the node is non-baseline (client, non-baseline servers), completes it right after creation and notifies the node-initiator that the local procedure has finished (by the DistributedProcess protocol).
- While ConsistentCut != null, wraps outgoing messages into ConsistentCutAwareMessage. It contains info:
- ConsistentCutId (to start ConsistentCut on the remote node, if not started yet).
- Messages contain an additional field txCutId. It is originally set on the node that commits first:
- For 2PC it is the originating node.
- For 1PC it is a backup node.
- If txCutId equals null, the transaction started committing Before Consistent Cut started, otherwise After.
- On receiving a ConsistentCutAwareMessage that makes the transaction committed (FinishRequest for 2PC, PrepareResponse for 1PC), sets tx#txCutId = message#txCutId.
- ROLE#2 - for baseline nodes only:
- In the message thread, atomically initializes ConsistentCut:
- Creates a new ConsistentCut future.
- Creates an empty collection removedActiveTxs (this collection doesn't remove transactions, unlike IgniteTxManager#activeTx).
- In the background thread:
- Writes a ConsistentCutStartRecord to WAL with the received ConsistentCutId.
- Creates a (weakly consistent) copy of IgniteTxManager#activeTx. Sets listeners on those tx#finishFuture.
- As an optimization, it is safe to exclude transactions with tx#status == ACTIVE. It's guaranteed that such transactions belong to the After side.
- Creates a copy of removedActiveTxs (contains transactions that might have been cleaned from IgniteTxManager#activeTx). Sets listeners on those tx#finishFuture.
- Sets removedActiveTxs to null. We don't care about txs concurrently added to removedActiveTxs; they just don't land in the "before" or "after" set and will be excluded from recovery.
- In transaction threads, fills removedActiveTxs if ConsistentCut != null and removedActiveTxs != null:
- Every transaction is added to removedActiveTxs right before it is removed from IgniteTxManager#activeTx.
- For every listened transaction, the callback is called when the transaction finishes:
- If the transaction state is UNKNOWN or the status is RECOVERY_FINISH, complete ConsistentCut with an exception.
- If the transaction is mapped to a higher topology version than the ConsistentCut topVer, put it into after.
- If tx#txCutId equals the local one, put the transaction into after; otherwise put it into before.
- After every listened transaction has finished:
- Writes a ConsistentCutFinishRecord to WAL with the collections (before, after).
- Completes the ConsistentCut future.
- On non-baseline nodes (clients, non-baseline servers):
- If ConsistentCut is initiated by CommunicationSPI, compare ConsistentCutAwareMessage#topVer with the local node order:
- The local node order equals the new topVer at the moment the node joined the cluster.
- If the order is higher than the ConsistentCut topVer, the node joined after ConsistentCut started. Skip starting ConsistentCut on this node.
- In the message thread, creates a completed ConsistentCut future.
- On all nodes (clients, non-baseline, baseline) - wraps outgoing messages:
- While ConsistentCut != null, wraps outgoing messages into ConsistentCutAwareMessage. It contains info:
- ConsistentCutId (to start ConsistentCut on the remote node, if not started yet).
- Some messages set an additional field txCutId. It is set on the node that commits first:
- For 2PC it is the originating node.
- For 1PC it is a backup node.
- If txCutId is null, the transaction started committing Before Consistent Cut started, otherwise After.
- On receiving a ConsistentCutAwareMessage that makes the transaction committed (FinishRequest for 2PC, PrepareResponse for 1PC), Ignite sets tx#txCutId = message#txCutId before it starts handling the received message.
- Notifies the node-initiator that the local procedure has finished (by the DistributedProcess protocol).
- After all nodes have finished ConsistentCut, on every node:
- Updates lastFinishedCutId with the current id.
- The ConsistentCut future becomes null.
- Stops signing outgoing transaction messages.
- The node-initiator checks that every node completed correctly.
- If any node completed exceptionally, completes the Incremental Snapshot with an exception.
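The before-start checks from the steps above can be illustrated with a minimal sketch. It is not Ignite code: the class `CutStartGuard`, its fields, and the `viaCommunication` flag are hypothetical names introduced only for this example, and a plain `UUID` stands in for the ConsistentCut future.

```java
import java.util.UUID;

/** Hypothetical helper (not part of Ignite): decides whether a node starts a local ConsistentCut. */
final class CutStartGuard {
    /** ID of the running cut, or null. Stands in for the "ConsistentCut != null" check. */
    private UUID runningCutId;

    /** ID of the last finished cut, or null. */
    private UUID lastFinishedCutId;

    /** Local node order: the topology version at the moment the node joined the cluster. */
    private final long locNodeOrder;

    CutStartGuard(long locNodeOrder) {
        this.locNodeOrder = locNodeOrder;
    }

    /**
     * @param id Received ConsistentCutId.
     * @param cutTopVer Topology version on which the cut started.
     * @param viaCommunication True if the id arrived in a transaction message rather than by discovery.
     * @return True if this call started the local cut, false if the start was skipped.
     */
    synchronized boolean tryStart(UUID id, long cutTopVer, boolean viaCommunication) {
        // Already started, or already finished for this id: skip.
        if (runningCutId != null || id.equals(lastFinishedCutId))
            return false;

        // The node joined after the cut started: skip.
        if (viaCommunication && locNodeOrder > cutTopVer)
            return false;

        runningCutId = id;

        return true;
    }

    /** Called after all nodes finished the cut: remembers the id and clears the running cut. */
    synchronized void finish() {
        lastFinishedCutId = runningCutId;
        runningCutId = null;
    }
}
```

The sketch uses a single lock for simplicity; the actual synchronization strategy is not specified by the steps above.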
Consistent and inconsistent Cuts
...
ConsistentCutAwareMessage:

```java
class ConsistentCutAwareMessage {
    /** Original transaction message. */
    Message msg;

    /** Consistent Cut ID. */
    UUID cutId;

    /** Consistent Cut ID after which transaction committed. */
    @Nullable UUID txCutId;

    /** Cluster topology version on which Consistent Cut started. */
    long topVer;
}
```
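How outgoing messages could be wrapped while a cut is running is sketched below. This is an assumption-laden illustration, not the Ignite implementation: `CutAwareMessageSketch` and `OutgoingWrapperSketch` are hypothetical names, and `Object` replaces Ignite's message type so that the example is self-contained.

```java
import java.util.UUID;

/** Simplified stand-in for ConsistentCutAwareMessage (hypothetical, Object replaces the message type). */
final class CutAwareMessageSketch {
    /** Original transaction message. */
    final Object msg;

    /** ID of the running cut. */
    final UUID cutId;

    /** Null if the transaction started committing BEFORE the cut, otherwise the cut ID. */
    final UUID txCutId;

    /** Topology version on which the cut started. */
    final long topVer;

    CutAwareMessageSketch(Object msg, UUID cutId, UUID txCutId, long topVer) {
        this.msg = msg;
        this.cutId = cutId;
        this.txCutId = txCutId;
        this.topVer = topVer;
    }
}

/** Hypothetical wrapper applied to every outgoing transaction message. */
final class OutgoingWrapperSketch {
    /**
     * @param msg Outgoing message.
     * @param runningCutId ID of the running cut, or null if no cut is running.
     * @param txCutId Value of tx#txCutId, may be null.
     * @param topVer Topology version on which the cut started.
     * @return The original message if no cut is running, otherwise the wrapped message.
     */
    static Object wrap(Object msg, UUID runningCutId, UUID txCutId, long topVer) {
        if (runningCutId == null)
            return msg;

        return new CutAwareMessageSketch(msg, runningCutId, txCutId, topVer);
    }
}
```

On the receiving side, the wrapped `txCutId` would be copied into the transaction before the original message is handled, as described in the steps of the algorithm.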
Transaction
A new field is added to IgniteInternalTx:

```java
interface IgniteInternalTx {
    /**
     * @param id ID of the {@link ConsistentCut} AFTER which this transaction was committed, {@code null} if the
     * transaction was committed BEFORE.
     */
    public void cutId(@Nullable UUID id);
}
```
Consistent Cut Classes
ConsistentCutManager:

```java
/**
 * Manages everything related to Consistent Cut. It is the entry point for transaction threads to check
 * a running consistent cut.
 */
class ConsistentCutManager extends GridCacheSharedManagerAdapter {
    /** Current Consistent Cut. All transaction threads wrap outgoing messages if this field is not null. */
    volatile @Nullable ConsistentCut cut;

    /** Entry point for handling a received new Consistent Cut ID. */
    void handleConsistentCutId(UUID id);
}
```
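The handoff of removedActiveTxs between transaction threads and the background thread, described in the ROLE#2 steps, might look roughly like the sketch below. `RemovedTxTrackerSketch` and its method names are hypothetical, and the choice of `ConcurrentLinkedQueue` is an assumption made for the example rather than something stated on this page.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical tracker of transactions removed from the active set while a cut is being initialized. */
final class RemovedTxTrackerSketch<T> {
    /** Null when no tracking is needed. */
    private volatile Collection<T> removedActiveTxs;

    /** Message thread: creates the empty collection when the cut is initialized. */
    void onCutStart() {
        removedActiveTxs = new ConcurrentLinkedQueue<>();
    }

    /** Transaction thread: called right before the transaction is removed from the active set. */
    void onTxRemoved(T tx) {
        Collection<T> txs = removedActiveTxs;

        if (txs != null)
            txs.add(tx);
    }

    /** Background thread: copies the collected transactions and stops tracking. */
    List<T> copyAndStop() {
        Collection<T> txs = removedActiveTxs;

        List<T> copy = txs == null ? new ArrayList<T>() : new ArrayList<>(txs);

        // Transactions added concurrently after the copy are intentionally ignored.
        removedActiveTxs = null;

        return copy;
    }
}
```

Reading the volatile field into a local variable before use avoids a NullPointerException if the background thread nulls the field between the check and the add.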
ConsistentCut:

```java
class ConsistentCut extends GridFutureAdapter<WALPointer> {
    Set<GridCacheVersion> beforeCut;

    Set<GridCacheVersion> afterCut;

    Set<IgniteInternalFuture<IgniteInternalTx>> removedActive;
}
```
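The rule that sorts finished transactions into the before and after collections can be sketched as follows. The class `CutClassifierSketch` is hypothetical: transactions are identified by plain strings instead of GridCacheVersion, and the UNKNOWN state and RECOVERY_FINISH status checks are collapsed into a single boolean argument.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Hypothetical classifier of finished transactions relative to a local cut. */
final class CutClassifierSketch {
    /** Transactions committed before the cut. */
    final Set<String> before = new HashSet<>();

    /** Transactions committed after the cut. */
    final Set<String> after = new HashSet<>();

    /** ID of the local cut. */
    private final UUID cutId;

    /** Topology version on which the cut started. */
    private final long cutTopVer;

    CutClassifierSketch(UUID cutId, long cutTopVer) {
        this.cutId = cutId;
        this.cutTopVer = cutTopVer;
    }

    /**
     * Callback invoked when a listened transaction finishes.
     *
     * @param txId Transaction identifier.
     * @param txCutId Value of tx#txCutId, may be null.
     * @param txTopVer Topology version the transaction is mapped to.
     * @param unknownOrRecovered True if the state is UNKNOWN or the status is RECOVERY_FINISH.
     */
    void onTxFinished(String txId, UUID txCutId, long txTopVer, boolean unknownOrRecovered) {
        // The cut cannot be consistent in this case, so it fails.
        if (unknownOrRecovered)
            throw new IllegalStateException("Cut is inconsistent because of transaction " + txId);

        if (txTopVer > cutTopVer || cutId.equals(txCutId))
            after.add(txId);
        else
            before.add(txId);
    }
}
```

Once every listened transaction has gone through the callback, the two sets correspond to the collections written with ConsistentCutFinishRecord.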
Links
- ON DISTRIBUTED SNAPSHOTS, Ten H. LAI and Tao H. YANG, 29 May 1987