...

  1. A cut can be consistent or inconsistent. It's prohibited to create a snapshot on an inconsistent cut.
  2. Restoring requires reading the WAL ahead for the last incremental snapshot (see the sketch after this list):
    1. There are 2 records in the WAL for every consistent cut: IncrementalSnapshotStartRecord and IncrementalSnapshotFinishRecord.
    2. IncrementalSnapshotFinishRecord contains info about which transactions logged before IncrementalSnapshotStartRecord have to be excluded from the incremental snapshot.
    3. Therefore it's important to read the WAL ahead, reach IncrementalSnapshotFinishRecord, and only after that apply entries since the previous Incremental Snapshot.
  3. In some circumstances it's impossible to create Incremental Snapshots anymore; a full snapshot should be created instead (see below, limitations in Phase 1).
  4. Only one Incremental Snapshot can be created at a time; concurrent processes are not allowed.
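
A minimal sketch of this read-ahead restore logic (item 2 above), assuming hypothetical WalRecord/TxRecord types and an applyEntry() callback; only the IncrementalSnapshotStartRecord/IncrementalSnapshotFinishRecord names come from the design itself:

Code Block
languagejava
// Illustrative sketch only: WalRecord, TxRecord and applyEntry() are hypothetical
// stand-ins for Ignite internals; only the record names follow the design above.
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WalReadAheadSketch {
    interface WalRecord { }

    /** Marks the start of the incremental snapshot (the consistent cut). */
    static class IncrementalSnapshotStartRecord implements WalRecord { }

    /** Lists transactions logged before the start record that must be excluded. */
    static class IncrementalSnapshotFinishRecord implements WalRecord {
        Set<Long> excludedTxs() { return Set.of(); }
    }

    /** A committed transaction record. */
    static class TxRecord implements WalRecord {
        long txId() { return 0L; }
    }

    /** Reads the WAL ahead to the finish record, then applies buffered entries. */
    static void restoreLastIncrement(Iterable<WalRecord> wal) {
        List<WalRecord> buffered = new ArrayList<>();
        Set<Long> excluded = new HashSet<>();

        // 1. Read ahead: buffer records until IncrementalSnapshotFinishRecord is reached.
        for (WalRecord rec : wal) {
            if (rec instanceof IncrementalSnapshotFinishRecord fin) {
                excluded.addAll(fin.excludedTxs());
                break; // Cut boundary reached, it is now known what to exclude.
            }
            buffered.add(rec);
        }

        // 2. Only now apply the entries collected since the previous Incremental Snapshot.
        for (WalRecord rec : buffered) {
            if (rec instanceof TxRecord tx && excluded.contains(tx.txId()))
                continue; // Excluded from the incremental snapshot.
            applyEntry(rec);
        }
    }

    static void applyEntry(WalRecord rec) { /* apply the update to the restored partition */ }
}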

...

Code Block
languagebash
# Create incremental snapshot.
# SNP - name of pre-existing full snapshot.
# [--retries N] - amount of attempts to create incremental snapshot, in case of inconsistent cut. Default is 3.
$ control.sh --snapshot create SNP --incremental [ --retries N ]
^ -- Incremental snapshot SNP_1640984400 created at 2022-01-01T00:00:00.
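
For illustration only, the same operation expressed through a Java entry point; the createIncrementalSnapshot method shown here mirrors the CLI flag and is an assumption of this sketch rather than a confirmed public API:

Code Block
languagejava
// Sketch only: createIncrementalSnapshot(...) mirrors the CLI command above;
// treating it as the public Java entry point is an assumption of this example.
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class CreateIncrementalSnapshotExample {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start("ignite-config.xml");

        // "SNP" is the name of a pre-existing full snapshot; the call produces the
        // next increment for it, analogous to `--snapshot create SNP --incremental`.
        ignite.snapshot().createIncrementalSnapshot("SNP").get();
    }
}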

Under the hood this command:

  1. Makes checks:
    1. Base snapshot (at least its metafile) exists. Metafiles exist for all incremental snapshots.
    2. Validates that there are no missing WAL segments since the previous snapshot (see the sketch after this list). SnapshotMetadata should contain info about the last WAL segment that contains snapshot data:
      1. If the snapshot is full: ClusterSnapshotRecord must be written with rolloverType=NEXT_SEGMENT; the segment number of this record is stored within the existing SnapshotMetadata structure.
      2. If the snapshot is incremental: the stored segment number is the segment that contains IncrementalSnapshotFinishRecord.
    3. Checks that the baseline topology is the same as for the base snapshot.
    4. Checks that the WAL is consistent (WAL was not disabled since the previous snapshot); this info is stored in the MetaStorage.
  2. Starts a new Consistent Cut.
  3. On Consistent Cut finish:
    1. if the cut is consistent: logs IncrementalSnapshotFinishRecord with rolloverType=CURRENT_SEGMENT to enforce archiving the segment after logging the record.
    2. if the cut is inconsistent: skips logging IncrementalSnapshotFinishRecord and retries from step 1.
    3. fails if the retry attempts are exceeded.
  4. Awaits until the segment with IncrementalSnapshotFinishRecord has been archived and compacted.
  5. Collects WAL segments from the segment following the last segment of the previous snapshot (`prev + 1`) up to the segment that contains IncrementalSnapshotFinishRecord for the current incremental snapshot.
  6. Creates hardlinks to the compressed segments in the target directory.
  7. Writes meta files with the description of the new incremental snapshot:
    1. meta.smf: pointer to IncrementalSnapshotFinishRecord.
    2. binary_meta, marshaller_data if they changed since the previous snapshot.
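
A sketch of the WAL-segment continuity check from step 1.2, assuming a simplified SnapshotMeta stand-in for SnapshotMetadata that exposes the last WAL segment of the previous snapshot:

Code Block
languagejava
// Sketch only: SnapshotMeta is an illustrative stand-in for SnapshotMetadata; the check
// mirrors step 1.2 above (no missing WAL segments between `prev + 1` and the archive head).
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class WalContinuityCheckSketch {
    /** Hypothetical metadata: name of the snapshot and its last WAL segment index. */
    record SnapshotMeta(String name, long lastWalSegment) { }

    /** Throws if any WAL segment between the previous snapshot and the archive head is missing. */
    static void checkNoSegmentGaps(SnapshotMeta prev, NavigableSet<Long> archivedSegments) {
        long from = prev.lastWalSegment() + 1; // `prev + 1`
        long to = archivedSegments.isEmpty() ? from - 1 : archivedSegments.last();

        for (long seg = from; seg <= to; seg++) {
            if (!archivedSegments.contains(seg))
                throw new IllegalStateException(
                    "Missing WAL segment " + seg + " since snapshot " + prev.name());
        }
    }

    public static void main(String[] args) {
        NavigableSet<Long> archived = new TreeSet<>(List.of(5L, 6L, 8L)); // segment 7 lost
        checkNoSegmentGaps(new SnapshotMeta("SNP", 4L), archived);        // -> IllegalStateException
    }
}
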
Code Block
languagebash
# Proposed directory structure
$ ls $IGNITE_HOME
db/
snapshots/
|-- SNP/
|---- db/
|---- increments/
|------ 0000000000000001/
|-------- node0.smf
|-------- db/
|---------- binary_meta/
|---------- marshaller/
|-------- wals/
|---------- 0000000000000000.wal.zip
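
For illustration, resolving the directory of a given increment under the layout above could look as follows; the helper name and the 16-digit zero padding are assumptions taken from the example tree:

Code Block
languagejava
// Sketch only: maps (snapshot name, increment index) onto the layout above; the helper
// name and the 16-digit zero padding are assumptions taken from the example directory.
import java.nio.file.Path;

class IncrementPathSketch {
    /** Resolves e.g. snapshots/SNP/increments/0000000000000001 for ("SNP", 1). */
    static Path incrementDir(Path snapshotsRoot, String snapshotName, int incrementIdx) {
        return snapshotsRoot.resolve(snapshotName)
            .resolve("increments")
            .resolve(String.format("%016d", incrementIdx));
    }

    public static void main(String[] args) {
        System.out.println(incrementDir(Path.of("snapshots"), "SNP", 1));
    }
}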

Restore process

Code Block
languagebash
# Restore cluster on specific incremental snapshot
$ control.sh --snapshot restore SNP --increment 1

With the control.sh --snapshot restore command:

  1. User specifies the full snapshot name and the increment index (--increment param).
  2. Additionally to the full snapshot check (which already exists in SnapshotRestoreProcess), it checks incremental snapshots:
    1. For every incremental snapshot: extracts the segments list from the meta file and checks that the WAL segments are present.
    2. Checks that there are no missing WAL segments since the base snapshot.
  3. After the full snapshot restore processes (prepare, preload, cacheStart) have finished, it starts another DistributedProcess - `walRecoveryProc`:
    1. Every node applies WAL segments since the base snapshot until it reaches the requested IncrementalSnapshotFinishRecord.
    2. Ignite should forbid concurrent operations (both read and write) for restored cache groups during WAL recovery.
    3. The process of applying data for snapshot cache groups (from the base snapshot) is similar to the GridCacheDatabaseSharedManager logical restore (see the sketch after this list):
      1. disable WAL for the specified cache group
      2. find `ClusterSnapshotRecord` related to the base snapshot
      3. start applying WAL updates with a striped executor (cacheGrpId, partId), applying the version filter from IncrementalSnapshotFinishRecord
      4. enable WAL for restored cache groups
      5. force a checkpoint and check the restore state (checkpoint status, etc).
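
A sketch of the WAL recovery step 3.3, assuming hypothetical DataEntry and applyEntry types: updates are striped by (cacheGrpId, partId) and entries of transactions excluded by IncrementalSnapshotFinishRecord are skipped:

Code Block
languagejava
// Sketch only: DataEntry, the excluded-versions set and applyEntry() are illustrative;
// the striping by (cacheGrpId, partId) and the version filter follow the steps above.
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class WalRecoverySketch {
    record DataEntry(int cacheGrpId, int partId, long txVersion, byte[] payload) { }

    static void applyUpdates(List<DataEntry> walEntries, Set<Long> excludedVersions) throws InterruptedException {
        int stripes = Runtime.getRuntime().availableProcessors();
        ExecutorService[] striped = new ExecutorService[stripes];
        for (int i = 0; i < stripes; i++)
            striped[i] = Executors.newSingleThreadExecutor();

        for (DataEntry e : walEntries) {
            // Filter: skip transactions excluded by IncrementalSnapshotFinishRecord.
            if (excludedVersions.contains(e.txVersion()))
                continue;

            // Updates of the same partition go to the same stripe, so they are
            // applied sequentially and in WAL order.
            int stripe = Math.floorMod(31 * e.cacheGrpId() + e.partId(), stripes);
            striped[stripe].execute(() -> applyEntry(e));
        }

        for (ExecutorService exec : striped) {
            exec.shutdown();
            exec.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    static void applyEntry(DataEntry e) { /* write to the restored partition */ }
}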

Checking snapshot

Code Block
languagebash
// Check specific incremental snapshot
$ control.sh --snapshot check SNP --increment 1

The control.sh --snapshot check command includes the following steps on every baseline node:

  1. Check that snapshot files are consistent:
    1. the snapshot structure is valid and the metadata matches the actual snapshot files;
    2. all WAL segments are present (from ClusterSnapshotRecord to the requested IncrementalSnapshotFinishRecord).
  2. Check incremental snapshot data integrity:
    1. It parses WAL segments from the first incremental snapshot to the specified one (with --increment param).
    2. For every partition it calculates hashes of the entries and of the entry versions.
      1. On the reduce phase it compares partition hashes between primary and backup copies.
    3. For every pair of nodes that participated as primary nodes, it calculates a hash of the committed transactions. For example (see the sketch after this list):
      1. There are two transactions:
        1. TX1, and there are 2 nodes that participate in it as primary nodes: A and B
        2. TX2, and there are 2 nodes: A and C
      2. On node A it prepares 2 collections: TxHashAB = [hash(TX1)], TxHashAC = [hash(TX2)]
      3. On node B it prepares 1 collection: TxHashBA = [hash(TX1)]
      4. On node C it prepares 1 collection: TxHashCA = [hash(TX2)]
      5. On the reduce phase of the check it compares collections from all nodes and expects that:
        1. TxHashAB equals TxHashBA
        2. TxHashAC equals TxHashCA
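
A sketch of the per-node-pair transaction hash check (step 2.3); the collection layout and hashing are illustrative, while the reduce-phase comparison mirrors the TX1/TX2 example above:

Code Block
languagejava
// Sketch only: builds, per local node, one hash list per remote primary node that
// participated in the same transactions, then compares the pairs on the reduce phase.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class TxHashCheckSketch {
    record Committed(int hash, List<String> primaryNodes) { }

    /** Map phase on a node: for each other primary node, the hashes of common transactions. */
    static Map<String, List<Integer>> collectTxHashes(String localNode, List<Committed> committedTxs) {
        Map<String, List<Integer>> res = new HashMap<>();
        for (Committed tx : committedTxs)
            for (String remote : tx.primaryNodes())
                if (!remote.equals(localNode))
                    res.computeIfAbsent(remote, n -> new ArrayList<>()).add(tx.hash());
        return res;
    }

    /** Reduce phase: e.g. TxHashAB (collected on A for B) must equal TxHashBA (collected on B for A). */
    static boolean consistent(List<Integer> txHashAB, List<Integer> txHashBA) {
        return Objects.equals(txHashAB, txHashBA);
    }

    public static void main(String[] args) {
        // TX1 has primaries A and B, TX2 has primaries A and C (as in the example above).
        Committed tx1 = new Committed(101, List.of("A", "B"));
        Committed tx2 = new Committed(202, List.of("A", "C"));

        Map<String, List<Integer>> onA = collectTxHashes("A", List.of(tx1, tx2));
        Map<String, List<Integer>> onB = collectTxHashes("B", List.of(tx1));
        Map<String, List<Integer>> onC = collectTxHashes("C", List.of(tx2));

        System.out.println(consistent(onA.get("B"), onB.get("A"))); // true
        System.out.println(consistent(onA.get("C"), onC.get("A"))); // true
    }
}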

Note that the incremental snapshot check doesn't verify the data of the related full snapshot. A full check of a snapshot therefore consists of two steps:

  1. Check full snapshot
  2. Check incremental snapshot

Atomic caches

For atomic caches it's required to restore data consistency (between primary and backup nodes) differently, with the ReadRepair feature. Consistent Cut relies on the messages of the transaction protocol (Prepare, Finish); the atomic cache protocol doesn't have enough messages to sync different nodes.

TBD: The restore process should suggest that the user performs an additional step if ATOMIC caches are restored:

  1. Check the partitions state with the `idle_verify` command;
  2. Start read-repair for inconsistent keys in lazy mode: on user get() operations related to broken cache keys (see the sketch below).
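
A sketch of the lazy repair step, assuming the ReadRepair API shape (withReadRepair plus a strategy) from the existing consistency-check feature; whether this is the exact mechanism for ATOMIC caches here is still TBD, as noted above:

Code Block
languagejava
// Sketch only: assumes IgniteCache#withReadRepair(ReadRepairStrategy) as the entry point
// for lazy repair; broken keys are repaired only when the user actually reads them.
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.ReadRepairStrategy;

public class LazyReadRepairExample {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start("ignite-config.xml");
        IgniteCache<Integer, String> cache = ignite.cache("atomic-cache");

        // Regular get() routed through the read-repair proxy: inconsistent copies of
        // the requested key are detected and fixed (here: last-write-wins strategy).
        String value = cache.withReadRepair(ReadRepairStrategy.LWW).get(42);

        System.out.println(value);
    }
}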

...