...

  1. A cut can be consistent or inconsistent. Creating a snapshot on an inconsistent cut is prohibited.
  2. Restoring requires reading the WAL ahead for the last incremental snapshot:
    1. There are 2 records in the WAL for every consistent cut: IncrementalSnapshotStartRecord and IncrementalSnapshotFinishRecord.
    2. IncrementalSnapshotFinishRecord contains info about which transactions before IncrementalSnapshotStartRecord have to be excluded from the incremental snapshot.
    3. Therefore it's important to read the WAL ahead, reach IncrementalSnapshotFinishRecord, and only after that apply entries since the previous incremental snapshot.
  3. In some circumstances it's no longer possible to create incremental snapshots, and a full snapshot should be created instead (see below, limitations in Phase 1).
  4. Only one incremental snapshot can be created at a time; concurrent processes are not allowed.
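
The read-ahead rule in item 2 can be sketched as a two-pass scan over the WAL. This is a minimal illustration rather than Ignite's implementation: the `Rec` type, its fields, and `entriesToApply` are hypothetical stand-ins for real WAL records, and treating entries logged after the start record as outside the snapshot is a simplifying assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified WAL model; these are not Ignite's record classes. */
final class WalReadAheadSketch {
    enum Type { DATA, START, FINISH }

    /** DATA carries a transaction id; FINISH carries the ids excluded from the snapshot. */
    static final class Rec {
        final Type type;
        final String txId;
        final Set<String> excluded;

        Rec(Type type, String txId, Set<String> excluded) {
            this.type = type;
            this.txId = txId;
            this.excluded = excluded;
        }

        static Rec data(String txId) { return new Rec(Type.DATA, txId, Collections.<String>emptySet()); }
        static Rec start() { return new Rec(Type.START, null, Collections.<String>emptySet()); }
        static Rec finish(Set<String> excluded) { return new Rec(Type.FINISH, null, excluded); }
    }

    /** Returns the transaction ids to apply, in WAL order. */
    static List<String> entriesToApply(List<Rec> wal) {
        // Pass 1: read ahead until the finish record to learn the exclusions.
        int startIdx = -1;
        Set<String> excluded = null;
        for (int i = 0; i < wal.size(); i++) {
            Rec r = wal.get(i);
            if (r.type == Type.START)
                startIdx = i;
            else if (r.type == Type.FINISH) {
                excluded = r.excluded;
                break;
            }
        }
        if (startIdx < 0 || excluded == null)
            throw new IllegalStateException("No finished incremental snapshot found in WAL");

        // Pass 2: apply entries logged before the start record, skipping excluded transactions.
        List<String> applied = new ArrayList<>();
        for (int i = 0; i < startIdx; i++) {
            Rec r = wal.get(i);
            if (r.type == Type.DATA && !excluded.contains(r.txId))
                applied.add(r.txId);
        }
        return applied;
    }
}
```

Applying entries in a single forward pass would not work here, because the exclusion list only becomes known once the finish record has been read.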

...

Code Block
languagebash
# Create an incremental snapshot.
# SNP - name of a pre-existing full snapshot.
# [--retries N] - number of attempts to create the incremental snapshot in case of an inconsistent cut. Default is 3.
$ control.sh --snapshot create SNP --incremental [ --retries N ]
^ -- Incremental snapshot SNP_1640984400 created at 2022-01-01T00:00:00.

Under the hood this command:

  1. Makes checks:
    1. The base snapshot (at least its metafile) exists. A metafile exists for every previous incremental snapshot.
    2. Validates that no WAL segments are missing since the previous snapshot. SnapshotMetadata should contain info about the last WAL segment that contains snapshot data:
      1. If the snapshot is full: ClusterSnapshotRecord is written to the WAL, and the segment number of this record is stored within the existing SnapshotMetadata structure.
      2. If the snapshot is incremental: the stored segment number is the segment that contains IncrementalSnapshotFinishRecord.
    3. Checks that the baseline topology is the same as for the base snapshot.
    4. Checks that the WAL is consistent (WAL has not been disabled since the previous snapshot) - this info is stored in MetaStorage.
  2. Starts a new Consistent Cut.
  3. When the Consistent Cut finishes:
    1. if the cut is consistent: logs IncrementalSnapshotFinishRecord with rolloverType=CURRENT_SEGMENT to enforce archiving the segment after logging the record.
    2. if the cut is inconsistent: skips logging IncrementalSnapshotFinishRecord and retries from step 1.
    3. fails if the retry attempts are exceeded.
  4. Waits until the segment with IncrementalSnapshotFinishRecord has been archived and compacted.
  5. Collects WAL segments for the current incremental snapshot (from the previous snapshot to IncrementalSnapshotFinishRecord).
  6. Creates hardlinks to the compressed segments in the target directory.
  7. Writes meta files with a description of the new incremental snapshot:
    1. meta.smf: 
      1. Pointer to IncrementalSnapshotFinishRecord.
    2. binary_meta, marshaller_data if it changed since previous snapshot.
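
The retry behaviour of steps 1 to 3 can be sketched as a bounded loop. This is only an illustration under assumptions: `createWithRetries` and the `cutIsConsistent` callback are hypothetical names, with the callback standing in for "run the checks, start a cut, report whether it was consistent". The default of 3 attempts comes from the `--retries` description above.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the retry loop; not Ignite's implementation. */
final class IncrementalSnapshotRetrySketch {
    /** Default number of attempts, as documented for --retries. */
    static final int DFLT_RETRIES = 3;

    /**
     * Runs the checks and the cut up to {@code retries} times.
     *
     * @param cutIsConsistent hypothetical callback that performs one attempt
     *                        and returns true if the cut was consistent.
     * @return the 1-based number of the attempt that succeeded.
     */
    static int createWithRetries(BooleanSupplier cutIsConsistent, int retries) {
        for (int attempt = 1; attempt <= retries; attempt++) {
            // A consistent cut is where the finish record would be logged.
            if (cutIsConsistent.getAsBoolean())
                return attempt;
            // Inconsistent cut: nothing is logged, start again from the checks.
        }
        throw new IllegalStateException("Cut was inconsistent in all " + retries + " attempts");
    }
}
```

Because no finish record is written for an inconsistent cut, a failed attempt leaves nothing in the WAL that a later restore would have to interpret.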
Code Block
languagebash
# Proposed directory structure
$ ls $IGNITE_HOME
db/
snapshots/
|-- SNP/
|---- db/
|---- increments/
|------ 0000000000000001/
|-------- metanode0.smf
|-------- db/
|---------- binary_meta/
|---------- marshaller/
|-------- wals/
|---------- 0000000000000000.wal.zip
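
Populating the `wals/` directory with hardlinks could look roughly like the sketch below. It assumes the 16-digit zero-padded `.wal.zip` naming shown in the listing; `linkSegments`, `archiveDir`, and `walsDir` are hypothetical names introduced for illustration, and the real directory resolution in Ignite is not shown here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of linking compressed WAL segments into an increment directory. */
final class WalSegmentLinkSketch {
    /** Builds a file name following the pattern in the listing, e.g. 0000000000000000.wal.zip. */
    static String segmentFileName(long idx) {
        return String.format(Locale.ROOT, "%016d.wal.zip", idx);
    }

    /**
     * Hardlinks segments with indices in [from, to] from the archive into walsDir.
     * Fails if a segment is absent, since a gap would make the increment unusable.
     */
    static List<Path> linkSegments(Path archiveDir, Path walsDir, long from, long to) throws IOException {
        Files.createDirectories(walsDir);
        List<Path> links = new ArrayList<>();
        for (long idx = from; idx <= to; idx++) {
            Path existing = archiveDir.resolve(segmentFileName(idx));
            if (!Files.exists(existing))
                throw new IOException("Missing WAL segment: " + existing);
            // A hardlink avoids copying segment data; both paths must be on the same file system.
            links.add(Files.createLink(walsDir.resolve(existing.getFileName()), existing));
        }
        return links;
    }
}
```

Hardlinks keep the snapshot cheap to create, at the cost of requiring the snapshot directory and the WAL archive to share a file system.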

...

Code Block
languagebash
# Restore the cluster to a specific incremental snapshot
$ control.sh --snapshot restore SNP --increment 1

With the control.sh --snapshot restore command:

  1. User specifies the full snapshot name
  2. Parses snapshot name and extracts base and incremental snapshots
  3. In addition to the full snapshot check (which already exists in SnapshotRestoreProcess), it checks incremental snapshots:
    1. Checks that all WAL segments are present (from ClusterSnapshotRecord to the requested IncrementalSnapshotFinishRecord).
  4. After the full snapshot restore processes (prepare, preload, cacheStart) have finished, it starts another DistributedProcess - `walRecoveryProc`:
    1. Every node applies WAL segments since the base snapshot until it reaches the requested IncrementalSnapshotFinishRecord.
    2. Ignite should forbid concurrent operations (both read and write) for restored cache groups during WAL recovery.
    3. The process of applying data for snapshot cache groups (from the base snapshot) is similar to the GridCacheDatabaseSharedManager logical restore:
      1. disable WAL for the specified cache group
      2. find the `ClusterSnapshotRecord` related to the base snapshot
      3. start applying WAL updates with a striped executor (cacheGrpId, partId). Apply the filter for versions in ConsistentCutFinishRecord.
      4. enable WAL for the restored cache groups
      5. force a checkpoint and check the restore state (checkpoint status, etc.).
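
The segment presence check from step 3 can be sketched as a simple gap search over segment indices. The names `missingSegments`, `available`, `from`, and `to` are hypothetical: `from` is assumed to be the segment holding the base snapshot's ClusterSnapshotRecord and `to` the segment holding the requested finish record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the pre-restore WAL segment check. */
final class WalSegmentGapCheckSketch {
    /**
     * Returns the indices in [from, to] that are absent from {@code available}.
     * An empty result means the WAL range is contiguous and recovery may proceed.
     */
    static List<Long> missingSegments(Set<Long> available, long from, long to) {
        List<Long> missing = new ArrayList<>();
        for (long idx = from; idx <= to; idx++) {
            if (!available.contains(idx))
                missing.add(idx);
        }
        return missing;
    }
}
```

Running this check before the full snapshot restore starts would let the command fail early, instead of discovering a gap halfway through WAL recovery.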

...