...
- A cut can be either consistent or inconsistent. Creating a snapshot on an inconsistent cut is prohibited.
- Restoring requires reading the WAL ahead for the last incremental snapshot:
  - There are 2 records in the WAL for every consistent cut: IncrementalSnapshotStartRecord and IncrementalSnapshotFinishRecord.
  - IncrementalSnapshotFinishRecord contains info about which transactions before IncrementalSnapshotStartRecord have to be excluded from the incremental snapshot.
  - It is therefore important to read the WAL ahead, reach IncrementalSnapshotFinishRecord, and only after that apply entries since the previous Incremental Snapshot.
- In some circumstances it is no longer possible to create Incremental Snapshots, and a full snapshot should be created (see below, limitations in Phase 1).
- Only one Incremental Snapshot can be created at a time; concurrent processes are not allowed.
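The read-ahead rule above can be illustrated with a minimal Java sketch. It is not the Ignite implementation: `WalRecord`, `Type` and `entriesToApply` are hypothetical names over a simplified in-memory WAL, and the sketch assumes that only entries logged before the start record are candidates for the snapshot (the text above does not say how entries between the start and finish records are handled).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Simplified, hypothetical WAL model used only for illustration; not the Ignite classes. */
final class WalReadAheadSketch {
    enum Type { SNAPSHOT_START, DATA_ENTRY, SNAPSHOT_FINISH }

    /** txId is meaningful for DATA_ENTRY only; excludedTxs for SNAPSHOT_FINISH only. */
    record WalRecord(Type type, long txId, Set<Long> excludedTxs) {
        static WalRecord start() { return new WalRecord(Type.SNAPSHOT_START, -1L, Set.of()); }
        static WalRecord entry(long txId) { return new WalRecord(Type.DATA_ENTRY, txId, Set.of()); }
        static WalRecord finish(Long... excludedTxs) {
            return new WalRecord(Type.SNAPSHOT_FINISH, -1L, Set.of(excludedTxs));
        }
    }

    /** Returns the transaction IDs whose entries are applied, in WAL order. */
    static List<Long> entriesToApply(List<WalRecord> wal) {
        int startIdx = -1;
        int finishIdx = -1;

        // Pass 1: read ahead until the finish record is reached; apply nothing yet.
        for (int i = 0; i < wal.size() && finishIdx < 0; i++) {
            Type type = wal.get(i).type();
            if (type == Type.SNAPSHOT_START && startIdx < 0)
                startIdx = i;
            else if (type == Type.SNAPSHOT_FINISH)
                finishIdx = i;
        }

        if (startIdx < 0 || finishIdx < 0)
            throw new IllegalStateException("Start or finish record not found, nothing can be applied");

        Set<Long> excluded = wal.get(finishIdx).excludedTxs();
        List<Long> applied = new ArrayList<>();

        // Pass 2: apply entries logged before the start record, skipping excluded transactions.
        // Assumption of this sketch: entries after the start record are not applied.
        for (int i = 0; i < startIdx; i++) {
            WalRecord rec = wal.get(i);
            if (rec.type() == Type.DATA_ENTRY && !excluded.contains(rec.txId()))
                applied.add(rec.txId());
        }

        return applied;
    }
}
```

For a WAL consisting of `entry(1)`, `entry(2)`, `start()`, `entry(3)`, `finish(2L)`, the sketch applies only transaction 1: transaction 2 is excluded by the finish record, and transaction 3 was logged after the start record.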
...
```
// Create incremental snapshot.
// SNP - name of pre-existing full snapshot.
// [--retries N] - number of attempts to create incremental snapshot, in case of inconsistent cut. Default is 3.
$ control.sh --snapshot create SNP --incremental [ --retries N ]
^ -- Incremental snapshot SNP_1640984400 created at 2022-01-01T00:00:00.
```
Under the hood this command:
- Makes checks:
  - The base snapshot (at least its metafile) exists, and a metafile exists for every incremental snapshot.
  - Validates that no WAL segments are missing since the previous snapshot. SnapshotMetadata should contain info about the last WAL segment that contains snapshot data:
    - If the snapshot is full: ClusterSnapshotRecord is written to the WAL, and the segment number of this record is stored within the existing SnapshotMetadata structure.
    - If the snapshot is incremental: the stored segment number is the segment that contains IncrementalSnapshotFinishRecord.
  - Checks that the baseline topology is the same (relative to the base snapshot).
  - Checks that the WAL is consistent (WAL has not been disabled since the previous snapshot); this info is stored in MetaStorage.
- Starts a new Consistent Cut.
- On finish of the Consistent Cut:
  - if the cut is consistent: logs IncrementalSnapshotFinishRecord with `rolloverType=CURRENT_SEGMENT` to enforce archiving the segment after logging the record.
  - if the cut is inconsistent: skips logging IncrementalSnapshotFinishRecord and retries from the first step.
  - fails if the retry attempts are exceeded.
- Waits until the segment with IncrementalSnapshotFinishRecord has been archived and compacted.
- Collects WAL segments for the current incremental snapshot (from the previous snapshot to IncrementalSnapshotFinishRecord).
- Creates hardlinks to the compressed segments in the target directory.
- Writes meta files describing the new incremental snapshot:
  - meta.smf:
    - Pointer to IncrementalSnapshotFinishRecord.
  - binary_meta, marshaller_data if they changed since the previous snapshot.
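The segment collection and hardlink steps could look roughly like the following sketch, which uses only the JDK (`java.nio.file.Files.createLink`). The class and method names are hypothetical, the `%016d.wal.zip` file name pattern is taken from the proposed directory layout, and both segment bounds are treated as inclusive, which is an assumption: the text does not specify whether the boundary segment of the previous snapshot belongs to the new increment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Ignite. */
final class SegmentLinkSketch {
    /** File name of a compacted segment, e.g. 0000000000000000.wal.zip. */
    static String segmentFileName(long idx) {
        return String.format("%016d.wal.zip", idx);
    }

    /**
     * Hard-links segments firstSeg..lastSeg (both inclusive, an assumption of this sketch)
     * from the WAL archive into the target directory.
     * Fails before creating anything if a segment in the range is missing.
     */
    static List<Path> linkSegments(Path archiveDir, Path targetDir, long firstSeg, long lastSeg)
        throws IOException {
        List<Path> sources = new ArrayList<>();

        // Validate the whole range first, so that a gap leaves no partial increment behind.
        for (long idx = firstSeg; idx <= lastSeg; idx++) {
            Path src = archiveDir.resolve(segmentFileName(idx));
            if (!Files.isRegularFile(src))
                throw new IllegalStateException("Missing WAL segment: " + src.getFileName());
            sources.add(src);
        }

        Files.createDirectories(targetDir);

        List<Path> links = new ArrayList<>();
        for (Path src : sources) {
            Path link = targetDir.resolve(src.getFileName());
            // Hard links require the archive and the target to be on the same file system.
            Files.createLink(link, src);
            links.add(link);
        }

        return links;
    }
}
```

Validating the range before linking keeps the "no misses in WAL segments" check and the copy step separate, so a failed check never produces a half-written `wals/` directory.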
```
# Proposed directory structure
$ ls $IGNITE_HOME
db/
snapshots/
|-- SNP/
|---- db/
|---- increments/
|------ 0000000000000001/
|-------- metanode0.smf
|-------- db/
|---------- binary_meta/
|---------- marshaller/
|-------- wals/
|---------- 0000000000000000.wal.zip
```
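As a small illustration of the layout above, the path of an increment directory can be derived from the snapshot name and the increment index. This is a sketch with hypothetical names, assuming that increment indexes start at 1 and are zero-padded to 16 digits, as the `0000000000000001` directory suggests.

```java
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of Ignite. */
final class IncrementPathSketch {
    /** Resolves snapshots/<name>/increments/<16-digit index>. */
    static Path incrementDir(Path snapshotsRoot, String snapshotName, int incrementIdx) {
        // Assumption: the first increment has index 1.
        if (incrementIdx < 1)
            throw new IllegalArgumentException("Increment index must be positive: " + incrementIdx);

        return snapshotsRoot
            .resolve(snapshotName)
            .resolve("increments")
            .resolve(String.format("%016d", incrementIdx));
    }
}
```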
...
```
// Restore cluster on specific incremental snapshot
$ control.sh --snapshot restore SNP --increment 1
```
With the `control.sh --snapshot restore` command:
- The user specifies the full snapshot name.
- The command parses the snapshot name and extracts the base and incremental snapshots.
- In addition to the full snapshot check (which already exists in SnapshotRestoreProcess), it checks the incremental snapshots:
  - Checks that all WAL segments are present (from ClusterSnapshotRecord to the requested IncrementalSnapshotFinishRecord).
- After the full snapshot restore processes (prepare, preload, cacheStart) have finished, it starts another DistributedProcess, `walRecoveryProc`:
  - Every node applies WAL segments since the base snapshot until it reaches the requested IncrementalSnapshotFinishRecord.
  - Ignite should forbid concurrent operations (both read and write) on the restored cache groups during WAL recovery.
  - The process of applying data for snapshot cache groups (from the base snapshot) is similar to the GridCacheDatabaseSharedManager logical restore:
    - disable WAL for the specified cache groups
    - find the `ClusterSnapshotRecord` related to the base snapshot
    - start applying WAL updates with a striped executor (cacheGrpId, partId), filtering by the versions in ConsistentCutFinishRecord
    - enable WAL for the restored cache groups
    - force a checkpoint and check the restore state (checkpoint status, etc.).
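The striped application step can be sketched with plain JDK executors. This is not Ignite's striped executor: `Update`, `stripeOf` and `applyAll` are hypothetical names, and the hash used to pick a stripe is an arbitrary choice for the example. The point it illustrates is that all updates with the same (cacheGrpId, partId) land on the same single-threaded stripe, so their WAL order is preserved while different partitions are applied in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of striped WAL application; not the Ignite implementation. */
final class StripedApplySketch {
    /** A simplified WAL update: the payload stands in for the actual data entry. */
    record Update(int cacheGrpId, int partId, String payload) {}

    /** Maps (cacheGrpId, partId) to a stripe; the hash itself is an arbitrary choice. */
    static int stripeOf(int cacheGrpId, int partId, int stripes) {
        return Math.floorMod(31 * cacheGrpId + partId, stripes);
    }

    /** Applies updates in WAL order per stripe and returns what each stripe applied. */
    static List<List<String>> applyAll(List<Update> updates, int stripes) throws InterruptedException {
        List<ExecutorService> execs = new ArrayList<>();
        List<List<String>> applied = new ArrayList<>();

        for (int i = 0; i < stripes; i++) {
            execs.add(Executors.newSingleThreadExecutor());
            applied.add(new ArrayList<>());
        }

        for (Update u : updates) {
            int stripe = stripeOf(u.cacheGrpId(), u.partId(), stripes);
            List<String> log = applied.get(stripe);
            // Each list is written only by its own single-threaded stripe.
            execs.get(stripe).execute(() -> log.add(u.payload()));
        }

        for (ExecutorService exec : execs) {
            exec.shutdown();
            if (!exec.awaitTermination(10, TimeUnit.SECONDS))
                throw new IllegalStateException("Stripe did not finish in time");
        }

        return applied;
    }
}
```

In a real restore the task body would apply the data entry to the partition (after the version filter), and the checkpoint would be forced only after all stripes have drained.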
...