...

Code Block
languagebash
# Proposed directory structure
$ ls $IGNITE_HOME
db/
snapshots/
|-- SNP/
|---- db/
|---- increments/
|------ 0000000000000001/
|-------- meta.smf
|-------- binary_meta/
|-------- marshaller/
|-------- wal/
|---------- 0000000000000000.wal.zip

Restore process

Code Block
languagebash
# Restore the cluster from a specific incremental snapshot
$ control.sh --snapshot restore SNP --incremental 1
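
The same restore could also be invoked programmatically. The sketch below is an assumption, not part of this proposal: it supposes an overload of IgniteSnapshot#restoreSnapshot() that accepts an increment index, and the configuration path and cache group name are chosen only for illustration.

Code Block
languagejava
import java.util.Collection;
import java.util.Collections;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class RestoreIncrementExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start("config/ignite-config.xml")) {
            // Cache groups to restore; null would mean "all groups in the snapshot".
            Collection<String> groups = Collections.singletonList("SQL_PUBLIC_T");

            // Hypothetical API: restore the base snapshot SNP and apply increment #1,
            // mirroring `control.sh --snapshot restore SNP --incremental 1`.
            ignite.snapshot().restoreSnapshot("SNP", groups, 1).get();
        }
    }
}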

...

  1. The user specifies the incremental snapshot name.
  2. Ignite parses the snapshot name and resolves the base snapshot and the requested increment.
  3. In addition to the full snapshot checks (already present in SnapshotRestoreProcess), it validates the incremental snapshots:
    1. Checks that all WAL segments are present (from the ClusterSnapshotRecord to the requested ConsistentCutFinishRecord).
  4. After the full snapshot restore phases (prepare, preload, cacheStart) have finished, it starts another DistributedProcess, `walRecoveryProc`:
    1. Every node applies WAL segments written since the base snapshot until it reaches the requested ConsistentCutFinishRecord.
    2. Ignite should forbid concurrent operations (both read and write) on the restored cache groups during WAL recovery.
    3. TBD: just notify the user about it? Set a barrier for operations? Use a partition state other than OWNING?
    4. Applying data for the snapshot cache groups (from the base snapshot) is similar to the GridCacheDatabaseSharedManager logical restore (see the sketch after this list):
      1. disable WAL for the restored cache groups;
      2. find the `ClusterSnapshotRecord` related to the base snapshot;
      3. start applying WAL updates with the striped executor, keyed by (cacheGrpId, partId), filtering entries by the versions listed in the ConsistentCutFinishRecord;
      4. enable WAL for the restored cache groups;
      5. force a checkpoint and check the restore state (checkpoint status, etc.).
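
To make the version filter applied during WAL replay (step 4 above) more concrete, below is a self-contained toy sketch: only updates whose transaction version is included in the ConsistentCutFinishRecord are replayed. WalUpdate and the plain Set of versions are simplified stand-ins chosen for illustration, not Ignite classes.

Code Block
languagejava
import java.util.List;
import java.util.Set;

public class ConsistentCutFilterSketch {
    /** Simplified stand-in for a WAL data record; not an Ignite class. */
    record WalUpdate(long txVersion, int cacheGrpId, int partId, String op) {}

    /** Replays only the updates committed before the consistent cut. */
    static void replay(List<WalUpdate> walSinceBaseSnapshot, Set<Long> versionsBeforeCut) {
        for (WalUpdate u : walSinceBaseSnapshot) {
            if (!versionsBeforeCut.contains(u.txVersion()))
                continue; // the transaction is after the cut - skip it

            // In the real process the update would be dispatched to the striped
            // executor keyed by (cacheGrpId, partId) and applied to page memory.
            System.out.printf("apply grp=%d part=%d %s%n", u.cacheGrpId(), u.partId(), u.op());
        }
    }

    public static void main(String[] args) {
        replay(
            List.of(
                new WalUpdate(1L, 100, 0, "put k1=v1"),
                new WalUpdate(7L, 100, 1, "put k2=v2")), // tx 7 is after the cut
            Set.of(1L));
    }
}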

...

For Atomic caches it is required to restore data consistency between primary and backup nodes differently, using the ReadRepair feature. Consistent Cut relies on the transaction protocol messages (Prepare, Finish), while the atomic cache protocol does not have enough messages to synchronize the different nodes.

TBD: the restore process should suggest that the user perform additional steps if ATOMIC caches are restored:

  1. Check the partitions state with the `idle_verify` command;
  2. Start read-repair for inconsistent keys in lazy mode: repair is triggered by user get() operations on the broken cache keys (see the sketch below).
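
For step 2, the lazy repair maps to the existing ReadRepair API: reads through a read-repair cache proxy verify the key on all owners and fix divergent copies. A minimal sketch, assuming Ignite 2.13+ where withReadRepair(ReadRepairStrategy) is available; the cache name and the LWW strategy are chosen only for illustration.

Code Block
languagejava
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.ReadRepairStrategy;

public class LazyReadRepairExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start("config/ignite-config.xml")) {
            // "atomic-cache" is an assumed cache name used only for illustration.
            IgniteCache<Integer, String> cache = ignite.cache("atomic-cache");

            // Reads through this proxy compare primary and backup copies and
            // repair inconsistencies using the last-write-wins strategy.
            IgniteCache<Integer, String> repairable = cache.withReadRepair(ReadRepairStrategy.LWW);

            // Each get() lazily repairs the requested key if the copies diverge.
            System.out.println("value after repair: " + repairable.get(42));
        }
    }
}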

...