
IDIEP-109

Author:
Sponsor:
Created: 17.07.2023
Status: DRAFT

Motivation

IEP-43 introduced the persistent cache snapshots feature. This feature is widely adopted by Ignite users.

Snapshots of in-memory caches will simplify the following use cases:

  • In-memory cluster restarts.
  • Version upgrade.
  • Disaster recovery.
  • DC/Hardware replacement.
  • Data motion.

Description

In-memory snapshots will reuse the existing snapshot code where possible, so the key design decisions stay the same. It is assumed that the reader knows and understands IEP-43, so this description focuses on the differences between persistent and in-memory snapshots.

API

A new optional flag --mode will be added.
Possible values are PERSISTENT and ALL.
The default value is PERSISTENT to keep the existing behaviour of the command.

Example:

create snapshot example
> ./control.sh --snapshot --create SNP --mode ALL

Creation

1. PME guarantees visibility of all dirty pages:

Only a PME is required for in-memory snapshots. The write listener can be set during the PME because no concurrent transactions are allowed at that point (a sketch follows the list below).

See:

  • PartitionsExchangeAware#onDoneBeforeTopologyUnlock
  • IgniteSnapshotManager#onDoneBeforeTopologyUnlock
  • SnapshotFutureTask#start
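
For illustration, a minimal sketch of such a hook is shown below. The PartitionsExchangeAware extension point and onDoneBeforeTopologyUnlock method are the ones listed above; snapshotRequested() and installPageWriteListener() are hypothetical helpers, not existing API:

exchange hook sketch
import org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.PartitionsExchangeAware;
import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture;

/** Sketch: starts in-memory snapshot tracking at the PME safe point. */
class InMemorySnapshotExchangeHook implements PartitionsExchangeAware {
    /** Invoked while the topology is still locked, so no transactions run concurrently. */
    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
        if (!snapshotRequested(fut))
            return; // Not a snapshot-triggering exchange.

        // Safe point: install the page write listener before user load resumes,
        // so every later page modification is seen by the snapshot task.
        installPageWriteListener();
    }

    /** Hypothetical: checks whether this exchange was triggered by a snapshot request. */
    private boolean snapshotRequested(GridDhtPartitionsExchangeFuture fut) {
        return false;
    }

    /** Hypothetical: registers the write-lock listener described in the next section. */
    private void installPageWriteListener() {
        // No-op in this sketch.
    }
}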

2. Storage unit:

In-memory caches store pages in the configured DataRegion. Pages of a specific cache group are allocated in some Segment of the data region.

So, unlike with persistent caches, it is more convenient and less error-prone to create a snapshot of the whole DataRegion with all the caches in it.

During snapshot creation a node must track all page changes, which can be implemented with a listener on page write locks in PageMemoryNoStoreImpl (sketched below).
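
A minimal, self-contained sketch of such copy-on-write tracking follows. The onBeforePageWrite() callback and writePageToSnapshot() helper are hypothetical; the real integration would live inside PageMemoryNoStoreImpl:

page write tracking sketch
import java.nio.ByteBuffer;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: copy-on-write tracking of pages modified while a snapshot is in progress. */
class SnapshotPageWriteTracker {
    /** Ids of pages whose pre-images were already copied to the snapshot. */
    private final Set<Long> copiedPageIds = ConcurrentHashMap.newKeySet();

    /**
     * Hypothetical callback fired by page memory right before a page is modified
     * under its write lock. The first write to each page copies the unmodified
     * pre-image, keeping the snapshot consistent with the PME point in time.
     */
    void onBeforePageWrite(long pageId, ByteBuffer pageBuf) {
        if (copiedPageIds.add(pageId))
            writePageToSnapshot(pageId, pageBuf.duplicate());
    }

    /** Hypothetical: appends the page pre-image to the snapshot file. */
    private void writePageToSnapshot(long pageId, ByteBuffer preImage) {
        // No-op in this sketch.
    }
}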

3. Persistent pages contain a CRC to ensure data integrity while stored on disk:

A CRC for each page must be calculated and written to the snapshot metadata during snapshotting.

The CRC must be checked during restore.
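
A minimal sketch of the per-page CRC handling follows, using java.util.zip.CRC32 purely for illustration (Ignite's page stores ship their own CRC utilities):

page CRC sketch
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Sketch: per-page CRC calculation and verification. */
final class PageCrc {
    /** Calculates the CRC of a page buffer; written to snapshot metadata on create. */
    static int calcCrc(ByteBuffer pageBuf) {
        CRC32 crc = new CRC32();

        crc.update(pageBuf.duplicate()); // duplicate() keeps the caller's buffer position intact.

        return (int)crc.getValue();
    }

    /** Verifies a restored page against the CRC read from snapshot metadata. */
    static void checkCrc(ByteBuffer pageBuf, int expCrc) {
        int actual = calcCrc(pageBuf);

        if (actual != expCrc)
            throw new IllegalStateException("Snapshot page is corrupted [exp=" + expCrc + ", actual=" + actual + ']');
    }
}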

4. Metadata:

  • StoredCacheData.
  • binary_meta.
  • marshaller.

must be properly prepared and saved during snapshot creation.

Restore

Prerequisites:

  • The restored data region is empty: no caches are stored in it.
  • The number of nodes in the cluster is the same as at the time of snapshot creation (this restriction can be eliminated in Phase 2 of the IEP implementation).
  • All nodes in the cluster have the snapshot data locally.

Steps:

  1. Block the data region exclusively on each node: any attempt to use it (e.g. cache creation) must be rejected.
  2. Restore all saved data into the data region.
  3. Restore all saved metadata.
  4. Wait for all nodes to complete steps 2 and 3.
  5. Start the caches that belong to the restored data region (see the sketch after this list).
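
A minimal sketch of the per-node restore flow described above; all helper methods are hypothetical and shown only to make the ordering of the steps explicit:

restore flow sketch
/** Sketch: per-node restore flow; the ordering of the five steps is the point. */
class InMemorySnapshotRestore {
    void restoreOnLocalNode(String snapshotName, String dataRegionName) throws Exception {
        // Step 1: block the data region; cache creation in it must fail while restoring.
        try (AutoCloseable lock = blockDataRegion(dataRegionName)) {
            // Step 2: load the saved pages back into the data region.
            restorePages(snapshotName, dataRegionName);

            // Step 3: restore StoredCacheData, binary and marshaller metadata.
            restoreMetadata(snapshotName);

            // Step 4: wait until every node completes steps 2 and 3.
            awaitAllNodesRestored(snapshotName);
        }

        // Step 5: start the caches that belong to the restored data region.
        startRestoredCaches(dataRegionName);
    }

    /** Hypothetical helpers, stubbed only to keep the sketch self-contained. */
    private AutoCloseable blockDataRegion(String region) { return () -> {}; }
    private void restorePages(String snp, String region) {}
    private void restoreMetadata(String snp) {}
    private void awaitAllNodesRestored(String snp) {}
    private void startRestoredCaches(String region) {}
}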

Rejected alternatives

There are a couple of alternatives for implementing backup copies of in-memory caches that were rejected during the initial analysis:

  1. Store entries instead of data region
    The idea of this approach is to store entries in the file instead of pages.
    1. Pros:
      1. Cache group granularity, like in persistent snapshots.
      2. Smaller snapshot size when snapshotting a specific cache group. Currently, cache group snapshot granularity is not supported by persistent snapshots.
      3. Only backward compatibility of BinaryObject is required; the PageIO structure can be changed.
      4. Ability to implement a primary-only mode.
    2. Cons:
      1. Restore requires more time because a per-entry local put operation must be invoked on each node.
  2. On-demand persistence
    The idea of this approach is to reuse the PDS infrastructure and the persistent snapshot code by introducing a new cache mode "PDS_ON_DEMAND".
    This mode would use the persistence code but with WAL and checkpoints disabled, so on snapshot creation a regular checkpoint would be performed.
    After the checkpoint, the PDS files are ready to be copied to the snapshot folder.
    1. Pros:
      1. Code reuse.
    2. Cons:
      1. Additional configuration is required on the user side (setting the new mode for a DataRegion).
      2. The whole Ignite codebase needs to be aware of the new mode - baseline, PDS, WAL, checkpoint, etc.
      3. PDS pages store additional data - storage overhead.
  3. Shared memory (shmem) usage
    The idea of this approach is to use the shared memory feature of Linux.
    1. Pros:
      1. Easy to implement (questionable).
      2. The same approach is used by other vendors to implement in-memory cluster restarts.
    2. Cons:
      1. OS-specific.
      2. Covers only certain scenarios, not all use cases.

Risks and Assumptions

  • DataRegionConfiguration#persistenceEnabled=false for in-memory caches by definition.
  • The same value must be set on the DataRegionConfiguration when a cache group is restored from an in-memory snapshot.
  • After this feature is implemented, PageIO will be required to stay backward compatible.
  • The way to restore a snapshot on a different topology must be investigated further.
  • Empty pages of the DataRegion will be written to the snapshot.
  • Snapshot compaction must be investigated further.

Phase scopes

  • Phase 1
    • snapshot creation.
    • restore snapshot on the same topology.
    • control utility integration.
  • Phase 2
    • restore snapshot on different topology.
  • Phase 3
    • snapshot compaction.

Discussion Links

// Links to discussions on the devlist, if applicable.

Reference Links

Tickets

