
IDIEP-109

Author:
Sponsor:
Created: 17.07.2023
Status: DRAFT

Motivation

IEP-43 introduced the persistent cache snapshots feature. This feature is widely adopted by Ignite users.

In-memory cache snapshots will simplify the following use cases:

  • In-memory cluster restarts.
  • Version upgrade.
  • Disaster recovery.
  • DC/Hardware replacement.
  • Data motion.

Description

In-memory snapshots will reuse the existing snapshot code where possible, so the key design decisions stay the same. It is assumed that the reader knows and understands IEP-43, therefore the design description focuses on the differences between persistent and in-memory snapshots.

API

A new optional flag --include-in-memory will be added.
By default, the flag is disabled to preserve the existing behavior.

Example:

create snapshot example
> ./control.sh --snapshot --create SNP --include-in-memory
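
A snapshot can also be created from the public Java API introduced by IEP-43. The snippet below is only illustrative: it uses the existing IgniteSnapshot#createSnapshot call, and the Java-level equivalent of the --include-in-memory flag is not specified in this IEP.

create snapshot via Java API (illustrative)
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class CreateSnapshotExample {
    public static void main(String[] args) {
        // Obtain the default Ignite instance started in this JVM.
        Ignite ignite = Ignition.ignite();

        // Existing public snapshot API (IEP-43); how in-memory caches
        // are included at the Java API level is not defined by this IEP.
        ignite.snapshot().createSnapshot("SNP").get();
    }
}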

Creation

1. PME guarantees visibility of all dirty pages:

Only a PME is required for in-memory snapshots. We can set the write listener during the PME, because no concurrent transactions are allowed at that point.

See:

  • PartitionsExchangeAware#onDoneBeforeTopologyUnlock
  • IgniteSnapshotManager#onDoneBeforeTopologyUnlock
  • SnapshotFutureTask#start
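
A minimal sketch of where the listener could be installed, assuming hypothetical addBeforeWriteLockListener / onPageBeforeWriteLock methods; PartitionsExchangeAware#onDoneBeforeTopologyUnlock and SnapshotFutureTask#start are the existing extension points listed above (imports of internal classes omitted).

install write listener during PME (sketch)
class InMemorySnapshotExchangeHook implements PartitionsExchangeAware {
    private final SnapshotFutureTask task;
    private final PageMemory pageMem;

    InMemorySnapshotExchangeHook(SnapshotFutureTask task, PageMemory pageMem) {
        this.task = task;
        this.pageMem = pageMem;
    }

    @Override public void onDoneBeforeTopologyUnlock(GridDhtPartitionsExchangeFuture fut) {
        // No concurrent transactions are running at this point of the PME, so the
        // listener can be installed atomically with starting the snapshot task.
        pageMem.addBeforeWriteLockListener(task::onPageBeforeWriteLock); // hypothetical API

        task.start();
    }
}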

2. Storage unit:

In-memory caches store pages in the configured DataRegion. Pages for a specific cache group are allocated in some Segment of the data region.
Moreover, the pages of a data region can store entries from different cache groups.
Therefore, per-entry storage is proposed for in-memory cache snapshots:
every entry of a cache group will be saved as-is to the corresponding file.
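
A minimal sketch of what a per-entry file layout could look like; the length-prefixed format and the writeEntry helper are assumptions for illustration, not the final format.

per-entry dump file layout (sketch)
// Hypothetical length-prefixed layout of a cache group dump file.
void writeEntry(java.io.DataOutputStream out, byte[] key, byte[] val) throws java.io.IOException {
    out.writeInt(key.length);  // key length
    out.write(key);            // key bytes, saved as-is
    out.writeInt(val.length);  // value length
    out.write(val);            // value bytes, saved as-is
}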

3. Persistent pages contain a CRC to ensure data integrity while stored on disk:

A CRC for each page must be calculated and written to the snapshot metadata during snapshotting.

The CRC must be checked during restore.
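
A rough illustration of the checksum step using the standard JDK CRC32; the actual implementation would likely reuse Ignite's existing CRC utilities.

CRC calculation (sketch)
// Compute a CRC over a chunk of snapshot data; the value is written to the
// snapshot metadata during creation and re-checked during restore.
long crcOf(byte[] chunk) {
    java.util.zip.CRC32 crc = new java.util.zip.CRC32();
    crc.update(chunk, 0, chunk.length);
    return crc.getValue();
}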

4. Metadata:

The following metadata must be properly prepared and saved during the snapshot:

  • StoredCacheData.
  • binary_meta.
  • marshaller.

High-level snapshot creation algorithm steps:

  1. Under PME locks, set up a page "before write lock" listener.
    The listener must be invoked under the page write lock, but before the code that acquired the lock modifies the page.
  2. Iterate over each page of the DataRegion and write the page entries to the corresponding cache group snapshot files.
  3. Inside the listener:
    1. Store the pageId handled by the listener, so the main iteration can skip it.
    2. Handle the page as in step 2.

After the algorithm ends, each file will contain a consistent copy of the in-memory cache group.
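
The sketch below illustrates how the listener (step 3) and the data region iteration (step 2) could cooperate; the handledPages set and the writePageEntries helper are hypothetical names used only for illustration.

copy-on-write cooperation (sketch)
class CopyOnWriteIteration {
    // Each page is copied exactly once: either by the listener (before the page
    // is modified) or by the main iteration, whichever reaches the page first.
    private final java.util.Set<Long> handledPages =
        java.util.concurrent.ConcurrentHashMap.newKeySet();

    // Step 3: invoked under the page write lock, before the acquiring code runs.
    void onPageBeforeWriteLock(long pageId) {
        if (handledPages.add(pageId))
            writePageEntries(pageId);
    }

    // Step 2: iterate every page of the data region.
    void iterateDataRegion(Iterable<Long> pageIds) {
        for (long pageId : pageIds) {
            if (handledPages.add(pageId))
                writePageEntries(pageId); // pages already copied by the listener are skipped
        }
    }

    // Hypothetical helper: writes all entries of the page to the cache group file.
    private void writePageEntries(long pageId) { /* ... */ }
}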

Restore

Prerequisites:

  • The restored cache group does not exist in the cluster.
  • The number of nodes in the cluster is the same as at the time of creation (this restriction can be eliminated in Phase 2 of the IEP implementation).
  • All nodes in the cluster have the snapshot locally.

High-level snapshot restore algorithm steps:

  1. Create corresponding caches.
  2. Restore all saved metadata.
  3. Iterate saved entries and put them as local cache entries.
  4. Wait until all nodes complete steps 2 and 3.
  5. Start the caches that belong to the restored data region.
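
An illustrative sketch of step 3 of the restore; savedEntries is assumed to come from a hypothetical snapshot file reader, and the public cache.put is used only for brevity, whereas the actual implementation would insert entries locally since the topology matches the one at creation time.

restore entries (sketch)
// Put every saved entry back into the re-created cache (step 3).
void restoreEntries(org.apache.ignite.Ignite ignite, String cacheName,
    Iterable<java.util.Map.Entry<Object, Object>> savedEntries) {

    org.apache.ignite.IgniteCache<Object, Object> cache = ignite.cache(cacheName);

    for (java.util.Map.Entry<Object, Object> e : savedEntries)
        cache.put(e.getKey(), e.getValue()); // simplified: real code performs a local put
}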

Rejected alternatives

There are several alternatives for implementing backup copies of in-memory caches that were rejected during the initial analysis:

  1. Store full data region copy 
    The idea of this approach is to store full data region copy on the disk.
    1. Pros:
      1. Mostly sequential writes.
    2. Cons:
      1. Restore must set up all Ignite internal structures over the restored offheap data.
        Such code can be difficult to implement and maintain.
  2. On demand persistence
    The idea of this approach is to reuse the PDS infrastructure and the persistent snapshot code by introducing a new cache mode "PDS_ON_DEMAND".
    This mode will use the persistence code, but with WAL and checkpoints disabled. So, on snapshot creation, a regular checkpoint will be performed.
    After the checkpoint the PDS files are ready to be copied to the snapshot folder.
    1. Pros:
      1. Code reuse.
    2. Cons:
      1. Additional configuration is required on the user side (setting the new mode for a DataRegion).
      2. The whole Ignite codebase needs to be aware of the new mode - baseline, PDS, WAL, checkpoint, etc.
      3. PDS pages store additional data - storage overhead.
  3. shmem usage
    The idea of this approach is to use the shared memory feature of Linux.
    1. Pros:
      1. Easy to implement (questionable).
      2. Same approach used by other vendors to implement in-memory cluster restarts.
    2. Cons:
      1. OS specific.
      2. Applicable only to certain scenarios; doesn't cover all use cases.
  4. Use "dump" name for feature instead of snapshot.
    Alternative naming for the feature can be used.
    Essentially, consistently saving each entry to a file is not in-memory specific.
    The feature can be used for persistent caches with a small amount of effort.
    So, as in other databases (e.g. pg_dump), the "dump" name can be used for the feature.

Risks and Assumptions

  • DataRegionConfiguration#persistenceEnabled=false for in-memory caches by definition.
  • The way to restore a snapshot on a different topology must be further investigated.
  • Snapshot compaction must be further investigated.
  • The consistent per-entry snapshot implemented in this IEP can also be created for persistent caches.
  • An API to iterate, transform, and print snapshot content is out of scope and must be created in another IEP.

Phase scopes

  • Phase 1
    • snapshot creation.
    • restore snapshot on the same topology.
    • control utility integration.
    • metrics, system views integration.
  • Phase 2
    • restore snapshot on different topology.
  • Phase 3
    • snapshot compaction.

Discussion Links

// Links to discussions on the devlist, if applicable.

Reference Links

Tickets
