
IDIEP-109

Author:
Sponsor:
Created: 17.07.2023
Status: DRAFT

Table of Contents

Motivation

IEP-43 introduced the persistent cache snapshots feature, which is highly adopted by Ignite users.
But Ignite snapshots support persistent caches only.

A cache dump is essentially a file that contains all entries of a cache group at the time of dump creation.
Meta information of the dumped caches and binary metadata are also included in the dump.
A dump is consistent, which means every entry that existed in the cluster at the moment of dump creation lands in the dump file.

Ignite must also provide a dump restore feature. This process will create the cache group and caches from the saved configuration and put all saved entries into them. This essentially restores the cache state to the moment of dump creation.

The dump feature can be used for both persistent and in-memory cache groups. It will simplify the following use-cases:

  • In-memory cluster restarts.
  • Version upgrade.
  • Disaster recovery.
  • DC/Hardware replacement.
  • Data motion.

Description

Dumps will reuse existing snapshot code where possible, so key design decisions stay the same. It is assumed that the reader knows and understands IEP-43, so this description focuses on the differences between persistent snapshots and dumps.

API

New commands for dump creation and restore will be added.

Example:

Code Block
languagebash
firstline1
titledump create and restore example
> ./control.sh --dump --create DUMP --cache-group 123456790

> ./control.sh --dump --restore DUMP --cache-group 123456790,097654321


Creation

The dump creation flow is similar to persistent snapshot creation, so you may want to check IEP-43 for snapshot design details.

  1. On receiving a dump creation request, Ignite will start a distributed process.
  2. PME will be started on receiving of the InitMessage.
  3. Under PME locks, set up an entry "before change" listener.
    The listener must be invoked before an entry is actually changed.
  4. Save cache group metadata.
  5. Save binary and marshaller metadata.
  6. Iterate the CacheDataTree of each cache group and handle each entry.
    Each entry must be sent to the handler of its cache group, which writes it to the corresponding file.
  7. Inside the entry listener:
    1. Store the key of the entry handled by the listener.
    2. Handle the entry as in step 6.

After the algorithm ends, each file contains a consistent copy of an in-memory cache group.
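
The consistency mechanism of steps 3-7 can be sketched with plain Java collections. This is a minimal, single-node simulation assuming nothing about Ignite internals; `DumpSketch` and all its members are illustrative names, not Ignite APIs. The "before change" listener saves the old value of any entry updated while the iterator runs, and the iterator skips keys the listener already handled, so the dump reflects the state at dump start.

```java
import java.util.*;
import java.util.concurrent.*;

// Illustrative sketch of the entry-listener approach (not Ignite code).
public class DumpSketch {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    // Keys already written to the dump, by either the listener or the iterator.
    private final Set<String> dumped = ConcurrentHashMap.newKeySet();
    private final Map<String, String> dumpFile = new ConcurrentHashMap<>();
    private volatile boolean dumpInProgress;

    public void put(String key, String val) {
        if (dumpInProgress)
            beforeChange(key); // step 3: listener fires before the change
        cache.put(key, val);
    }

    public String get(String key) {
        return cache.get(key);
    }

    // Step 7: save the pre-update value once, then let the update proceed.
    private void beforeChange(String key) {
        String old = cache.get(key);
        if (old != null && dumped.add(key))
            dumpFile.put(key, old);
    }

    public void startDump() {
        dumpInProgress = true; // in Ignite this is set up under PME locks
    }

    public Map<String, String> finishDump() {
        // Step 6: iterate entries, skipping keys the listener already handled.
        for (Map.Entry<String, String> e : cache.entrySet())
            if (dumped.add(e.getKey()))
                dumpFile.put(e.getKey(), e.getValue());
        dumpInProgress = false;
        return dumpFile;
    }
}
```

Note that an entry updated concurrently with the dump (like "a" below) lands in the dump with its old value, while the cache keeps the new one.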

Details:

  1. Consistency guarantees - to make sure there are no concurrent updates, dump creation starts under PME locks.
  2. Persistent pages contain a CRC to ensure data integrity on disk.
    Dump data integrity must be protected by a CRC as well.
  3. Metadata must be properly prepared and saved during dump creation:
    • StoredCacheData.
    • binary_meta.
    • marshaller.
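As a sketch of detail 2, each dumped entry can carry a CRC32 trailer that is recomputed and checked on restore. The record layout below ([key length][key][value length][value][crc]) is an assumption for illustration only, not the actual Ignite dump file format.

```java
import java.io.*;
import java.util.zip.CRC32;

// Illustrative per-entry record with a CRC32 trailer (layout is hypothetical).
public class EntryRecord {
    public static byte[] write(byte[] key, byte[] val) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(key.length);
        out.write(key);
        out.writeInt(val.length);
        out.write(val);
        CRC32 crc = new CRC32();
        crc.update(bos.toByteArray()); // CRC over the serialized payload
        out.writeLong(crc.getValue()); // trailer, verified on restore
        return bos.toByteArray();
    }

    // Returns {key, value}; throws if the stored CRC does not match.
    public static byte[][] read(byte[] rec) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(rec));
        byte[] key = new byte[in.readInt()];
        in.readFully(key);
        byte[] val = new byte[in.readInt()];
        in.readFully(val);
        CRC32 crc = new CRC32();
        crc.update(rec, 0, rec.length - 8); // recompute over payload only
        if (crc.getValue() != in.readLong())
            throw new IOException("CRC mismatch: dump record is corrupted");
        return new byte[][] {key, val};
    }
}
```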

Restore

Prerequisites:

  • The restored cache group does not exist in the cluster.
  • The number of nodes in the cluster is the same as at the time of dump creation (this restriction can be eliminated in Phase 2 of the IEP implementation).
  • All nodes in the cluster have the dump locally.
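
A pre-flight check for these prerequisites might look as follows; `RestorePrecheck` and its parameters are hypothetical names used for illustration, not part of the Ignite API.

```java
import java.util.*;

// Hypothetical validation mirroring the restore prerequisites above.
public class RestorePrecheck {
    public static void validate(Set<String> existingGroups, String group,
                                int clusterNodes, int dumpNodes,
                                boolean dumpOnAllNodes) {
        // Prerequisite 1: the restored cache group must not exist yet.
        if (existingGroups.contains(group))
            throw new IllegalStateException("Cache group already exists: " + group);
        // Prerequisite 2 (Phase 1 restriction): same node count as at creation.
        if (clusterNodes != dumpNodes)
            throw new IllegalStateException("Cluster has " + clusterNodes
                + " nodes, dump was created on " + dumpNodes);
        // Prerequisite 3: every node must have the dump locally.
        if (!dumpOnAllNodes)
            throw new IllegalStateException("Dump is missing on some nodes");
    }
}
```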

High-level dump restore algorithm steps:

  1. Create the corresponding caches.
  2. Restore all saved metadata.
  3. Iterate the saved entries and put them as local cache entries.
  4. Wait until all nodes complete steps 2 and 3.
  5. Start the caches that belong to the restored data region.
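
The local part of these steps can be sketched as below; this is a single-node simulation with illustrative names, and the distributed coordination of step 4 is out of scope here.

```java
import java.util.*;
import java.util.concurrent.*;

// Illustrative single-node restore: create the cache, load all dumped
// entries, and only then mark the cache as started (steps 1, 3, 5).
public class RestoreSketch {
    private final Map<String, ConcurrentMap<String, String>> caches = new HashMap<>();
    private final Set<String> started = new HashSet<>();

    public void restore(String cacheName, Map<String, String> dump) {
        // Step 1: create the cache from the saved configuration.
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        caches.put(cacheName, cache);
        // Step 3: iterate the saved entries and put them as local entries.
        cache.putAll(dump);
        // Step 5: start the cache only after loading completes.
        started.add(cacheName);
    }

    public String get(String cacheName, String key) {
        if (!started.contains(cacheName))
            throw new IllegalStateException("Cache not started: " + cacheName);
        return caches.get(cacheName).get(key);
    }
}
```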

Rejected alternatives

There are a couple of alternatives for implementing backup copies of in-memory caches that were rejected during the initial analysis:

  1. Store full data region copy 
    The idea of this approach is to store full data region copy on the disk.
    1. Pros:
      1. Mostly sequential writes.
    2. Cons:
      1. Restore must set up all Ignite internal structures for the restored offheap data.
        Such code can be difficult to implement and maintain.
  2. On demand persistence
    The idea of this approach is to reuse the PDS infrastructure and persistent snapshot code by introducing a new cache mode "PDS_ON_DEMAND".
    This mode will use persistence code but with WAL and checkpoint disabled, so on snapshot creation a regular checkpoint will be performed.
    After the checkpoint, the PDS files are ready to be copied to the snapshot folder.
    1. Pros:
      1. Code reuse.
    2. Cons:
      1. Additional configuration on user-side required (set new mode for DataRegion).
      2. All Ignite codebase needs to be aware of new mode - baseline, PDS, WAL, checkpoint, etc.
      3. PDS page stores additional data - storage overhead.
  3. shmem usage
    The idea of this approach is to use shared memory feature of Linux.
    1. Pros:
      1. Easy to implement (questionable).
      2. Same approach used by other vendors to implement in-memory cluster restarts.
    2. Cons:
      1. OS specific.
      2. Only for certain scenarios. Doesn't cover all use-cases.
  4. Use the "dump" name for the feature instead of "snapshot".
    Alternative naming for the feature can be used.
    Essentially, consistently saving each entry to a file is not in-memory specific.
    The feature can be used for persistent caches with a small amount of effort.
    So, as in other databases (pg_dump), the "dump" name can be used for the feature.

Risks and Assumptions

  • DataRegionConfiguration#persistenceEnabled=false for in-memory caches by definition.
  • The way to restore a dump on a different topology must be further investigated.
  • Compaction of dumps must be further investigated.
  • A consistent per-entry dump as implemented in this IEP can also be created for persistent caches.
  • An API to iterate, transform, and print dump content is out of scope and must be created in another IEP.

Phases scopes

  • Phase 1
    • dump creation.
    • restore of a dump on the same topology.
    • control utility integration.
    • metrics, system views integration.
    • new SecurityPermission.
    • dumping as ZIP file.
  • Phase 2
    • restore of a dump on a different topology.
  • Phase 3
    • dump compaction.

Discussion Links

// Links to discussions on the devlist, if applicable.

Reference Links

Tickets

Jira query: project = Ignite AND labels IN (IEP-109) order by key