ID: IEP-109
Author:
Sponsor:
Created: 17.07.2023
Status: DRAFT


Motivation

IEP-43 introduced the persistent cache snapshot feature, which has been widely adopted by Ignite users.
However, Ignite snapshots support persistent caches only.

A cache dump is essentially a file that contains all entries of a cache group at the time of dump creation.
Meta information of the dumped caches and binary metadata are included in the dump as well.
A dump is consistent, which means that all entries that existed in the cluster at the moment of dump creation will land in the dump file.

Ignite must provide a dump restore feature. This process will create the cache group and caches from the saved configuration and put all saved entries into them. This will essentially restore the cache state to the moment of dump creation.

The dump feature can be used for both persistent and in-memory cache groups. In-memory cache snapshots will simplify the following use cases:

  • In-memory cluster restarts.
  • Version upgrade.
  • Disaster recovery.
  • DC/Hardware replacement.
  • Data motion.

Description

In-memory snapshots will reuse existing snapshot code when possible, so the key design decisions stay the same. It is assumed that the reader knows and understands IEP-43, so the design description will focus on the differences between persistent and in-memory snapshots.

API

A new optional flag --mode will be added.
Possible values are PERSISTENT, INMEMORY, BOTH.
The default value is PERSISTENT to keep the existing behaviour of the command.
When concurrent creation of in-memory and persistent snapshots is implemented, the default value can be changed to BOTH.
A new command for dump creation and restore will be added.

Example:

create snapshot example:
> ./control.sh --snapshotdump --create SNP --mode INMEMORY

Creation

1. PME guarantees visibility of all dirty pages:

Only PME is required for in-memory snapshots. The write listener can be set during PME because no concurrent transactions are allowed.

See:

  • PartitionsExchangeAware#onDoneBeforeTopologyUnlock
  • IgniteSnapshotManager#onDoneBeforeTopologyUnlock
  • SnapshotFutureTask#start

2. Storage unit:

In-memory caches store pages in the configured DataRegion. A page of a specific cache group is allocated in some Segment of the data region.

So, unlike with persistent caches, it is more convenient and less error-prone to create a snapshot of the whole DataRegion with all caches in it.

During snapshot creation the node must track all page changes, which can be implemented by a listener of write locks in PageMemoryNoStoreImpl.
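
The page tracking described above can be illustrated with a minimal copy-on-first-write sketch. It is not Ignite code: the class PageSnapshotSketch and all of its methods are hypothetical, pages are plain byte arrays, and the "write-lock listener" is modelled as a hook inside write().

```java
import java.util.HashMap;
import java.util.Map;

/** Toy page memory; every name here is hypothetical and not part of the Ignite API. */
final class PageSnapshotSketch {
    /** Stand-in for the pages of a DataRegion. */
    private final byte[][] pages;

    /** Pre-images of pages changed while a snapshot is in progress. */
    private final Map<Integer, byte[]> preImages = new HashMap<>();

    /** True while the listener is installed. */
    private boolean snapshotting;

    PageSnapshotSketch(int pageCnt, int pageSize) {
        pages = new byte[pageCnt][pageSize];
    }

    /** In Ignite this would happen during PME; here it only flips a flag. */
    synchronized void startSnapshot() {
        preImages.clear();
        snapshotting = true;
    }

    /** Write path: the listener copies a page before its first change. */
    synchronized void write(int pageIdx, int off, byte val) {
        if (snapshotting)
            preImages.computeIfAbsent(pageIdx, i -> pages[i].clone());

        pages[pageIdx][off] = val;
    }

    /** Returns the state of every page as it was when the snapshot started. */
    synchronized byte[][] finishSnapshot() {
        byte[][] res = new byte[pages.length][];

        for (int i = 0; i < pages.length; i++) {
            byte[] pre = preImages.get(i);

            res[i] = pre != null ? pre : pages[i].clone();
        }

        snapshotting = false;

        return res;
    }
}
```

Changes made after startSnapshot() do not leak into the result, because the first write to a page saves its previous content.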

...

 DUMP --cache-group 123456790

> ./control.sh --dump --restore DUMP --cache-group 123456790,097654321

Creation

The dump creation flow is similar to persistent snapshot creation, so you may want to check IEP-43 for snapshot design details.

  1. On receiving a dump creation request, Ignite will start a distributed process.
  2. PME will be started on receipt of InitMessage.
  3. Under PME locks, set up the entry before-change listener.
    The listener must be invoked before the entry is actually changed.
  4. Save cache group metadata.
  5. Save binary and marshaller metadata.
  6. Iterate the CacheDataTree of the cache group and handle each entry.
    Each entry must be sent to the specific cache group handler that will write it to the corresponding file.
  7. Inside the entry listener:
    1. store the entry key that was handled by the listener.
    2. handle the entry as in step 6.

After the algorithm ends, each file will contain a consistent copy of its cache group.
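
The interplay between the iterator (step 6) and the before-change listener (step 7) can be sketched as follows. This is a simplified single-JVM model under stated assumptions, not the Ignite implementation: DumpCreationSketch and its methods are hypothetical names, a TreeMap stands in for CacheDataTree, a list stands in for the dump file, and PME locks are reduced to a flag.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of the dump creation flow; every name here is hypothetical, not Ignite API. */
final class DumpCreationSketch {
    /** Stand-in for the CacheDataTree of a cache group. */
    private final TreeMap<Integer, String> tree = new TreeMap<>();

    /** Keys already handled, either by the listener or by the iterator. */
    private final Set<Integer> handled = new HashSet<>();

    /** Stand-in for the dump file of the cache group. */
    private final List<Map.Entry<Integer, String>> dumpFile = new ArrayList<>();

    /** True while the before-change listener is installed. */
    private boolean dumping;

    /** Update path: the listener is invoked before the entry is actually changed. */
    synchronized void put(int key, String val) {
        if (dumping)
            beforeChange(key);

        tree.put(key, val);
    }

    /** Step 3: install the listener (under PME locks in the real flow). */
    synchronized void startDump() {
        dumping = true;
    }

    /** Step 7: remember the key and write the pre-change value, if there was one. */
    private void beforeChange(int key) {
        if (handled.add(key) && tree.containsKey(key))
            dumpFile.add(Map.entry(key, tree.get(key)));
    }

    /** Step 6: iterate the tree and write every entry the listener has not handled. */
    synchronized List<Map.Entry<Integer, String>> finishDump() {
        for (Map.Entry<Integer, String> e : tree.entrySet()) {
            if (handled.add(e.getKey()))
                dumpFile.add(Map.entry(e.getKey(), e.getValue()));
        }

        dumping = false;

        return dumpFile;
    }

    /** Small scenario: one update and one insert happen while the dump is running. */
    static Map<Integer, String> demo() {
        DumpCreationSketch cache = new DumpCreationSketch();

        cache.put(1, "a");
        cache.put(2, "b");

        cache.startDump();

        cache.put(1, "a2"); // Concurrent update: the old value "a" goes to the dump.
        cache.put(3, "c");  // Created after the dump started: must not be dumped.

        Map<Integer, String> res = new TreeMap<>();

        for (Map.Entry<Integer, String> e : cache.finishDump())
            res.put(e.getKey(), e.getValue());

        return res;
    }
}
```

In this scenario the dump contains keys 1 and 2 with their values from the moment startDump() was called, which is the consistency property the algorithm aims for.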

Details:

  1. Consistency guarantees: to make sure there are no concurrent updates, dump creation starts under PME locks.
  2. Persistent pages contain a CRC to ensure data integrity while stored on disk.
    Dump data integrity must be protected by a CRC as well.
  3. Metadata must be properly prepared and saved during dump creation:
    • StoredCacheData.
    • binary_meta.
    • marshaller.

CRC for each page must be calculated and written to snapshot metadata during snapshotting.

CRC must be checked during restore.
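
One possible way to protect dump data with a CRC is sketched below using java.util.zip.CRC32 from the JDK. The record layout (length, payload, checksum) and the class name DumpCrcSketch are assumptions made for illustration; the IEP does not define the dump file format.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical record format: [payload length][payload][CRC32 of payload]. */
final class DumpCrcSketch {
    /** Serializes a payload together with its checksum. */
    static byte[] write(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);

        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + payload.length + Integer.BYTES);
        buf.putInt(payload.length).put(payload).putInt((int) crc.getValue());

        return buf.array();
    }

    /** Reads a record back and fails if the stored CRC does not match the payload. */
    static byte[] read(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);

        byte[] payload = new byte[buf.getInt()];
        buf.get(payload);

        int stored = buf.getInt();

        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);

        if ((int) crc.getValue() != stored)
            throw new IllegalStateException("CRC mismatch: dump record is corrupted");

        return payload;
    }
}
```

A restore that reads records through read() would reject a record whose payload was changed on disk after it was written.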

Restore

Prerequisites:

  • The restored data region is empty (no caches are stored in it) and the restored cache group does not exist in the cluster.
  • The number of nodes in the cluster is the same as at the time of creation (this restriction can be eliminated in Phase 2 of the IEP implementation).
  • All nodes in the cluster have the snapshot dump locally.

High-level dump restore algorithm steps:

  1. Create the corresponding caches. Block the data region exclusively on each node: any attempt to use it (cache creation) must be blocked.
  2. Restore all saved data into the data region. Restore all saved metadata.
  3. Iterate over the saved entries and put them as local cache entries.
  4. Wait until all nodes complete steps 2 and 3.
  5. Start the caches that belong to the restored data region.
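
The "load locally, wait for everyone, then start" part of the restore steps can be sketched with threads standing in for cluster nodes. Everything below is hypothetical (DumpRestoreSketch, Node, restore); a CountDownLatch plays the role of the cluster-wide barrier of step 4, and metadata restore is omitted.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

/** Toy model of dump restore; every name here is hypothetical, not Ignite API. */
final class DumpRestoreSketch {
    /** One simulated node with its local part of the restored cache. */
    static final class Node {
        final Map<Integer, String> localCache = new ConcurrentHashMap<>();
        volatile boolean cacheStarted;
    }

    /** Restores dumps.get(i) on nodes.get(i); caches start only after all nodes have loaded. */
    static void restore(List<Node> nodes, List<Map<Integer, String>> dumps) {
        // Prerequisite: the restored cache must not already contain data.
        for (Node node : nodes) {
            if (!node.localCache.isEmpty())
                throw new IllegalStateException("Restored cache is not empty");
        }

        CountDownLatch loaded = new CountDownLatch(nodes.size());
        Thread[] workers = new Thread[nodes.size()];

        for (int i = 0; i < nodes.size(); i++) {
            Node node = nodes.get(i);
            Map<Integer, String> dump = dumps.get(i);

            workers[i] = new Thread(() -> {
                node.localCache.putAll(dump); // Step 3: put saved entries locally.
                loaded.countDown();

                try {
                    loaded.await(); // Step 4: wait until all nodes have loaded.
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }

                node.cacheStarted = true; // Step 5: start the cache.
            });

            workers[i].start();
        }

        try {
            for (Thread w : workers)
                w.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Restore was interrupted", e);
        }
    }
}
```

The barrier keeps any node from serving a cache while another node is still loading its part of the dump.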

Rejected alternatives

There are a couple of alternatives for implementing backup copies of in-memory caches that were rejected during the initial analysis:

  1. Store entries instead of full data region copy 
    The idea of this approach is to store entries in a file instead of pages, that is, instead of a full data region copy on the disk.
    1. Pros:
      1. cache group granularity like in persistent snapshots.
      2. smaller snapshot size when snapshotting a specific cache group. Currently, cache group snapshot granularity is not supported by persistent snapshots.
      3. only backward compatibility of BinaryObject is required. The PageIO structure can be changed.
      4. ability to implement primary-only mode.
      5. Mostly sequential writes.
    2. Cons:
      1. restore must set up all Ignite internal structures for the restored offheap data.
        Such code can be difficult to implement and maintain.
      2. restore requires more time because a per-entry local put operation must be invoked on each node.
  2. On demand persistence
    The idea of this approach is to reuse the PDS infrastructure and persistent snapshot code by introducing a new cache mode "PDS_ON_DEMAND".
    This mode will use the persistence code but with WAL and checkpoints disabled. So on snapshot creation a regular checkpoint will be performed.
    After checkpoint PDS files are ready to be copied to snapshot folder.
    1. Pros:
      1. Code reuse.
    2. Cons:
      1. Additional configuration on user-side required (set new mode for DataRegion).
      2. All Ignite codebase needs to be aware of new mode - baseline, PDS, WAL, checkpoint, etc.
      3. PDS page stores additional data - storage overhead.
  3. shmem usage
    The idea of this approach is to use shared memory feature of Linux.
    1. Pros:
      1. Easy to implement (questionable).
      2. Same approach used by other vendors to implement in-memory cluster restarts.
    2. Cons:
      1. OS specific.
      2. Only for certain scenarios. Doesn't cover all use-cases.
  4. Use "snapshot" name for feature instead of dump.

Risks and Assumptions

  • DataRegionConfiguration#persistenceEnabled=false for in-memory caches by definition.
  • DataRegionConfiguration must have the same value when a cache group is restored from an in-memory snapshot.
  • After this feature is implemented, PageIO will be required to be backward compatible.
  • The way to restore a snapshot on a different topology must be further investigated.
  • Empty pages of the DataRegion will be written to the snapshot.
  • Compaction of snapshots must be further investigated.
  • No concurrent snapshot operations (persistent or in-memory) are allowed. This restriction can be eliminated in later phases to provide the ability to create a full cluster snapshot with one command.
  • In the case of a mixed cluster (both persistent and in-memory data regions exist), the metastorage is persistent and must be included in the in-memory snapshot.
  • The consistent per-entry snapshot that will be implemented in this IEP can also be created for persistent caches.

Phases scopes

  • Phase 1
    • snapshot dump creation.
    • restore snapshot dump on the same topology.
    • control utility integration.
    • metrics, system views integration.
    • new SecurityPermission for creating/restoring dumps.
    • new event to track/audit dump operations.
    • dumping as ZIP file.
  • Phase 2
    • restore snapshot on different topology.
  • Phase 3
    • snapshot compaction.

Discussion Links

https://lists.apache.org/thread/4cmxn98zcsmtbofpgm1v39sl7pdblxgq

Reference Links

Tickets

ASF JIRA query: project = Ignite AND labels IN (IEP-109) order by key