...

  1. Block the data region exclusively on each node: any attempt to use it (for example, cache creation) must be rejected.
  2. Restore all saved data into the data region.
  3. Restore all saved metadata.
  4. Wait for all nodes to complete steps 2 and 3.
  5. Start the caches that belong to the restored data region.
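
The ordering of the steps above can be sketched with plain JDK concurrency primitives. This is only an illustration under stated assumptions: nodes are simulated as threads, and every name here (`RestoreSketch`, `restore`, the log strings) is hypothetical rather than Ignite API. The point is the barrier between the per-node work and the cache start.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the restore sequence; none of these names are Ignite API. */
class RestoreSketch {
    /** Runs the five restore steps on {@code nodes} simulated nodes and returns the event log. */
    static List<String> restore(int nodes) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch restored = new CountDownLatch(nodes);
        ExecutorService pool = Executors.newFixedThreadPool(nodes);

        for (int i = 0; i < nodes; i++) {
            int node = i;
            pool.execute(() -> {
                log.add("node" + node + ":block-region");     // step 1: region is unusable from now on
                log.add("node" + node + ":restore-data");     // step 2: saved data goes back into the region
                log.add("node" + node + ":restore-metadata"); // step 3: saved metadata is restored
                restored.countDown();                         // report completion of steps 2 and 3
            });
        }

        try {
            // Step 4: wait until every node has finished steps 2 and 3.
            if (!restored.await(10, TimeUnit.SECONDS))
                throw new IllegalStateException("restore timed out");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("restore interrupted", e);
        } finally {
            pool.shutdown();
        }

        log.add("cluster:start-caches"); // step 5: only now may caches of the region start
        return log;
    }

    public static void main(String[] args) {
        restore(3).forEach(System.out::println);
    }
}
```

The order of per-node lines in the log varies between runs, but `cluster:start-caches` is always last because it is appended only after the latch has been released by every node.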

Rejected alternatives

Several alternative ways to implement backup copies of in-memory caches were rejected during the initial analysis:

  1. Store entries instead of the data region
    The idea of this approach is to store entries in the file instead of pages.
    1. Pros:
      1. Cache group granularity, as in persistent snapshots.
      2. Smaller snapshot size when only a specific cache group is snapshotted. Currently, cache group snapshot granularity is not supported by persistent snapshots.
      3. Only BinaryObject backward compatibility is required. The PageIO structure can be changed.
      4. Ability to implement a primary-only mode.
    2. Cons:
      1. Restore requires more time because a per-entry local put operation must be invoked on each node.
  2. On demand persistence
    The idea of this approach is to reuse the PDS infrastructure and the persistent snapshot code by introducing a new cache mode, "PDS_ON_DEMAND".
    This mode would use the persistence code but with WAL and checkpoints disabled, so a regular checkpoint is performed when a snapshot is created.
    After the checkpoint, the PDS files are ready to be copied to the snapshot folder.
    1. Pros:
      1. Code reuse.
    2. Cons:
      1. Additional user-side configuration is required (a new mode must be set for the DataRegion).
      2. The whole Ignite codebase needs to be aware of the new mode: baseline, PDS, WAL, checkpoint, etc.
      3. PDS pages store additional data, which adds storage overhead.
  3. Shared memory (shmem) usage
    The idea of this approach is to use the shared memory feature of Linux.
    1. Pros:
      1. Easy to implement (questionable).
      2. The same approach is used by other vendors to implement in-memory cluster restarts.
    2. Cons:
      1. OS specific.
      2. Works only for certain scenarios; does not cover all use cases.
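
To make the trade-off of the first rejected alternative more concrete, here is a minimal sketch of an entry-based snapshot, assuming a cache can be modeled as a `Map<String, String>`. The class and method names are hypothetical and unrelated to Ignite internals. Note how `restore` has to issue one `put` per entry, which is the cost listed under the cons, whereas a page-based snapshot copies memory pages as they are.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical entry-based snapshot: stores key/value pairs, not memory pages. */
class EntrySnapshotSketch {
    /** Serializes every entry of the cache into a byte array (stand-in for a snapshot file). */
    static byte[] dump(Map<String, String> cache) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(cache.size());
            for (Map.Entry<String, String> entry : cache.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeUTF(entry.getValue());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buf.toByteArray();
    }

    /** Rebuilds the cache with one local put per stored entry. */
    static Map<String, String> restore(byte[] snapshot) {
        Map<String, String> cache = new HashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(snapshot))) {
            int size = in.readInt();
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();
                String val = in.readUTF();
                cache.put(key, val); // per-entry put: restore time grows with the entry count
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return cache;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cache.put("k1", "v1");
        cache.put("k2", "v2");
        System.out.println(restore(dump(cache)).equals(cache));
    }
}
```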

Risks and Assumptions

  • DataRegionConfiguration#persistenceEnabled=false for in-memory caches by definition.
  • DataRegionConfiguration must have the same value when a cache group is restored from an in-memory snapshot.
  • Once this feature is implemented, PageIO must remain backward compatible.
  • How to restore a snapshot on a different topology must be investigated further.
  • Empty pages of the DataRegion will be written to the snapshot.
  • Snapshot compaction must be investigated further.
  • Concurrent snapshot operations (persistent or in-memory) are not allowed. This restriction can be lifted in later phases to make it possible to create a full cluster snapshot with one command.
  • In a mixed cluster (both persistent and in-memory data regions exist), the metastorage is persistent and must be included in the in-memory snapshot.
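
The "no concurrent snapshot operation" assumption could be enforced with a single compare-and-set guard. The sketch below is illustrative only: `SnapshotOperationGuard` and its methods are hypothetical names, and a real cluster would need a distributed equivalent rather than a per-JVM field.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical guard that admits at most one snapshot operation at a time. */
class SnapshotOperationGuard {
    /** Name of the operation in progress, or null when idle. */
    private final AtomicReference<String> current = new AtomicReference<>();

    /** Returns true if the operation was admitted, false if another one is running. */
    boolean tryStart(String opName) {
        return current.compareAndSet(null, opName);
    }

    /**
     * Releases the guard. The same String instance that was passed to tryStart
     * must be passed here, because compareAndSet compares references.
     */
    boolean finish(String opName) {
        return current.compareAndSet(opName, null);
    }

    public static void main(String[] args) {
        SnapshotOperationGuard guard = new SnapshotOperationGuard();
        String persistent = "persistent-snapshot";
        String inMemory = "in-memory-snapshot";

        System.out.println(guard.tryStart(persistent)); // admitted
        System.out.println(guard.tryStart(inMemory));   // rejected while the first one runs
        guard.finish(persistent);
        System.out.println(guard.tryStart(inMemory));   // admitted after release
    }
}
```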

...