...

  1. Users must have the ability to create a snapshot of persisted user data (in-memory caches are out of scope).
  2. Users must have the ability to create a snapshot from a cluster under load, without cluster deactivation.
  3. The snapshot process must not block user transactions for a long time (short-lived blocks are acceptable).
  4. The snapshot process must allow creating a data snapshot on each node and transferring it to any of the remote nodes for internal cluster needs.
  5. The snapshot created at the cluster level must be fully consistent in cluster-wide terms: it must not contain any incomplete transactions.
  6. The snapshot of each node must be consistent – cache partitions, binary metadata, etc. must not contain unnecessary changes.
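
A minimal usage sketch of the public API implied by these requirements is shown below. The snapshot() facade, the createSnapshot(String) method, the configuration path and the snapshot name are assumptions made for illustration, not a final API definition.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;

    public class CreateSnapshotExample {
        public static void main(String[] args) {
            // Start (or connect to) a cluster node; the configuration path is a placeholder.
            try (Ignite ignite = Ignition.start("config/ignite-node.xml")) {
                // Request a cluster-wide snapshot of all persisted caches and
                // wait for the distributed operation to complete.
                ignite.snapshot().createSnapshot("snapshot_01").get();
            }
        }
    }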

...

  1. The snapshot distributed process starts a new snapshot operation by sending an initial discovery message.
  2. The action configured for the distributed process initiates a new local snapshot task on each cluster node.
  3. The discovery event from the distributed process pushes a new exchange task to the exchange worker to start the partition map exchange (PME).
  4. While transactions are blocked (see onDoneBeforeTopologyUnlock) each local node forces the checkpoint thread and waits until the appropriate local snapshot task starts.
  5. The distributed process collects completion results from each cluster node (see the sketch below):
    1. If there are no execution errors and no baseline nodes have left the cluster – the snapshot is created successfully.
    2. If there are execution errors, or some of the cluster nodes left or failed prior to completing their local results – the local snapshots are reverted on each node and the snapshot operation fails.
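
The result-collection step can be sketched roughly as follows. This is an illustrative simplification, not the actual distributed process implementation; the SnapshotCoordinatorSketch class, the revertLocalSnapshot method and the per-node futures map are hypothetical names.

    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.CompletableFuture;

    /** Illustrative coordinator logic for collecting local snapshot results. */
    class SnapshotCoordinatorSketch {
        /** Per-node local snapshot futures, keyed by node id. */
        private final Map<UUID, CompletableFuture<Void>> locResults;

        SnapshotCoordinatorSketch(Map<UUID, CompletableFuture<Void>> locResults) {
            this.locResults = locResults;
        }

        /** Collects local results and decides whether the cluster snapshot succeeded. */
        CompletableFuture<Boolean> collect() {
            return CompletableFuture
                .allOf(locResults.values().toArray(new CompletableFuture[0]))
                .handle((ignored, err) -> {
                    if (err != null) {
                        // Any local failure (or a baseline node leaving) reverts
                        // the partially written snapshot data on every node.
                        locResults.keySet().forEach(this::revertLocalSnapshot);
                        return false;
                    }
                    return true; // All nodes finished their local snapshot tasks.
                });
        }

        private void revertLocalSnapshot(UUID nodeId) {
            // Placeholder: a real implementation would send a revert request to the node.
        }
    }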

...

Another strategy must be used for cache partition files. The checkpoint process writes dirty pages from PageMemory to the cache partition files while another process simultaneously copies them to the target directory. Each cache partition file is consistent only at the end of a checkpoint. So, to avoid long transaction blocks during the cluster-wide snapshot, the process must not wait for the checkpoint to end on each node. The process of copying cache partition files must do the following:

...

  1. A new checkpoint starts (either forced by the node or a regular one).
  2. Under the checkpoint write lock – fix the length of each cache partition (the copy range is from 0 to this length).
  3. The task creates new on-heap collections with the marshaller metadata and binary metadata to copy.
  4. The task starts copying partition files.
  5. The checkpoint thread (see the sketch after this list):
    1. If the checkpoint associated with the task has not finished – writes the dirty page to the original partition file and to the delta file.
    2. If the checkpoint associated with the task has finished and the partition file is still being copied – reads the original page from the original partition file and copies it to the delta file prior to the dirty page write.
  6. Once a partition file is copied – merging of the copied partition with its delta file starts.
  7. The task ends when all data has been successfully copied to the target directory and all cache partition files have been merged with their deltas.
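
A rough sketch of the copy-on-write hook from step 5 is shown below, assuming a per-partition delta file opened as a FileChannel. The class and method names are hypothetical; the actual implementation hooks into the page store writes inside the checkpoint thread.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    /** Illustrative copy-on-write hook for checkpoint page writes. */
    class PageDeltaWriterSketch {
        private final FileChannel partFile;   // Original cache partition file.
        private final FileChannel deltaFile;  // Append-only delta file for this partition.
        private final int pageSize;

        private volatile boolean cpFinished;  // Checkpoint associated with the task finished.
        private volatile boolean partCopied;  // Partition file fully copied to the target dir.

        PageDeltaWriterSketch(FileChannel partFile, FileChannel deltaFile, int pageSize) {
            this.partFile = partFile;
            this.deltaFile = deltaFile;
            this.pageSize = pageSize;
        }

        /** Called by the checkpoint thread before a dirty page is written at {@code pageOff}. */
        void beforePageWrite(long pageOff, ByteBuffer dirtyPage) throws IOException {
            if (!cpFinished) {
                // Checkpoint not finished yet: store the new page version in the delta file.
                deltaFile.write(dirtyPage.duplicate(), deltaFile.size());
            }
            else if (!partCopied) {
                // Checkpoint finished but the partition is still being copied:
                // preserve the old page content in the delta before it is overwritten.
                ByteBuffer oldPage = ByteBuffer.allocate(pageSize);
                partFile.read(oldPage, pageOff);
                oldPage.flip();
                deltaFile.write(oldPage, deltaFile.size());
            }
            // Finally, the dirty page is written to the original partition file as usual.
            partFile.write(dirtyPage, pageOff);
        }
    }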

Remote snapshot

The internal components must have the ability to request a consistent (not cluster-wide, but local) snapshot from remote nodes. The File Transmission protocol is used to transfer files between the cluster nodes. The local snapshot task can be reused independently to perform data copying and sending.

The high-level overview looks like:

  1. The node sends a request via CommunicationSPI to the remote node with a map of cache group ids and the required cache partitions.
  2. The remote node accepts the request and registers a new SnapshotFutureTask locally for execution.
  3. The task starts and sends cache partition files and cache delta files independently, as they are read.
  4. The node accepts cache partition files and applies cache partition delta files on the fly to the appropriate partition (see TransmissionHandler#chunkHandler and the sketch below).
  5. When all requested cache groups and partitions have been received – the process completes successfully.
    If an error occurs during transmission, the snapshot task stops and all temporary files are deleted.
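
A receiver-side sketch of applying delta pages on the fly is given below. It assumes each received chunk carries a small per-page header with the page index; the class name, header layout and page size handling are hypothetical and only illustrate the idea behind TransmissionHandler#chunkHandler.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    /** Illustrative receiver-side logic: applies received delta pages to a partition copy. */
    class DeltaChunkApplierSketch {
        private final FileChannel partFile; // Locally received copy of the partition file.
        private final int pageSize;

        DeltaChunkApplierSketch(FileChannel partFile, int pageSize) {
            this.partFile = partFile;
            this.pageSize = pageSize;
        }

        /** Applies one received delta chunk; each page carries its index in a small header. */
        void onChunkReceived(ByteBuffer chunk) throws IOException {
            while (chunk.remaining() >= Long.BYTES + pageSize) {
                long pageIdx = chunk.getLong();           // Hypothetical per-page header.
                ByteBuffer page = chunk.slice();
                page.limit(pageSize);
                partFile.write(page, pageIdx * pageSize); // Overwrite the stale page.
                chunk.position(chunk.position() + pageSize);
            }
        }
    }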

Crash recovery

During the cluster snapshot operation, a node may crash for some reason.

  • The crashed node must revert its uncompleted local snapshot at startup (see the sketch below).
  • If the crashed node had not completed its local snapshot prior to the crash, all nodes affected by this cluster snapshot operation must delete their local snapshot results.
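
A minimal startup cleanup sketch is shown below. The working directory layout and the "snapshot.completed" marker file are assumptions made for illustration; the actual implementation may track completeness differently, for example via the presence of temporary snapshot directories.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Comparator;
    import java.util.stream.Stream;

    /** Illustrative startup check: removes an unfinished local snapshot left by a crash. */
    class SnapshotRecoverySketch {
        /** Deletes the snapshot working directory if its completion marker is missing. */
        static void revertIfIncomplete(Path snapshotWorkDir) throws IOException {
            // "snapshot.completed" is a hypothetical marker written after a successful local task.
            if (Files.exists(snapshotWorkDir)
                && !Files.exists(snapshotWorkDir.resolve("snapshot.completed"))) {
                try (Stream<Path> files = Files.walk(snapshotWorkDir)) {
                    files.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
                }
            }
        }
    }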

Limitations

  1. Only one cluster-wide snapshot operation is allowed in the cluster at a time.
  2. Encrypted caches are currently not supported, since merging a cache partition with its delta file requires additional changes.
  3. The cache stop operation is not allowed during an ongoing cluster snapshot. An exception will be thrown to the user on a cache stop attempt.
  4. The cluster snapshot operation will be stopped if any of the baseline nodes leave or fail. Partial local snapshots will be reverted.

Discussion Links

http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-Hot-cache-backup-td41034.html

...