| ID | IEP-43 |
|---|---|
| Author | |
| Sponsor | |
| Created | |
| Status | |
Most open-source distributed systems provide `cluster snapshots` functionality, but Apache Ignite does not have one yet. Cluster snapshots will allow users to copy their data from an active cluster and load it later on another one, for example to copy data from a production system into a smaller QA or development system.
The snapshot storage path can be configured via `IgniteConfiguration`; by default, the `IGNITE_HOME/work/snapshots` directory is used.
```java
public class IgniteConfiguration {
    /**
     * Directory where all results of snapshot operations will be stored. If {@code null},
     * the relative {@link #DFLT_SNAPSHOT_DIRECTORY} is used.
     */
    private String snapshotPath;
}
```
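For illustration, configuring the path might look like the sketch below; the `setSnapshotPath(String)` setter is an assumption mirroring the `snapshotPath` field above, not a confirmed API.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class SnapshotPathExample {
    public static void main(String[] args) {
        // Assumed setter mirroring the proposed snapshotPath field.
        IgniteConfiguration cfg = new IgniteConfiguration()
            .setSnapshotPath("/mnt/backup/ignite-snapshots");

        Ignite ignite = Ignition.start(cfg);
    }
}
```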
```java
public interface IgniteSnapshot {
    /**
     * @return List of all known snapshots.
     */
    public List<String> getSnapshots();

    /**
     * Creates a consistent copy of all persistent cache groups from the whole cluster.
     *
     * @param name Snapshot name.
     * @return Future which will be completed when the process ends.
     */
    public IgniteFuture<Void> createSnapshot(String name);
}
```
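A usage sketch of the proposed public API; the `ignite.snapshot()` accessor is an assumption of this illustration.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.lang.IgniteFuture;

public class CreateSnapshotExample {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start("ignite-config.xml");

        // Assumed accessor exposing the IgniteSnapshot interface above.
        IgniteFuture<Void> fut = ignite.snapshot().createSnapshot("ERIB_23012020");

        fut.get(); // Blocks until the cluster-wide snapshot operation completes.
    }
}
```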
```java
package org.apache.ignite.mxbean;

/**
 * Snapshot features MBean.
 */
@MXBeanDescription("MBean that provides access for snapshot features.")
public interface SnapshotMXBean {
    /**
     * Gets all snapshots created on the cluster.
     *
     * @return List of all known snapshots.
     */
    @MXBeanDescription("List of all known snapshots.")
    public List<String> getSnapshots();

    /**
     * Creates the cluster-wide snapshot with the given name.
     *
     * @param snpName Snapshot name to be created.
     * @see IgniteSnapshot#createSnapshot(String)
     */
    @MXBeanDescription("Create cluster-wide snapshot.")
    public void createSnapshot(@MXBeanParameter(name = "snpName", description = "Snapshot name.") String snpName);
}
```
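The MBean could then be invoked over a standard JMX connection, as in the sketch below; the `ObjectName` is hypothetical and depends on how the bean is registered.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMX;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class SnapshotMXBeanExample {
    public static void main(String[] args) throws Exception {
        MBeanServer srv = ManagementFactory.getPlatformMBeanServer();

        // Hypothetical ObjectName; the actual group/name are a registration detail.
        ObjectName name = new ObjectName("org.apache:group=Snapshot,name=SnapshotMXBean");

        SnapshotMXBean mxBean = JMX.newMXBeanProxy(srv, name, SnapshotMXBean.class);

        mxBean.createSnapshot("ERIB_23012020");
    }
}
```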
```bash
# Starts cluster snapshot operation.
control.sh --snapshot ERIB_23012020

# Display all known cluster snapshots.
control.sh --snapshot -list
```
An internal API allows requesting and receiving the required snapshot of cache groups from a remote node. It is used as part of IEP-28 (peer-to-peer rebalancing) to send a created local snapshot to the remote (demander) node.
```java
/**
 * @param rmtNodeId The remote node to connect to.
 * @param parts Collection of pairs of a cache group and the cache partitions to be snapshotted.
 * @param partConsumer Handler for each received partition file.
 * @return Future which will be completed when the requested snapshot is fully received.
 */
public IgniteInternalFuture<Void> createRemoteSnapshot(
    UUID rmtNodeId,
    Map<Integer, Set<Integer>> parts,
    BiConsumer<File, GroupPartitionId> partConsumer);
```
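A demander-side usage sketch; the `RemoteSnapshotSupport` placeholder stands in for whichever internal component ends up exposing this method.

```java
import java.io.File;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.function.BiConsumer;

import org.apache.ignite.IgniteCheckedException;
import org.apache.ignite.internal.IgniteInternalFuture;
import org.apache.ignite.internal.processors.cache.persistence.partstate.GroupPartitionId;

public class RemoteSnapshotExample {
    /** Placeholder for the internal component exposing the API sketched above. */
    interface RemoteSnapshotSupport {
        IgniteInternalFuture<Void> createRemoteSnapshot(UUID rmtNodeId,
            Map<Integer, Set<Integer>> parts, BiConsumer<File, GroupPartitionId> partConsumer);
    }

    /** Requests partitions 0 and 1 of the given cache group from a remote node. */
    static void requestPartitions(RemoteSnapshotSupport mgr, UUID rmtNodeId, int grpId)
        throws IgniteCheckedException {
        Map<Integer, Set<Integer>> parts = new HashMap<>();
        parts.put(grpId, new HashSet<>(Arrays.asList(0, 1)));

        mgr.createRemoteSnapshot(rmtNodeId, parts,
            (file, pair) -> System.out.println("Received partition file: " + file.getName()))
            .get(); // Waits until all requested partitions are fully received.
    }
}
```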
The snapshot procedure stores all internal files (binary metadata, marshaller metadata, cache group data files, and cache group configurations) in the same directory structure that Apache Ignite itself uses, preserving the configured consistent node ids.
To restore a cluster from a snapshot, the user must manually copy the snapshot files back into the `IGNITE_HOME/work` directory, paying attention to consistent node ids. A snapshot directory structure looks like this:
```
maxmuzaf@TYE-SNE-0009931 ignite % tree work
work
└── snapshots
    └── backup23012020
        ├── binary_meta
        │   ├── snapshot_IgniteClusterSnapshotSelfTest0
        │   ├── snapshot_IgniteClusterSnapshotSelfTest1
        │   └── snapshot_IgniteClusterSnapshotSelfTest2
        ├── db
        │   ├── snapshot_IgniteClusterSnapshotSelfTest0
        │   │   ├── cache-default
        │   │   │   ├── cache_data.dat
        │   │   │   ├── part-0.bin
        │   │   │   ├── part-2.bin
        │   │   │   ├── part-3.bin
        │   │   │   ├── part-4.bin
        │   │   │   ├── part-5.bin
        │   │   │   └── part-6.bin
        │   │   └── cache-txCache
        │   │       ├── cache_data.dat
        │   │       ├── part-3.bin
        │   │       ├── part-4.bin
        │   │       └── part-6.bin
        │   ├── snapshot_IgniteClusterSnapshotSelfTest1
        │   │   ├── cache-default
        │   │   │   ├── cache_data.dat
        │   │   │   ├── part-1.bin
        │   │   │   ├── part-3.bin
        │   │   │   ├── part-5.bin
        │   │   │   ├── part-6.bin
        │   │   │   └── part-7.bin
        │   │   └── cache-txCache
        │   │       ├── cache_data.dat
        │   │       ├── part-1.bin
        │   │       ├── part-5.bin
        │   │       └── part-7.bin
        │   └── snapshot_IgniteClusterSnapshotSelfTest2
        │       ├── cache-default
        │       │   ├── cache_data.dat
        │       │   ├── part-0.bin
        │       │   ├── part-1.bin
        │       │   ├── part-2.bin
        │       │   ├── part-4.bin
        │       │   └── part-7.bin
        │       └── cache-txCache
        │           ├── cache_data.dat
        │           ├── part-0.bin
        │           └── part-2.bin
        └── marshaller

17 directories, 30 files
```
With respect to the cluster-wide snapshot operation, the process of creating a copy of user data can be split into the following high-level steps:

1. Start the snapshot operation across the cluster and prepare each node.
2. Create a local snapshot on each node.
3. Complete the operation across the cluster and handle the results.

The Distributed Process is used to complete steps [1, 3]. To achieve step [2], a new `SnapshotFutureTask` must be developed.
To achieve cluster-wide snapshot consistency, the Partition Map Exchange is reused to temporarily block all user transactions.
In the short window while user transactions are blocked, the local snapshot task is started by forcing the checkpoint process. These actions provide the following guarantees: all current transactions are finished and all new transactions are blocked, and all data from the PageMemory is flushed to disk by the end of the checkpoint. This is a short time window in which all cluster data becomes fully consistent on disk.
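For illustration, the forced checkpoint inside this window could look like the sketch below. The `Checkpointer` interface is a placeholder for the internal checkpointing API, not an actual Ignite class.

```java
import java.util.concurrent.CompletableFuture;

public class ForcedCheckpointSketch {
    /** Placeholder for the internal checkpointing API. */
    interface Checkpointer {
        /** Triggers a checkpoint and returns a future completed when it ends. */
        CompletableFuture<Void> forceCheckpoint(String reason);
    }

    static void snapshotUnderCheckpoint(Checkpointer cp) throws Exception {
        // While user transactions are blocked by PME, force a checkpoint so that
        // all dirty pages from PageMemory reach the disk.
        cp.forceCheckpoint("cluster snapshot").get();

        // At this point the on-disk data is fully consistent and the local
        // snapshot task may start copying partition files.
    }
}
```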
An overview of the cluster-wide snapshot task steps in terms of the distributed process:
The local snapshot task is an operation that executes on each node and copies all persistent user files from the Ignite work directory to the target snapshot directory, with additional machinery to achieve consistency. This task is closely tied to the node checkpointing process because, for instance, cache partition files are only eventually consistent on disk while a checkpoint is in progress and become fully consistent only when the checkpoint ends.
The local snapshot operation on each cluster node is represented by `SnapshotFutureTask`.
The following must be copied to the snapshot:

- binary metadata
- marshaller metadata
- cache group configuration files
- cache partition files
Binary metadata, marshaller metadata, and configurations are still stored on-heap, so it is easy to collect them and keep this persistent user information consistent under the checkpoint write lock (no changes are allowed at that point).
Another strategy must be used for cache partition files. The checkpoint process writes dirty pages from PageMemory to the cache partition files while another process simultaneously copies them to the target directory, and each cache partition file is consistent only at checkpoint end. To avoid blocking transactions for a long time, the cluster-wide snapshot process must not wait for the checkpoint to end on each node. Instead, while partition files are being copied, concurrent page changes are recorded into `.delta` files, one file created per partition (e.g. `part-1.bin` together with `part-1.bin.delta` gives a fully consistent cache partition).
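To show how a partition file and its `.delta` file fit together, here is a minimal merge sketch. It assumes the delta file is a plain sequence of page images and that each page carries its page id in the header, from which the page index within the partition can be derived; both are assumptions of this illustration, not the exact on-disk format.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DeltaMergeSketch {
    /** Applies part-N.bin.delta on top of a copied part-N.bin to get a consistent partition. */
    static void applyDelta(Path part, Path delta, int pageSize) throws IOException {
        try (FileChannel partCh = FileChannel.open(part, StandardOpenOption.WRITE);
             FileChannel deltaCh = FileChannel.open(delta, StandardOpenOption.READ)) {
            ByteBuffer page = ByteBuffer.allocate(pageSize).order(ByteOrder.nativeOrder());

            while (deltaCh.read(page) == pageSize) {
                page.flip();

                long pageId = page.getLong(0);       // Page id assumed to sit in the page header.
                long pageIdx = pageId & 0xFFFFFFFFL; // Page index within the partition (assumed layout).

                partCh.write(page, pageIdx * pageSize); // Overwrite the stale page image.

                page.clear();
            }
        }
    }
}
```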
There are two possible cases when copying a cache partition file simultaneously with the checkpoint thread:

- the checkpoint thread writes a page into a part of the partition file that has already been copied, so the change is captured only by the `.delta` file;
- the checkpoint thread writes a page into a part that has not been copied yet (the copy process reads each partition file sequentially, from offset `0` to its length fixed at the start of the operation).

Internal components must have the ability to request a consistent (not cluster-wide, but local) snapshot from remote nodes. The File Transmission protocol is used to transfer files between the cluster nodes. The local snapshot task can be reused independently to perform data copying and sending.
The high-level overview looks like this: the demander node sends a request for the required cache group partitions to the remote node; the remote node runs the local snapshot task for the requested partitions and transfers the resulting files back over the File Transmission protocol; each received chunk is processed on the demander side by the registered chunk handler (see `TransmissionHandler#chunkHandler`).
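For illustration, a demander-side chunk consumer might append each received chunk to the target partition file. This is a simplified sketch and does not reproduce the exact `TransmissionHandler` signatures.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.function.Consumer;

public class ChunkHandlerSketch {
    /** Returns a consumer that appends every received chunk to the target file channel. */
    static Consumer<ByteBuffer> chunkHandler(FileChannel target) {
        return buf -> {
            try {
                while (buf.hasRemaining())
                    target.write(buf);
            }
            catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }
}
```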
During the cluster snapshot operation, a node may crash for some reason.
http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-Hot-cache-backup-td41034.html