...
| ID | IEP-43 |
|---|---|
| Author | |
| Sponsor | |
| Created | |
| Status | |
Most open-source distributed systems provide `cluster snapshots` functionality, but Apache Ignite does not have it yet. Cluster snapshots will allow users to copy their data from an active cluster and load it later into another one, for example, copying data from a production system into a smaller QA or development system.
The snapshot storage path can be configured via IgniteConfiguration; by default, the IGNITE_HOME/work/snapshots directory is used.
```java
public class IgniteConfiguration {
    /**
     * Directory where all snapshot operation results will be stored. If {@code null} then
     * the relative {@link #DFLT_SNAPSHOT_DIRECTORY} will be used.
     */
    private String snapshotPath;
}
```
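For illustration, a node could be configured to keep snapshots on a dedicated volume; the setSnapshotPath setter in this sketch is assumed to accompany the field shown above and is not defined in this document.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class SnapshotPathExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Assumed setter for the snapshotPath field above: store snapshots on a dedicated volume
        // instead of the default IGNITE_HOME/work/snapshots directory.
        cfg.setSnapshotPath("/mnt/snapshots");

        Ignite ignite = Ignition.start(cfg);
    }
}
```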
```java
public interface IgniteSnapshot {
    /**
     * Create a consistent copy of all persistent cache groups from the whole cluster.
     *
     * @param name Snapshot name.
     * @return Future which will be completed when the process ends.
     */
    public IgniteFuture<Void> createSnapshot(String name);
}
```
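As a usage sketch, assuming the facade above is exposed through Ignite#snapshot() (an assumption of this example, not stated in this section):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class CreateSnapshotExample {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // Create a cluster-wide snapshot and block until the operation completes.
        ignite.snapshot().createSnapshot("ERIB_23012020").get();
    }
}
```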
```java
package org.apache.ignite.mxbean;

/**
 * Snapshot features MBean.
 */
@MXBeanDescription("MBean that provides access for snapshot features.")
public interface SnapshotMXBean {
    /**
     * Create the cluster-wide snapshot with the given name.
     *
     * @param snpName Snapshot name to be created.
     * @see IgniteSnapshot#createSnapshot(String)
     */
    @MXBeanDescription("Create cluster-wide snapshot.")
    public void createSnapshot(@MXBeanParameter(name = "snpName", description = "Snapshot name.") String snpName);
}
```
```
# Starts cluster snapshot operation.
control.sh --snapshot ERIB_23012020
```
An internal API that allows requesting and receiving the required snapshot of cache groups from a remote node. It is used as part of IEP-28: Rebalance peer-2-peer to send the created local snapshot to the remote (demander) node.
```java
/**
 * @param rmtNodeId The remote node to connect to.
 * @param parts Map of cache group ids to the cache partitions to be snapshotted.
 * @param partConsumer Received partition handler.
 * @return Future which will be completed when the requested snapshot is fully received.
 */
public IgniteInternalFuture<Void> createRemoteSnapshot(
    UUID rmtNodeId,
    Map<Integer, Set<Integer>> parts,
    BiConsumer<File, GroupPartitionId> partConsumer);
```
Ignite's distributed events functionality allows user applications to receive notifications when certain events occur in the distributed cluster environment. The user must be able to get notified about snapshot operation executions within the cluster:
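For example, a sketch of subscribing to such notifications via the events API; the concrete event type constants used below (EVT_CLUSTER_SNAPSHOT_STARTED, EVT_CLUSTER_SNAPSHOT_FINISHED, EVT_CLUSTER_SNAPSHOT_FAILED) are assumptions of this example and would also need to be enabled via IgniteConfiguration#setIncludeEventTypes.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.events.Event;

import static org.apache.ignite.events.EventType.EVT_CLUSTER_SNAPSHOT_FAILED;
import static org.apache.ignite.events.EventType.EVT_CLUSTER_SNAPSHOT_FINISHED;
import static org.apache.ignite.events.EventType.EVT_CLUSTER_SNAPSHOT_STARTED;

public class SnapshotEventsExample {
    public static void listen(Ignite ignite) {
        // Notify the application every time a snapshot operation starts, finishes or fails on this node.
        ignite.events().localListen((Event evt) -> {
            System.out.println("Snapshot event: " + evt.name() + ", msg=" + evt.message());

            return true; // Keep the listener registered.
        }, EVT_CLUSTER_SNAPSHOT_STARTED, EVT_CLUSTER_SNAPSHOT_FINISHED, EVT_CLUSTER_SNAPSHOT_FAILED);
    }
}
```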
Ignite must have the capability to specify permissions that allow or disallow execution of the cluster snapshot operation. The following permission must be supported:
The snapshot procedure stores all internal files (binary metadata, marshaller metadata, cache group data files, and cache group configurations) using the same directory structure as Apache Ignite does, preserving the configured consistent node id.
...
```
maxmuzaf@TYE-SNE-0009931 ignite % tree work
work
└── snapshots
    └── backup23012020
        ├── binary_meta
        │   ├── snapshot_IgniteClusterSnapshotSelfTest0
        │   ├── snapshot_IgniteClusterSnapshotSelfTest1
        │   └── snapshot_IgniteClusterSnapshotSelfTest2
        ├── db
        │   ├── snapshot_IgniteClusterSnapshotSelfTest0
        │   │   ├── cache-default
        │   │   │   ├── cache_data.dat
        │   │   │   ├── part-0.bin
        │   │   │   ├── part-2.bin
        │   │   │   ├── part-3.bin
        │   │   │   ├── part-4.bin
        │   │   │   ├── part-5.bin
        │   │   │   └── part-6.bin
        │   │   └── cache-txCache
        │   │       ├── cache_data.dat
        │   │       ├── part-3.bin
        │   │       ├── part-4.bin
        │   │       └── part-6.bin
        │   ├── snapshot_IgniteClusterSnapshotSelfTest1
        │   │   ├── cache-default
        │   │   │   ├── cache_data.dat
        │   │   │   ├── part-1.bin
        │   │   │   ├── part-3.bin
        │   │   │   ├── part-5.bin
        │   │   │   ├── part-6.bin
        │   │   │   └── part-7.bin
        │   │   └── cache-txCache
        │   │       ├── cache_data.dat
        │   │       ├── part-1.bin
        │   │       ├── part-5.bin
        │   │       └── part-7.bin
        │   └── snapshot_IgniteClusterSnapshotSelfTest2
        │       ├── cache-default
        │       │   ├── cache_data.dat
        │       │   ├── part-0.bin
        │       │   ├── part-1.bin
        │       │   ├── part-2.bin
        │       │   ├── part-4.bin
        │       │   └── part-7.bin
        │       └── cache-txCache
        │           ├── cache_data.dat
        │           ├── part-0.bin
        │           └── part-2.bin
        └── marshaller

17 directories, 30 files
```
We need to provide the ability to restore individual cache groups from an existing snapshot on an active cluster.
The overall process includes the following sequential steps:
...
If errors occur (I/O errors, node failure, etc.), the changes made to the cluster must be fully or partially reverted (depending on the type of error).
The cluster should be active.
To avoid the possibility of starting a node with inconsistent data, the partition files are first copied to a temporary directory, and then this directory is moved using an atomic move operation. When the node starts, the temporary folder is deleted (if it exists).
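A minimal sketch of this copy-then-atomic-move behavior using java.nio (the class and method names below are illustrative, not the actual restore code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

public class AtomicRestoreMoveSketch {
    /**
     * Partition files are first written into {@code tmpCacheDir} and then published with a single
     * atomic rename, so a starting node never observes a half-copied cache directory.
     */
    public static void publishAtomically(Path tmpCacheDir, Path targetCacheDir) throws IOException {
        Files.move(tmpCacheDir, targetCacheDir, StandardCopyOption.ATOMIC_MOVE);
    }

    /** On node start, a leftover temporary directory from an interrupted restore is simply removed. */
    public static void cleanupOnStart(Path tmpCacheDir) throws IOException {
        if (!Files.exists(tmpCacheDir))
            return;

        try (Stream<Path> walk = Files.walk(tmpCacheDir)) {
            // Delete children before parents.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }
}
```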
If any of the nodes on which the snapshot was taken leaves the cluster during the restore process, the process is aborted and all changes are rolled back (to achieve this, a special cache start mode was introduced: if a node exits during the exchange, the process is rolled back).
// TBD
Sometimes the user faces the task of adding post-processing to the snapshot operation, for example, calculating file checksums, checking consistency, etc. The same applies to the automatic restore procedure: the user should be able to check the consistency of files, checksums, etc. before restoring them.
...
It is recommended to add the ability to load SnapshotHandler user extensions that will implement custom checks.
```java
/** Snapshot handler. */
public interface SnapshotHandler<T> extends Extension {
    /** Snapshot handler type. */
    public SnapshotHandlerType type();

    /** Local processing of a snapshot operation. */
    public @Nullable T invoke(SnapshotHandlerContext ctx) throws Exception;

    /** Processing of results from all nodes. */
    public default void complete(String name, Collection<SnapshotHandlerResult<T>> results) throws Exception {
        for (SnapshotHandlerResult<T> res : results) {
            if (res.error() == null)
                continue;

            throw new IgniteCheckedException("Snapshot handler has failed " +
                "[snapshot=" + name +
                ", handler=" + getClass().getName() +
                ", nodeId=" + res.node().id() + "].", res.error());
        }
    }
}

/** Snapshot handler type. */
public enum SnapshotHandlerType {
    /** Handler is called immediately after the snapshot is taken. */
    CREATE,

    /** Handler is called just before the restore operation is started. */
    RESTORE
}

/** Snapshot handler context. */
public class SnapshotHandlerContext {
    SnapshotMetadata metadata;

    Collection<String> grps;

    ClusterNode locNode;
}

/**
 * Result of local processing on the node. In addition to the result received from the handler, it also
 * includes information about the error (if any) and the node on which this result was received.
 */
public class SnapshotHandlerResult<T> implements Serializable {
    T data;

    Exception err;

    ClusterNode node;
}
```
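As a usage illustration of this extension point, the sketch below computes a CRC32 checksum per snapshotted partition file after snapshot creation. The ctx.snapshotDirectory() accessor and the class itself are assumptions of this example, not part of the proposed API.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;
import java.util.zip.CRC32;

/** Illustrative handler that records a CRC32 checksum for every partition file of the local snapshot. */
public class ChecksumSnapshotHandler implements SnapshotHandler<Map<String, Long>> {
    /** {@inheritDoc} */
    @Override public SnapshotHandlerType type() {
        return SnapshotHandlerType.CREATE;
    }

    /** {@inheritDoc} */
    @Override public Map<String, Long> invoke(SnapshotHandlerContext ctx) throws Exception {
        Map<String, Long> checksums = new HashMap<>();

        // ctx.snapshotDirectory() is assumed for illustration; the context sketch above only lists its fields.
        try (Stream<Path> files = Files.walk(ctx.snapshotDirectory())) {
            for (Path p : (Iterable<Path>)files.filter(f -> f.toString().endsWith(".bin"))::iterator) {
                CRC32 crc = new CRC32();
                crc.update(Files.readAllBytes(p));

                checksums.put(p.getFileName().toString(), crc.getValue());
            }
        }

        return checksums;
    }

    // The default complete() implementation is sufficient here: it fails the snapshot operation
    // if any node reported an error.
}
```

Since SnapshotHandler extends Extension, such handlers could be discovered through the regular Ignite extension mechanism.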
With respect to the cluster-wide snapshot operation, the process of creating a copy of user data can be split into the following high-level steps:
...
The Distributed Process is used to complete steps [1, 3]. To achieve step [2], a new SnapshotFutureTask must be developed.
The following must be copied to snapshot:
Binary metadata and marshaller metadata are still stored on-heap, so it is easy to collect them and keep this persistent user information consistent under the checkpoint write-lock (no changes are allowed).
Cache configuration files may be changed during, for instance, concurrent SQL ADD[DROP] COLUMN or SQL CREATE[DROP] INDEX operations. These changes always happen through the exchange thread.
Another strategy must be used for cache partition files. The checkpoint process writes dirty pages from PageMemory to the cache partition files simultaneously with another process copying them to the target directory. Each cache partition file is consistent only at the end of a checkpoint. So, to avoid blocking transactions for a long time during the cluster-wide snapshot process, it should not wait for the checkpoint to end on each node. The process of copying cache partition files must do the following:
...
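A conceptual sketch of how a partition file can be copied while the checkpoint keeps writing to it; the delta-apply mechanics and all names below are illustrative assumptions, not the actual implementation. The file is copied up to the length fixed at checkpoint begin, pages written by the checkpoint into the already-copied region are intercepted, and those pages are applied over the copy once it finishes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;

public class PartitionCopySketch {
    /** Copies the first {@code lenAtCheckpointBegin} bytes of a partition file into the snapshot directory. */
    static void copyPartition(Path src, Path dst, long lenAtCheckpointBegin) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            long copied = 0;

            while (copied < lenAtCheckpointBegin)
                copied += in.transferTo(copied, lenAtCheckpointBegin - copied, out);
        }
    }

    /**
     * Applies the intercepted page writes (file offset -> page bytes) over the copied partition file,
     * making the copy consistent without waiting for the checkpoint to finish.
     */
    static void applyDelta(Path dst, Map<Long, byte[]> deltaPages) throws IOException {
        try (FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            for (Map.Entry<Long, byte[]> e : deltaPages.entrySet())
                out.write(ByteBuffer.wrap(e.getValue()), e.getKey());
        }
    }
}
```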
The local snapshot task is an operation that executes on each local node independently. It copies all the persistent user files from the Ignite work directory to the target snapshot directory, with additional machinery to achieve consistency. This task is closely connected with the node checkpointing process because, for instance, cache partition files are only eventually consistent on disk during an ongoing checkpoint and are fully consistent only when the checkpoint ends.
The local snapshot operation on each cluster node is represented by SnapshotFutureTask (each cache partition file is copied from offset 0 to its length).

The cluster-wide snapshot is an operation that executes the local snapshot task on each node. To achieve cluster-wide snapshot consistency, the Partition-Map-Exchange is reused to block all user transactions for a while.
...
The internal components must have the ability to request a consistent (not cluster-wide, but local) snapshot from remote nodes. This machinery is reused for IEP-28: Rebalance via cache partition files to transmit cache partition files. The process requests execution of a local snapshot task on the remote node and collects the execution results. The File Transmission protocol is used to transfer files from the remote node to the local node.
...
(TransmissionHandler#chunkHandler).

During the cluster snapshot operation, a node may crash for some reason.
http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-Hot-cache-backup-td41034.html
https://ignite.apache.org/docs/latest/persistence/snapshots