ID | IEP-43
Author |
Sponsor |
Created |
Status |
Motivation
Most open-source distributed systems provide `cluster snapshots` functionality, but Apache Ignite does not. Cluster snapshots will allow users to copy data from an active cluster and load it later on another one, for example copying data from a production system into a smaller QA or development system.
Management
Create snapshot
[public] Java API
public interface IgniteSnapshot {
    /**
     * @return List of all known snapshots.
     */
    public List<String> getSnapshots();

    /**
     * Creates a consistent copy of all persistent cache groups from the whole cluster.
     *
     * @param name Snapshot name.
     * @return Future which will be completed when the process ends.
     */
    public IgniteFuture<Void> createSnapshot(String name);
}
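For illustration, the snapshot catalogue can be enumerated straight from the file system. The sketch below is a self-contained stand-in for `getSnapshots()`, assuming snapshots live under `work/snapshots/<name>` as shown in the restore section of this proposal; the `SnapshotCatalog` class and the layout it scans are hypothetical, not part of the proposed API.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Illustrative stand-in for IgniteSnapshot#getSnapshots(): lists snapshot
 * directories under work/snapshots. The layout is an assumption based on the
 * restore section of this proposal, not the final implementation.
 */
public class SnapshotCatalog {
    private final Path snapshotsDir;

    public SnapshotCatalog(Path workDir) {
        this.snapshotsDir = workDir.resolve("snapshots");
    }

    /** @return Sorted names of all snapshot directories found on disk. */
    public List<String> getSnapshots() throws IOException {
        List<String> res = new ArrayList<>();

        if (!Files.isDirectory(snapshotsDir))
            return res;

        try (DirectoryStream<Path> ds = Files.newDirectoryStream(snapshotsDir)) {
            for (Path p : ds) {
                if (Files.isDirectory(p))
                    res.add(p.getFileName().toString());
            }
        }

        Collections.sort(res);

        return res;
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("work");

        Files.createDirectories(work.resolve("snapshots").resolve("backup23012020"));
        Files.createDirectories(work.resolve("snapshots").resolve("ERIB_23012020"));

        // Prints [ERIB_23012020, backup23012020] (sorted by ASCII order).
        System.out.println(new SnapshotCatalog(work).getSnapshots());
    }
}
```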
[public] JMX MBean
package org.apache.ignite.mxbean;

/**
 * Snapshot features MBean.
 */
@MXBeanDescription("MBean that provides access for snapshot features.")
public interface SnapshotMXBean {
    /**
     * Gets all created snapshots on the cluster.
     *
     * @return List of all known snapshots.
     */
    @MXBeanDescription("List of all known snapshots.")
    public List<String> getSnapshots();

    /**
     * Creates the cluster-wide snapshot with the given name.
     *
     * @param snpName Snapshot name to be created.
     * @see IgniteSnapshot#createSnapshot(String)
     */
    @MXBeanDescription("Create cluster-wide snapshot.")
    public void createSnapshot(@MXBeanParameter(name = "snpName", description = "Snapshot name.") String snpName);
}
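To show how such an MBean is driven through standard JMX, the following self-contained sketch registers a simplified local copy of the interface (Ignite annotations omitted) with the platform MBeanServer and invokes it. The `SnapshotMXBeanImpl` class and the `ObjectName` used are illustrative assumptions, not the names Ignite will register.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Simplified local copy of the proposed MBean (annotations omitted). */
interface SnapshotMXBean {
    List<String> getSnapshots();

    void createSnapshot(String snpName);
}

/** Toy in-memory implementation used only to demonstrate registration and invocation. */
class SnapshotMXBeanImpl implements SnapshotMXBean {
    private final List<String> snapshots = new ArrayList<>();

    @Override public synchronized List<String> getSnapshots() {
        return new ArrayList<>(snapshots);
    }

    @Override public synchronized void createSnapshot(String snpName) {
        snapshots.add(snpName);
    }
}

public class SnapshotJmxDemo {
    public static void main(String[] args) throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();

        // The ObjectName below is illustrative; the actual domain/name is up to Ignite.
        ObjectName name = new ObjectName("org.apache:type=Snapshot");

        mbs.registerMBean(new SnapshotMXBeanImpl(), name);

        // Operation invocation, as a JMX console or script would do it.
        mbs.invoke(name, "createSnapshot",
            new Object[] {"ERIB_23012020"}, new String[] {"java.lang.String"});

        // MXBean open-type mapping converts List<String> to String[].
        String[] snapshots = (String[])mbs.getAttribute(name, "Snapshots");

        System.out.println(Arrays.toString(snapshots)); // Prints [ERIB_23012020].
    }
}
```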
[public] Command Line
# Starts cluster snapshot operation.
control.sh --snapshot ERIB_23012020
# Display all known cluster snapshots.
control.sh --snapshot -list
[internal] File Transmission
Internal API that allows requesting and receiving the required snapshot of cache groups from a remote node. Used as part of IEP-28: Rebalance peer-2-peer to send a locally created snapshot to the remote (demander) node.
/**
 * @param rmtNodeId The remote node to connect to.
 * @param parts Collection of cache group ids and the corresponding cache partitions to be snapshotted.
 * @param partConsumer Handler for received partition files.
 * @return Future which will be completed when the requested snapshot is fully received.
 */
public IgniteInternalFuture<Void> createRemoteSnapshot(
    UUID rmtNodeId,
    Map<Integer, Set<Integer>> parts,
    BiConsumer<File, GroupPartitionId> partConsumer);
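The supplier side of this exchange can be sketched with plain JDK types: for each requested (group, partition) pair a partition file is produced and handed to the demander's consumer. Everything here (the `GroupPartitionId` stand-in, `deliverPartitions`, local temp files instead of a network transfer) is a simplified assumption, not the internal implementation.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;

/** Minimal stand-in for the internal cache group / partition pair. */
class GroupPartitionId {
    final int grpId;
    final int partId;

    GroupPartitionId(int grpId, int partId) {
        this.grpId = grpId;
        this.partId = partId;
    }

    @Override public String toString() {
        return "[grp=" + grpId + ", part=" + partId + "]";
    }
}

/**
 * Simulates the supplier side: for each requested (group, partition) pair a
 * partition file is produced and handed to the demander's consumer. Real code
 * would stream the files over the network; here they are local temp files.
 */
public class RemoteSnapshotSketch {
    static void deliverPartitions(
        Path snapshotDir,
        Map<Integer, Set<Integer>> parts,
        BiConsumer<File, GroupPartitionId> partConsumer
    ) throws IOException {
        for (Map.Entry<Integer, Set<Integer>> e : parts.entrySet()) {
            for (int partId : e.getValue()) {
                // Partition file layout mirrors the cache-<name>/part-N.bin convention.
                Path part = snapshotDir.resolve("cache-" + e.getKey())
                    .resolve("part-" + partId + ".bin");

                Files.createDirectories(part.getParent());

                if (!Files.exists(part))
                    Files.createFile(part);

                partConsumer.accept(part.toFile(), new GroupPartitionId(e.getKey(), partId));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, Set<Integer>> parts = new HashMap<>();
        parts.put(100, new LinkedHashSet<>(java.util.Arrays.asList(0, 1)));

        deliverPartitions(Files.createTempDirectory("snp"), parts,
            (file, pair) -> System.out.println("Received " + file.getName() + " for " + pair));
    }
}
```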
Restore snapshot (manually)
The snapshot procedure stores all internal files (binary metadata, marshaller metadata, cache group data files, and cache group configurations) using the same directory structure as Apache Ignite itself, preserving the configured consistent node id.
To restore a cluster from a snapshot, the user must manually do the following:
- Remove data from the checkpoint, wal, binary_meta, and marshaller directories.
- Copy all snapshot data files to the IGNITE_HOME/work directory, paying attention to consistent node ids.
$ tree work
work
└── snapshots
└── backup23012020
├── binary_meta
│ ├── snapshot_IgniteClusterSnapshotSelfTest0
│ ├── snapshot_IgniteClusterSnapshotSelfTest1
│ └── snapshot_IgniteClusterSnapshotSelfTest2
├── db
│ ├── snapshot_IgniteClusterSnapshotSelfTest0
│ │ ├── cache-default
│ │ │ ├── cache_data.dat
│ │ │ ├── part-0.bin
│ │ │ ├── part-2.bin
│ │ │ ├── part-3.bin
│ │ │ ├── part-4.bin
│ │ │ ├── part-5.bin
│ │ │ └── part-6.bin
│ │ └── cache-txCache
│ │ ├── cache_data.dat
│ │ ├── part-3.bin
│ │ ├── part-4.bin
│ │ └── part-6.bin
│ ├── snapshot_IgniteClusterSnapshotSelfTest1
│ │ ├── cache-default
│ │ │ ├── cache_data.dat
│ │ │ ├── part-1.bin
│ │ │ ├── part-3.bin
│ │ │ ├── part-5.bin
│ │ │ ├── part-6.bin
│ │ │ └── part-7.bin
│ │ └── cache-txCache
│ │ ├── cache_data.dat
│ │ ├── part-1.bin
│ │ ├── part-5.bin
│ │ └── part-7.bin
│ └── snapshot_IgniteClusterSnapshotSelfTest2
│ ├── cache-default
│ │ ├── cache_data.dat
│ │ ├── part-0.bin
│ │ ├── part-1.bin
│ │ ├── part-2.bin
│ │ ├── part-4.bin
│ │ └── part-7.bin
│ └── cache-txCache
│ ├── cache_data.dat
│ ├── part-0.bin
│ └── part-2.bin
└── marshaller
17 directories, 30 files
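Given the layout above, the manual restore steps might be scripted as follows. The sketch builds a throwaway mock of the directory tree in a temporary IGNITE_HOME, so the snapshot name, node id, and exact paths are illustrative only.

```shell
#!/bin/sh
set -e
# Illustrative restore sequence. IGNITE_HOME here is a throwaway temp directory
# with a mocked snapshot; real deployments have their own paths, snapshot name
# and consistent node id.
IGNITE_HOME=$(mktemp -d)
SNAPSHOT=backup23012020
NODE_ID=snapshot_IgniteClusterSnapshotSelfTest0
SNP_DIR="$IGNITE_HOME/work/snapshots/$SNAPSHOT"

# Mock a previously created snapshot and some stale node data.
mkdir -p "$SNP_DIR/db/$NODE_ID/cache-default" "$SNP_DIR/binary_meta/$NODE_ID" "$SNP_DIR/marshaller"
touch "$SNP_DIR/db/$NODE_ID/cache-default/part-0.bin"
mkdir -p "$IGNITE_HOME/work/db/wal" "$IGNITE_HOME/work/binary_meta" "$IGNITE_HOME/work/marshaller"

# Step 1: remove stale checkpoint, wal, binary_meta and marshaller data
# (the whole db directory is replaced here for simplicity).
rm -rf "$IGNITE_HOME/work/db" "$IGNITE_HOME/work/binary_meta" "$IGNITE_HOME/work/marshaller"

# Step 2: copy the snapshot files back, keeping the consistent node id in the paths.
cp -R "$SNP_DIR/db" "$SNP_DIR/binary_meta" "$SNP_DIR/marshaller" "$IGNITE_HOME/work/"

ls "$IGNITE_HOME/work/db/$NODE_ID/cache-default"   # part-0.bin
```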
Snapshot requirements
- Users must be able to create a snapshot under load, without cluster deactivation.
- The snapshot process must not block user transactions for a long time (short-time blocks are acceptable).
- The snapshot process must allow creating a data snapshot on each node and transferring it to any of the remote nodes for internal cluster needs.
- The created cluster-level snapshot must be fully consistent in cluster-wide terms: there must be no incomplete transactions inside it.
- The snapshot of each node must be consistent: cache partitions, binary metadata, etc. must not contain unnecessary changes.
Snapshot process
With respect to the cluster-wide snapshot operation, the process of creating a copy of user data can be split into the following high-level steps:
- Start a cluster-wide snapshot operation using any of the available public APIs.
- Each node receives this event and starts a local snapshot task, which must create a consistent copy of the available user data.
- Collect the results of the local snapshot tasks from each node and report the outcome back to the user.
The Distributed Process is used to complete steps 1 and 3. To achieve step 2, a new SnapshotFutureTask must be developed.
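The three steps can be sketched as a broadcast/execute/collect pattern. The simulation below uses threads as stand-ins for cluster nodes and CompletableFuture for result collection; `SnapshotProcessSketch` and `localSnapshotTask` are illustrative, not Ignite's actual DistributedProcess or SnapshotFutureTask.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/**
 * Simulates the cluster-wide snapshot flow: a coordinator broadcasts the
 * operation, every "node" runs a local snapshot task, and the results are
 * collected into a single future returned to the user.
 */
public class SnapshotProcessSketch {
    /** Stand-in for the local SnapshotFutureTask: pretends to copy node data. */
    static String localSnapshotTask(int nodeIdx, String name) {
        return "node-" + nodeIdx + ":" + name + ":OK";
    }

    static CompletableFuture<List<String>> createSnapshot(String name, int nodes, ExecutorService pool) {
        // Steps 1-2: broadcast the operation; each node starts its local task.
        List<CompletableFuture<String>> locals = IntStream.range(0, nodes)
            .mapToObj(i -> CompletableFuture.supplyAsync(() -> localSnapshotTask(i, name), pool))
            .collect(Collectors.toList());

        // Step 3: complete the user-facing future only when all local tasks finish.
        return CompletableFuture.allOf(locals.toArray(new CompletableFuture[0]))
            .thenApply(v -> locals.stream().map(CompletableFuture::join).collect(Collectors.toList()));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);

        try {
            // Prints [node-0:ERIB_23012020:OK, node-1:ERIB_23012020:OK, node-2:ERIB_23012020:OK].
            System.out.println(createSnapshot("ERIB_23012020", 3, pool).get());
        }
        finally {
            pool.shutdown();
        }
    }
}
```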
Cluster snapshot
Local snapshot
Remote snapshot
Limitations
Risks and Assumptions
Discussion Links
http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-Hot-cache-backup-td41034.html
Reference Links
- Apache Geode – Cache and Region Snapshots: https://geode.apache.org/docs/guide/16/managing/cache_snapshots/chapter_overview.html
- Apache Cassandra – Backing up and restoring data: https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsBackupRestore.html
Tickets