
IDIEP-43
Author
Sponsor
Created

  

Status

ACTIVE


Motivation

Most open-source distributed systems provide cluster snapshot functionality, but Apache Ignite does not. Cluster snapshots will allow users to copy their data from an active cluster and load it later on another one, for example copying data from a production system into a smaller QA or development system.

Management

Create snapshot

[public] Java API

IgniteSnapshot
import java.util.List;
import org.apache.ignite.lang.IgniteFuture;

public interface IgniteSnapshot {
    /**
     * @return List of all known snapshots.
     */
    public List<String> getSnapshots();

    /**
     * Create a consistent copy of all persistent cache groups from the whole cluster.
     *
     * @param name Snapshot name.
     * @return Future which will be completed when the snapshot operation ends.
     */
    public IgniteFuture<Void> createSnapshot(String name);
}
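
A minimal usage sketch of the proposed API follows. This page does not say how an IgniteSnapshot instance is obtained, so the sketch declares a local stand-in interface (SnapshotApi, a hypothetical name) that mirrors the two methods above and substitutes java.util.concurrent.CompletableFuture for IgniteFuture, which lets it run without Ignite on the classpath. The in-memory implementation is an assumption for illustration only and does not create any real snapshot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Usage sketch for the proposed snapshot API; every type here is a hypothetical stand-in. */
final class SnapshotApiUsageSketch {
    /** Mirrors IgniteSnapshot, with CompletableFuture standing in for IgniteFuture. */
    interface SnapshotApi {
        List<String> getSnapshots();

        CompletableFuture<Void> createSnapshot(String name);
    }

    /** In-memory stand-in: "creating" a snapshot only records its name. */
    static final class InMemorySnapshotApi implements SnapshotApi {
        private final List<String> names = new ArrayList<>();

        @Override public synchronized List<String> getSnapshots() {
            return new ArrayList<>(names);
        }

        @Override public synchronized CompletableFuture<Void> createSnapshot(String name) {
            names.add(name);

            return CompletableFuture.completedFuture(null);
        }
    }

    /** Creates a snapshot, waits for the future to complete, then lists the known snapshots. */
    static List<String> createAndList(SnapshotApi api, String name) {
        api.createSnapshot(name).join(); // join() rethrows a failure as CompletionException.

        return api.getSnapshots();
    }

    public static void main(String[] args) {
        SnapshotApi api = new InMemorySnapshotApi();

        System.out.println(createAndList(api, "backup23012020"));
    }
}
```

The point of the sketch is the calling pattern: createSnapshot returns immediately with a future, and the caller decides whether to block on it or to attach a continuation.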

[public] JMX MBean

SnapshotMXBean
package org.apache.ignite.mxbean;

import java.util.List;

/**
 * Snapshot features MBean.
 */
@MXBeanDescription("MBean that provides access for snapshot features.")
public interface SnapshotMXBean {
    /**
     * Gets all created snapshots on the cluster.
     *
     * @return List of all known snapshots.
     */
    @MXBeanDescription("List of all known snapshots.")
    public List<String> getSnapshots();

    /**
     * Create the cluster-wide snapshot with the given name.
     *
     * @param snpName Name of the snapshot to create.
     * @see IgniteSnapshot#createSnapshot(String)
     */
    @MXBeanDescription("Create cluster-wide snapshot.")
    public void createSnapshot(@MXBeanParameter(name = "snpName", description = "Snapshot name.") String snpName);
}

[public] Command Line

control.sh --snapshot
# Start a cluster snapshot operation with the given snapshot name.
control.sh --snapshot ERIB_23012020

# Display all known cluster snapshots.
control.sh --snapshot -list

[internal] File Transmission

Internal API that allows a node to request and receive a snapshot of the required cache groups from a remote node. It is used as part of IEP-28: Rebalance peer-2-peer to send a locally created snapshot to the remote (demander) node.

IgniteSnapshotManager#createRemoteSnapshot
/**
 * @param rmtNodeId The remote node to connect to.
 * @param parts Map from cache group to the set of its partitions to include in the snapshot.
 * @param partConsumer Received partition handler.
 * @return Future which will be completed when the requested snapshot is fully received.
 */
public IgniteInternalFuture<Void> createRemoteSnapshot(
    UUID rmtNodeId,
    Map<Integer, Set<Integer>> parts,
    BiConsumer<File, GroupPartitionId> partConsumer);
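
The contract above (a map of requested partitions, a per-partition handler, and a future that completes once everything has arrived) can be sketched as follows. This is not the IgniteSnapshotManager implementation: PartId is a hypothetical stand-in for the internal GroupPartitionId, Receiver is a hypothetical helper, and CompletableFuture replaces IgniteInternalFuture so the example runs on its own.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

/** Sketch of the receiving side of a remote snapshot request; all types are hypothetical. */
final class RemoteSnapshotSketch {
    /** Stand-in for GroupPartitionId: a (cache group, partition) pair. */
    static final class PartId {
        final int grpId;
        final int partId;

        PartId(int grpId, int partId) {
            this.grpId = grpId;
            this.partId = partId;
        }

        @Override public boolean equals(Object o) {
            return o instanceof PartId && ((PartId)o).grpId == grpId && ((PartId)o).partId == partId;
        }

        @Override public int hashCode() {
            return Objects.hash(grpId, partId);
        }
    }

    /** Tracks the requested partitions and completes the future once all of them have arrived. */
    static final class Receiver {
        private final Set<PartId> pending = new HashSet<>();
        private final BiConsumer<File, PartId> partConsumer;
        final CompletableFuture<Void> done = new CompletableFuture<>();

        Receiver(Map<Integer, Set<Integer>> parts, BiConsumer<File, PartId> partConsumer) {
            // Flatten "group -> partitions" into the set of pairs still expected.
            parts.forEach((grp, ps) -> ps.forEach(p -> pending.add(new PartId(grp, p))));
            this.partConsumer = partConsumer;

            if (pending.isEmpty())
                done.complete(null);
        }

        /** Called once per received partition file. */
        synchronized void onPartitionReceived(File file, PartId id) {
            if (!pending.remove(id))
                throw new IllegalStateException("Unexpected partition: " + id.grpId + ":" + id.partId);

            partConsumer.accept(file, id);

            if (pending.isEmpty())
                done.complete(null);
        }
    }
}
```

A caller would build the Receiver with the same map it passes as parts and hand each incoming file to onPartitionReceived; the done future then plays the role of the future returned by createRemoteSnapshot.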

Restore snapshot (manually)

The snapshot procedure stores all internal files (binary meta, marshaller meta, cache group data files, and cache group configuration) in the same directory structure that Apache Ignite itself uses, preserving the configured consistent node ID.

To restore a cluster from a snapshot, the user must manually do the following:

  1. Remove data from the checkpoint, wal, binary_meta, and marshaller directories.
  2. Copy all snapshot data files to the IGNITE_HOME/work directory, paying attention to consistent node IDs.

Snapshot Directory Structure
maxmuzaf@TYE-SNE-0009931 ignite % tree work
work
└── snapshots
    └── backup23012020
        ├── binary_meta
        │   ├── snapshot_IgniteClusterSnapshotSelfTest0
        │   ├── snapshot_IgniteClusterSnapshotSelfTest1
        │   └── snapshot_IgniteClusterSnapshotSelfTest2
        ├── db
        │   ├── snapshot_IgniteClusterSnapshotSelfTest0
        │   │   ├── cache-default
        │   │   │   ├── cache_data.dat
        │   │   │   ├── part-0.bin
        │   │   │   ├── part-2.bin
        │   │   │   ├── part-3.bin
        │   │   │   ├── part-4.bin
        │   │   │   ├── part-5.bin
        │   │   │   └── part-6.bin
        │   │   └── cache-txCache
        │   │       ├── cache_data.dat
        │   │       ├── part-3.bin
        │   │       ├── part-4.bin
        │   │       └── part-6.bin
        │   ├── snapshot_IgniteClusterSnapshotSelfTest1
        │   │   ├── cache-default
        │   │   │   ├── cache_data.dat
        │   │   │   ├── part-1.bin
        │   │   │   ├── part-3.bin
        │   │   │   ├── part-5.bin
        │   │   │   ├── part-6.bin
        │   │   │   └── part-7.bin
        │   │   └── cache-txCache
        │   │       ├── cache_data.dat
        │   │       ├── part-1.bin
        │   │       ├── part-5.bin
        │   │       └── part-7.bin
        │   └── snapshot_IgniteClusterSnapshotSelfTest2
        │       ├── cache-default
        │       │   ├── cache_data.dat
        │       │   ├── part-0.bin
        │       │   ├── part-1.bin
        │       │   ├── part-2.bin
        │       │   ├── part-4.bin
        │       │   └── part-7.bin
        │       └── cache-txCache
        │           ├── cache_data.dat
        │           ├── part-0.bin
        │           └── part-2.bin
        └── marshaller

17 directories, 30 files
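
The two manual restore steps can be sketched with plain java.nio.file calls. This is an illustration under stated assumptions, not a tool shipped with Ignite: the class and method names are hypothetical, the caller supplies the directories to clear (this page does not spell out their exact locations), and the snapshot tree is assumed to mirror the work directory layout shown above so that relative paths, including the per-node folders, can be copied unchanged. The nodes should be stopped before running anything like this.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch of the manual restore procedure; names and parameters are hypothetical. */
final class ManualRestoreSketch {
    /** Step 1: recursively deletes a directory if it exists (children before parents). */
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir))
            return;

        try (Stream<Path> walk = Files.walk(dir)) {
            for (Path p : walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList()))
                Files.delete(p);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Step 2: copies the snapshot tree into the work directory, keeping relative paths. */
    static int copyTree(Path snapshotDir, Path workDir) {
        int copied = 0;

        try (Stream<Path> walk = Files.walk(snapshotDir)) {
            for (Path src : walk.collect(Collectors.toList())) {
                Path dst = workDir.resolve(snapshotDir.relativize(src).toString());

                if (Files.isDirectory(src))
                    Files.createDirectories(dst);
                else {
                    Files.createDirectories(dst.getParent());
                    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        return copied;
    }

    /** Clears the stale directories, then copies the snapshot; returns the number of files copied. */
    static int restore(Path snapshotDir, Path workDir, List<Path> staleDirs) {
        staleDirs.forEach(ManualRestoreSketch::deleteRecursively);

        return copyTree(snapshotDir, workDir);
    }

    /** Usage: ManualRestoreSketch snapshotDir workDir [staleDir ...] */
    public static void main(String[] args) {
        List<Path> stale = new ArrayList<>();

        for (int i = 2; i < args.length; i++)
            stale.add(Paths.get(args[i]));

        System.out.println("Files copied: " + restore(Paths.get(args[0]), Paths.get(args[1]), stale));
    }
}
```

Copying by relative path is what keeps the consistent node IDs intact: a partition file taken from one node's folder in the snapshot lands in the folder of the same name under the work directory.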

Snapshot requirements

  1. Users must be able to create a snapshot under load, without deactivating the cluster.
  2. The snapshot process must not block any user transaction for a long time (short blocks are acceptable).
  3. The snapshot process must allow creating a data snapshot on each node and transferring it to any remote node for internal cluster needs.
  4. The snapshot created at the cluster level must be fully consistent cluster-wide: it must not contain any incomplete transactions.
  5. The snapshot of each node must be consistent: cache partitions, binary meta, etc. must not contain unnecessary changes.

Snapshot process

With respect to the cluster-wide snapshot operation, the process of creating a copy of user data can be split into the following high-level steps:

  1. Start a cluster-wide snapshot operation using any of the available public APIs.
  2. Each node receives the corresponding event and starts a local snapshot task, which must create a consistent copy of the available user data.
  3. Collect the results of the local snapshot tasks from each node and send them back to the user.

The Distributed Process is used to complete steps 1 and 3. Step 2 requires a new SnapshotFutureTask to be developed.
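
The three steps can be illustrated with a small fan-out and collect sketch. It only models the control flow: CompletableFuture stands in for the Distributed Process, the localTask function stands in for SnapshotFutureTask, and the class name and node IDs are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Control-flow sketch of the cluster-wide snapshot operation; not the Ignite implementation. */
final class SnapshotProcessSketch {
    /**
     * @param nodeIds Nodes taking part in the snapshot.
     * @param localTask Stand-in for the node-local snapshot task; returns {@code true} on success.
     * @return Future with the per-node results, completed when every local task has finished.
     */
    static CompletableFuture<Map<String, Boolean>> startSnapshot(
        List<String> nodeIds,
        Function<String, Boolean> localTask
    ) {
        // Step 1: the initiating call fans the request out to every node.
        Map<String, CompletableFuture<Boolean>> perNode = new LinkedHashMap<>();

        for (String id : nodeIds) {
            // Step 2: each node runs its own local snapshot task.
            perNode.put(id, CompletableFuture.supplyAsync(() -> localTask.apply(id)));
        }

        // Step 3: wait for all local tasks and collect their results for the caller.
        return CompletableFuture.allOf(perNode.values().toArray(new CompletableFuture[0]))
            .thenApply(ignored -> {
                Map<String, Boolean> res = new LinkedHashMap<>();

                perNode.forEach((id, fut) -> res.put(id, fut.join()));

                return res;
            });
    }

    public static void main(String[] args) {
        Map<String, Boolean> res = startSnapshot(List.of("node0", "node1", "node2"), id -> true).join();

        System.out.println(res);
    }
}
```

If any local task throws, the combined future completes exceptionally, which corresponds to reporting a failed cluster-wide snapshot back to the user.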

Cluster snapshot


Local snapshot


Remote snapshot


Limitations


Risks and Assumptions

// Describe project risks, such as API or binary compatibility issues, major protocol changes, etc.

Discussion Links

http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-Hot-cache-backup-td41034.html

Reference Links

  1. Apache Geode – Cache and Region Snapshots 
    https://geode.apache.org/docs/guide/16/managing/cache_snapshots/chapter_overview.html
  2. Apache Cassandra – Backing up and restoring data
    https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsBackupRestore.html

Tickets

// Links or report with relevant JIRA tickets.
