...



Motivation

Ignite clusters are required to have a dynamically changing topology due to the nature of distributed systems: nodes can fail and need to be restarted at any time, and new nodes can be introduced or removed from the cluster at any time. Users expect that topology changes do not break the consistency of the cluster and that it remains operational.

Description

Problem statement

This document describes the process of creating a cluster of Ignite nodes and adding new nodes to it. It describes the following concepts:

  1. Cluster setup - the process of assembling a predefined set of nodes into a cluster and preparing it for the next lifecycle steps;
  2. Cluster initialization - according to the Node Lifecycle description, a cluster can exist in a "zombie" state,

...

  until an "init" command is sent by the system administrator;
  3. Node validation - in order to maintain the cluster configuration in a consistent state, the joining nodes should be compatible with all of the existing cluster nodes. Compatibility is ensured by validating node properties, which may include local properties (Ignite product version, Cluster Tag) as well as distributed properties from the Meta Storage.

DISCUSSION NEEDED: are there any properties that need to be retrieved from the Meta Storage?

  1. "Pre-init" validation: a joining node tries to enter the topology on the network level and gets validated against its local properties.
  2. "Post-init" validation: the cluster has received the "init" command, which activates the Meta Storage, and the joining node can be validated against the cluster-wide properties.

Apart from the 2-step validation, there are also the following questions that need to be addressed:

...

Terminology

Init command

The "init" command is supposed to move moves the cluster from the "zombie" state into the "active" state. It is supposed to have the following characteristics (note that the "init" command has not been specified at the moment of writing and is out of scope of this document, so all statements are approximate and can change in the future):

  1. It should deliver the following information: addresses of the nodes that host the Meta Storage Raft group, the Meta Storage Topology Version and the Cluster Tag (described below). A sketch of such a message is given after this list.
  2. It should deliver this information atomically, i.e. either all nodes enter the "active" state or none.
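As an illustration only, the payload described above could be modelled roughly as follows; the record and field names are assumptions, since the actual command format is not specified yet.

    import java.util.List;

    // Hypothetical payload of the "init" command; not the actual message format.
    record InitCommand(
            List<String> metaStorageNodeAddresses, // nodes hosting the Meta Storage Raft group
            long metaStorageTopologyVersion,       // initial Meta Storage Topology Version
            String clusterTag                      // Cluster Tag (described below)
    ) {}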

...

The node join process is proposed to be made centralized: a single node is granted the role of the Join Coordinator, which is responsible for the following:

...

of orchestrating the cluster lifecycle and validating new nodes.

Cluster Tag

A cluster tag is a string that uniquely identifies a cluster. It is generated once per cluster and is distributed across the nodes during the "init" phase. The purpose of a cluster tag is to determine whether a joining node used to be a member of another cluster, in which case its Meta Storage Topology Version is not comparable and the joining node should be rejected. Together with the Meta Storage Topology Version, it creates a partial ordering that makes it possible to compare different configuration versions.
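A minimal sketch of this idea is shown below, assuming a Cluster Tag made of a human-readable part and a generated unique part (as described in the "Cluster initialization" section); all names are illustrative, not an actual API.

    import java.util.OptionalInt;
    import java.util.UUID;

    // Illustrative only: versions are comparable only within the same cluster (same tag),
    // which is what makes the ordering partial rather than total.
    record ClusterTag(String humanReadablePart, UUID uniquePart) {
        static ClusterTag generate(String humanReadablePart) {
            return new ClusterTag(humanReadablePart, UUID.randomUUID());
        }
    }

    record ClusterState(ClusterTag tag, long metaStorageTopologyVersion) {
        /** An empty result means the states belong to different clusters and cannot be compared. */
        OptionalInt partialCompare(ClusterState other) {
            if (!tag.equals(other.tag)) {
                return OptionalInt.empty();
            }
            return OptionalInt.of(Long.compare(metaStorageTopologyVersion, other.metaStorageTopologyVersion));
        }
    }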

...

  1. Given a list of initial cluster members, choose the "smallest" address (for example, using an alphanumeric order), which will implicitly be considered the Join Coordinator (see the sketch after this list). This requires the IP Finder configuration (used to obtain the initial cluster member list) to be identical on all initial cluster members.
  2. If the "smallest" address is unavailable, all other nodes should fail to start after a timeout and should be manually restarted again.

DISCUSSION NEEDED: What to do when constructing a cluster from a number of stopped nodes with different Meta Storage configurations? Should they be overridden by the "init" command?

DISCUSSION NEEDED: What if we are restarting a cluster and also introducing a new node? What if it is elected as the coordinator?

DISCUSSION NEEDED: Can we actually use Raft and use its leader as the Join Coordinator? Can we use Raft's log to store the Meta Storage Topology Version? This way, it will be possible

...

  1. Initial set of nodes is configured, including the following properties:
    1. List of all nodes in the initial cluster setup (provided by the IP Finder).
  2. A Join Coordinator is elected (see "Join Coordinator election");
  3. The Join Coordinator generates a Cluster Tag, if it doesn't already have one in its Vault (e.g. when an existing cluster is being restarted);
  4. All other nodes connect to the Coordinator and provide the following information:
    1. Ignite product version;
    2. Cluster Tag, if any (if a node has obtained it at any time during its life);
    3. Meta Storage Topology version (if a node has obtained it at any time during its life).
  5. All of the aforementioned parameters are compared with the information stored on the Coordinator; if all of the parameters match, the joining node is allowed into the cluster, otherwise it is rejected (a rough sketch of this check is given after this list).
  6. The Join Coordinator adds the new node to the list of validated nodes. This can be implemented using different approaches: for example, it is possible to piggyback on the current Cluster Membership implementation (ScaleCube) and modify its transport protocol so that only valid nodes are visible to others (similar to the implementation of the Handshake Protocol). Another way is to explicitly store and maintain a list of valid nodes in the Meta Storage.
  7. If the joining node is allowed to enter the topology, it receives the following parameters from the Coordinator:
    1. Cluster Tag;
    2. Meta Storage Topology version (if any, see "Cluster initialization").
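A rough sketch of the comparison in step 5, reusing the hypothetical NodeProperties record from the validation sketch above; the exact treatment of nodes that have never obtained a Cluster Tag or Topology Version is an assumption made here, not a decided rule.

    import java.util.Objects;

    final class SetupValidation {
        /** Step 5: the joining node is allowed in only if every parameter it reports matches the Coordinator's. */
        static boolean validate(NodeProperties coordinator, NodeProperties joining) {
            if (!coordinator.productVersion().equals(joining.productVersion())) {
                return false;
            }
            // Assumption: a node that never obtained a Cluster Tag or Topology Version reports null
            // and is treated as compatible; a non-null value must match the Coordinator's.
            return (joining.clusterTag() == null || Objects.equals(coordinator.clusterTag(), joining.clusterTag()))
                    && (joining.metaStorageTopologyVersion() == null
                        || Objects.equals(coordinator.metaStorageTopologyVersion(), joining.metaStorageTopologyVersion()));
        }
    }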

DISCUSSION NEEDED: What to do if the Coordinator dies during any step of the setup process.

...

  1. After the cluster has been established, it remains in the "zombie" state until the "init" command arrives;
  2. The "init" command is sent by the administrator either directly to the Join Coordinator or to any other node, in which case it is redirected to the Join Coordinator;
  3. The "init" command should specify the following information:
    1. Human-readable part of the Cluster Tag;
    2. List of addresses of the nodes that should host the Meta Storage Raft group (a.k.a. Meta Storage Configuration).
  4. The Join Coordinator completes the creation of the Cluster Tag by generating its unique part, and generates the initial Meta Storage Topology Version property (see the sketch after this list);
  5. The Join Coordinator atomically broadcasts the Meta Storage Configuration to all valid nodes in the topology. If this step is successful, then Meta Storage is considered to be initialized and available;
  6. The Join Coordinator persists the following information into the Meta Storage (therefore propagating it to all nodes):
    1. Cluster Tag;
    2. List of addresses of all nodes that have passed the initial validation;
    3. Meta Storage Configuration Version.
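The following is a high-level sketch of steps 4-6 as executed on the Join Coordinator. Everything here (class, method and key names, and the stubbed broadcast and Meta Storage calls) is an assumption made for illustration, not the actual implementation.

    import java.util.List;
    import java.util.UUID;

    final class InitHandler {
        void handleInit(String humanReadableTagPart, List<String> metaStorageNodes) {
            // Step 4: complete the Cluster Tag with a generated unique part and create
            // the initial Meta Storage Topology Version.
            String clusterTag = humanReadableTagPart + ":" + UUID.randomUUID();
            long topologyVersion = 1L;

            // Step 5: atomically broadcast the Meta Storage Configuration to all valid nodes;
            // once this succeeds, the Meta Storage is considered initialized and available.
            broadcastAtomically(metaStorageNodes);

            // Step 6: persist the cluster-wide state into the Meta Storage, which propagates it to all nodes.
            metaStoragePut("cluster.tag", clusterTag);
            metaStoragePut("cluster.validatedNodes", String.join(",", validatedNodes()));
            metaStoragePut("metastorage.configuration.version", Long.toString(topologyVersion));
        }

        // Stubs standing in for the real transport and Meta Storage APIs.
        private void broadcastAtomically(List<String> metaStorageNodes) { /* ... */ }
        private void metaStoragePut(String key, String value) { /* ... */ }
        private List<String> validatedNodes() { return List.of(); }
    }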

DISCUSSION NEEDED: What to do if the Coordinator dies during any step of the initialization process.

...

  1. The new node connects to a random node. If this node is not the Join Coordinator, the random node redirects the new node to it.
  2. The joining node connects to the Coordinator and provides the following information:
    1. Ignite product version;
    2. Cluster Tag;
    3. Meta Storage Topology Version.
  3. If the Cluster Tag or the Ignite product version does not match the one stored on the Coordinator, the node is rejected. If they match, the Meta Storage Topology Version is compared: if the node's Meta Storage Topology Version is smaller than or equal to the Version on the Coordinator, the node is allowed to enter the topology and updates its Meta Storage Configuration.
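A sketch of the acceptance check in step 3, under the same naming assumptions as the earlier sketches. Note the difference from the setup-time check: the joining node's Topology Version does not have to be equal, only not ahead of the Coordinator's.

    final class PostInitJoinValidation {
        static boolean accept(NodeProperties coordinator, NodeProperties joining) {
            // Product version and Cluster Tag must match exactly.
            if (!coordinator.productVersion().equals(joining.productVersion())
                    || !coordinator.clusterTag().equals(joining.clusterTag())) {
                return false;
            }
            // Post-init, the joining node is assumed to report a Topology Version. A smaller or
            // equal Version is accepted; the node then updates its Meta Storage Configuration.
            return joining.metaStorageTopologyVersion() <= coordinator.metaStorageTopologyVersion();
        }
    }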

DISCUSSION NEEDED: What to do if the joining node's Meta Storage Topology Version is greater than the Version stored on the Coordinator.

...

  1. "Init" command is not fully specified and can influence the design.
  2. Proposed implementation does not discuss message encryption and security credentials.
  3. Two-layered topology view may be confusing to use.

Discussion Links

// Links to discussions on the devlist, if applicable.

...