
IDIEP-77

Author: Aleksandr Polovtsev
Sponsor:
Created:
Status: DRAFT


Motivation

When building a cluster of Ignite nodes, users need to be able to impose restrictions on the member nodes, based on cluster invariants, in order to avoid breaking the consistency of the cluster. Such restrictions may include running the same product version across the cluster, keeping table and memory configurations consistent, and enforcing a particular cluster state.

Description

Problem statement

This document describes the process of a new node joining a cluster, which includes a validation step where a set of rules is applied to determine whether the incoming node is allowed to enter the current topology. These rules may involve node-local information (e.g. the node version) as well as cluster-wide information (e.g. the data encryption algorithm), which means that the validation component may require access to the Meta Storage (it is assumed that the Meta Storage contains the consistent cluster-wide information, unless some other mechanism is proposed). The problem is that, according to the Node Lifecycle description, a cluster can exist in a "zombie" state, during which the Meta Storage is unavailable. This means that the validation process can be split into 2 steps:

  1. "Pre-init" validation: a joining node tries to enter the topology on the network level and gets validated against its local properties.
  2. "Post-init" validation: the cluster has received the "init" command, which activates the Meta Storage, and the joining node can be validated against the cluster-wide properties.

Apart from the 2-step validation, there are also the following questions that need to be addressed:

  1. Where will the whole process happen: on the joining node itself or on an arbitrary remote node?
  2. How to mitigate possible security issues if a not yet fully validated node gets access to the Meta Storage?

Protocol description

Local validation

The local validation approach requires the joining node to retrieve some information from a random node or the Meta Storage and to decide whether to join the cluster based on that information.

This approach has the following pros and cons:

  1. When used in a rolling upgrade scenario, it might be easier to maintain backward compatibility: a newer joining node already knows about the requirements of the older nodes in the cluster.
  2. Possible security issues: if a node is able to allow itself to join, it might be easier to compromise the cluster.
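
A minimal sketch of the joining node's side under this approach (fetchClusterProperties, ClusterProperties, randomExistingMember and localProperties are hypothetical names used only for illustration):

// Purely illustrative: the joining node fetches the cluster-wide properties from
// a random existing member (or from the Meta Storage) and validates itself.
ClusterProperties clusterProperties = fetchClusterProperties(randomExistingMember);

if (!localProperties.isCompatibleWith(clusterProperties)) {
    throw new IllegalStateException("Local node is not compatible with the cluster");
}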

Remote validation

The remote validation approach requires the joining node to send some information about itself to a remote node, which decides whether to allow the new node to join.

This approach has the following pros and cons:

  1. This approach is used in Ignite 2, so it may be more familiar to users and developers.
  2. It may be more secure, since a node can't join without notifying at least one valid node.
  3. It is harder to support backward compatibility.

Discussion needed: at the time of writing this document, it is assumed that the validation protocol is going to be remote.
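
To illustrate the remote flow, the exchange could be based on a pair of messages similar to the following sketch (the message names and fields are hypothetical and only show what kind of information might be transferred):

/**
 * Hypothetical message sent by a joining node to an existing cluster member.
 */
public class JoinRequest {
    /** Node-local properties used for validation, e.g. the product version. */
    private final Map<String, String> nodeProperties;

    public JoinRequest(Map<String, String> nodeProperties) {
        this.nodeProperties = nodeProperties;
    }
}

/**
 * Hypothetical response returned by the node that performed the validation.
 */
public class JoinResponse {
    /** Whether the joining node is allowed to enter the topology. */
    private final boolean allowed;

    /** Human-readable rejection reason; null if the node is allowed to join. */
    private final String rejectionReason;

    public JoinResponse(boolean allowed, String rejectionReason) {
        this.allowed = allowed;
        this.rejectionReason = rejectionReason;
    }
}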

Init command

The "init" command is supposed to move the cluster from the "zombie" state into the "active" state. It is supposed to have the following characteristics (note that the "init" command has not been specified at the moment of writing, so all statements are approximate and can change in the future):

  1. It should deliver such information as addresses of nodes that host the Meta Storage.
  2. It should deliver this information atomically, i.e. either all nodes enter the "active" state or none. As a possible solution, it can be implemented similarly to a two-phase commit: first, a "prepare" message is broadcast to all nodes in the current topology. The initiator node remembers the topology members at the start of the prepare phase and restarts the operation (or sends additional messages) until the topology is stable. After the prepare phase is finished, the commit message is broadcast. Until the commit phase finishes, no new nodes are allowed to enter the cluster. A rough sketch of this flow is shown below.
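
The initiator side of such a two-phase broadcast could look approximately as follows (broadcast, InitPrepareMessage and InitCommitMessage are hypothetical names, since the actual "init" command is yet to be specified):

/**
 * Hypothetical initiator-side flow of the "init" command; all names are illustrative.
 */
void init(Collection<String> metaStorageNodeNames) {
    Set<ClusterNode> preparedTopology;

    // Prepare phase: deliver the Meta Storage node addresses to the current topology
    // and repeat until the topology is stable.
    do {
        preparedTopology = Set.copyOf(networkTopologyService.allMembers());

        broadcast(preparedTopology, new InitPrepareMessage(metaStorageNodeNames));
    } while (!preparedTopology.equals(Set.copyOf(networkTopologyService.allMembers())));

    // Commit phase: all prepared nodes move to the "active" state; no new nodes
    // are allowed to enter the cluster until this phase finishes.
    broadcast(preparedTopology, new InitCommitMessage());
}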

Implementation details

The following process is proposed as the join protocol.

  1. A joining node enters the topology. It is possible to piggyback on the transport of the membership protocol in order to exchange validation messages before the node is allowed to send membership messages (similar to the handshake protocol). After this step is complete, the joining node becomes visible through the Topology Service, thereby establishing an invariant that the visible topology always consists of nodes that have passed the first validation step. Possible issues: there can be a race condition when multiple conflicting nodes join at the same time, in which case only the first node to join will be valid. This can be considered expected behavior, because such situations can only occur during the initial setup of a cluster, which is a manual process and requires manual intervention anyway.
  2. If the cluster has not yet been initialized (i.e. it has not received the "init" command), the node remains in the "zombie" state until the "init" command arrives (starting the process approximately described in <link>). After the command has been executed, the nodes gain access to the Meta Storage and should be validated against it. Possible issues: it is unclear which node should perform this validation step; it might have to be the "init" command coordinator.
  3. If the cluster has been initialized, the node sends a request to a random node to obtain the information that was propagated by the "init" command and to get validated against the Meta Storage.
  4. After the node has passed both validation steps, it should be added to a list of valid nodes in the Meta Storage (the choice of the Meta Storage is dictated by the requirement that this list should be consistent on every node in the cluster).
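
Put together, the receiving side of this flow could be sketched as follows (all names in the sketch, such as onJoinRequest, validateLocalProperties, validateAgainstMetaStorage and addToValidatedNodes, are hypothetical and are used only to tie the steps together):

// Hypothetical handler executed on an existing member when a join request arrives.
void onJoinRequest(ClusterNode joiningNode, JoinRequest request) {
    // Step 1: network-level validation against the node-local properties.
    if (!validateLocalProperties(request)) {
        reject(joiningNode, "Node failed network-level validation");
        return;
    }

    if (!clusterIsInitialized()) {
        // Step 2: the cluster is still in the "zombie" state, so the node has to wait
        // for the "init" command before cluster-wide validation can be performed.
        markAsAwaitingInit(joiningNode);
        return;
    }

    // Step 3: cluster-wide validation against the Meta Storage.
    if (!validateAgainstMetaStorage(joiningNode, request)) {
        reject(joiningNode, "Node failed Meta Storage validation");
        return;
    }

    // Step 4: record the node in the list of valid nodes, stored in the Meta Storage
    // so that the list is consistent on every node in the cluster.
    addToValidatedNodes(joiningNode);
}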

Changes in API

NetworkTopologyService

The current TopologyService will be renamed to NetworkTopologyService. It is proposed to extend this service with validation handlers that will validate the joining nodes on the network level.

/**
 * Class for working with the cluster topology on the network level.
 */
public interface NetworkTopologyService {
    /**
     * This topology member.
     */
    ClusterNode localMember();

    /**
     * All topology members.
     */
    Collection<ClusterNode> allMembers();

    /**
     * Registers a handler for topology events (join, leave).
     */
    void addEventHandler(TopologyEventHandler handler);

    /**
     * Returns a member by its network address.
     */
    @Nullable ClusterNode getByAddress(NetworkAddress addr);

    /**
     * Registers a handler for validating a joining node.
     */
    void addValidationHandler(TopologyValidationHandler handler);
}
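
A possible registration of a network-level check (assuming, purely for illustration, that TopologyValidationHandler is a functional interface that receives the joining node together with its advertised properties and returns a verdict; its exact shape is not defined by this proposal):

// The handler signature below is an assumption made for the sake of the example.
networkTopologyService.addValidationHandler((joiningNode, nodeProperties) -> {
    // Reject nodes that run a product version different from the local one.
    return localVersion.equals(nodeProperties.get("product.version"));
});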

TopologyService

The new service will have the same API, but it will work on top of the Meta Storage and will provide methods to work with the list of validated nodes. In addition to that, it will validate incoming nodes against the Meta Storage, based on the registered validation handlers.

/**
 * Class for working with the cluster topology on the Meta Storage level. Only fully validated nodes are allowed to be present in such topology.
 */
public interface TopologyService {
    /**
     * This topology member.
     */
    ClusterNode localMember();

    /**
     * All topology members.
     */
    Collection<ClusterNode> allMembers();

    /**
     * Registers a handler for topology events (join, leave).
     */
    void addEventHandler(TopologyEventHandler handler);

    /**
     * Returns a member by its network address.
     */
    @Nullable ClusterNode getByAddress(NetworkAddress addr);

    /**
     * Registers a handler for validating a joining node.
     */
    void addValidationHandler(TopologyValidationHandler handler);
}
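
To illustrate the two-layered view, a node that has entered the network topology but has not yet been validated against the Meta Storage would be visible through NetworkTopologyService but not through TopologyService. A hypothetical usage sketch (the handler signature, property names and clusterEncryptionAlgorithm are assumptions):

// Nodes that have passed the network-level validation only.
Collection<ClusterNode> networkMembers = networkTopologyService.allMembers();

// Nodes that have also been validated against the Meta Storage;
// this collection is expected to be a subset of the network topology.
Collection<ClusterNode> validatedMembers = topologyService.allMembers();

// A cluster-wide check registered on the Meta Storage-based service.
topologyService.addValidationHandler((joiningNode, nodeProperties) -> {
    // Only allow nodes whose data encryption algorithm matches the cluster-wide setting.
    return clusterEncryptionAlgorithm.equals(nodeProperties.get("encryption.algorithm"));
});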

Risks and Assumptions

  1. "Init" command is not fully specified and can influence the design.
  2. Proposed implementation does not discuss message encryption and security credentials.
  3. Two-layered topology view may be confusing to use.

Discussion Links

// Links to discussions on the devlist, if applicable.

Reference Links

  1. IEP-73: Node startup
  2. https://github.com/apache/ignite-3/blob/main/modules/runner/README.md
  3. IEP-67: Networking module

Tickets
