ID | IEP-77 |
Author | Aleksandr Polovtsev |
Sponsor | |
Created |
|
Status | DRAFT |
When building a cluster of Ignite nodes, users need to be able to establish some restrictions on the member nodes based on cluster invariants in order to avoid breaking the consistency of the cluster. Such restrictions may include: having the same product version across the cluster, having consistent table and memory configurations, enforcing a particular cluster state.
This document describes the process of a new node joining a cluster, which includes a validation step where a set of rules are applied to determine whether the incoming node is able to enter the current topology. These rules may include node-local information (e.g. node version) as well as cluster-wide information (e.g. data encryption algorithm), which means that the validation component may require access to the Meta Storage (it is assumed that Meta Storage contains the consistent cluster-wide information, unless some other mechanism is proposed). The problem is that, according to the Node Lifecycle description, a cluster can exist in a "zombie" state, during which the Meta Storage is unavailable. This means the the validation process can be split into 2 steps:
Apart from the 2-step validation, there are also the following questions that need to be addressed:
The "init" command is supposed to move the cluster from the "zombie" state into the "active" state. It is supposed to have the following characteristics (note that the "init" command has not been specified at the moment of writing and is out of scope of this document, so all statements are approximate and can change in the future):
This document uses a notation of "initialized" and "empty" nodes. An initialized node is a node that has received the "init" message sometime in its lifetime and therefore possesses the cluster tag and the meta storage version. An empty node has never received the "init" command and does not possess the aforementioned properties.
Meta Storage version is a totally ordered property that should be used to compute the most "recent" state of the Meta Storage configuration. A possible implementation can be a monotonically increasing counter, which is increased every time the Meta Storage configuration (e.g. addresses of nodes that host the Meta Storage Raft group) is updated.
A cluster tag is a string that uniquely identifies a cluster (e.g. a UUID). It is generated once per cluster and is distributed across the nodes during the "init" phase. The purpose of a cluster tag is to understand whether a joining node used to be a member of another cluster, in which case its Meta Storage version is not comparable and the joining node should be rejected.
Local validation approach requires the joining node to retrieve some information from a random node/Meta Storage and deciding to join the cluster based on that information.
This approach has the following pros and cons:
Remote validation approach requires the joining node to send some information about itself to a remote node, which decides whether to allow the new node to join or not.
This approach has the following pros and cons:
Discussion needed: At the time of writing this document, it is assumed that validation protocol is going to be remote.
Based on the initialization status of the nodes in a cluster, there can be several possible scenarios of new nodes entering the topology:
A joining node tries to enter the topology. It is possible to piggyback on the transport of the membership protocol in order to exchange validation messages before allowing to send membership messages (similar to the handshake protocol). During this step it sends some information (cluster tag, Meta Storage version, node version) to a random node and gets validated (more details below). After this step is complete, the joining node becomes visible through the Topology Service, therefore establishing an invariant that visible topology will always consist of nodes that have passed the first validation step. Possible issues: there can be a race condition when multiple conflicting nodes join at the same time, in which case only the first node to join will be valid. This can be considered expected behavior, because such situations can only occur during the initial set up of a cluster, which is a manual process and requires manual intervention anyway.
If an empty node tries to join a cluster the following process is proposed:
If an initialized node tries to join a cluster the following process is proposed:
Current TopologyService
will be renamed to NetworkTopologyService
. It is proposed to extend this service to add validation handlers that will validate the joining nodes on the network level.
/** * Class for working with the cluster topology on the network level. */ public interface NetworkTopologyService { /** * This topology member. */ ClusterNode localMember(); /** * All topology members. */ Collection<ClusterNode> allMembers(); /** * Handlers for topology events (join, leave). */ void addEventHandler(TopologyEventHandler handler); /** * Returns a member by a network address */ @Nullable ClusterNode getByAddress(NetworkAddress addr); /** * Handlers for validating a joining node. */ void addValidationHandler(TopologyValidationHandler handler); }
The new service will have the same API, but will work on top of the Meta Storage, and will provide methods to work with the list of validated nodes. In addition to that, it will perform the validation of incoming nodes against the Meta Storage, based on the registered validation handlers.
/** * Class for working with the cluster topology on the Meta Storage level. Only fully validated nodes are allowed to be present in such topology. */ public interface TopologyService { /** * This topology member. */ ClusterNode localMember(); /** * All topology members. */ Collection<ClusterNode> allMembers(); /** * Handlers for topology events (join, leave). */ void addEventHandler(TopologyEventHandler handler); /** * Returns a member by a network address */ @Nullable ClusterNode getByAddress(NetworkAddress addr); /** * Handlers for validating a joining node. */ void addValidationHandler(TopologyValidationHandler handler); }
TopologyService
will depend on the MessagingService
(to respond and listen to validation requests) and on the MetaStorageManager
(for interacting with the Meta Storage).
// Links to discussions on the devlist, if applicable.