...
The "init" command is supposed to move the cluster from the "zombie" state into the "active" state. It is supposed to have the following characteristics (note that the "init" command has not been specified at the moment of writing and is out of scope of this document, so all statements are approximate and can change in the future):
...
This document uses the notions of "initialized" and "empty" nodes. An initialized node is a node that has received the "init" message sometime in its lifetime and therefore possesses the cluster tag and the Meta Storage version. An empty node has never received the "init" command and does not possess the aforementioned properties.
Meta Storage version is a totally ordered property that should be used to compute the most "recent" state of the Meta Storage configuration. A possible implementation can be a monotonically increasing counter, which is increased every time the Meta Storage configuration (e.g. addresses of nodes that host the Meta Storage Raft group) is updated.
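A minimal sketch of the counter-based option mentioned above. The class and method names (`MetaStorageVersion`, `next`, `isMoreRecentThan`) are hypothetical and are not part of any specified API; the only properties taken from the text are the total order and the increment on every configuration update.

```java
/** Hypothetical Meta Storage version backed by a monotonically increasing counter. */
final class MetaStorageVersion implements Comparable<MetaStorageVersion> {
    private final long counter;

    MetaStorageVersion(long counter) {
        if (counter < 0) {
            throw new IllegalArgumentException("counter must be non-negative");
        }
        this.counter = counter;
    }

    /** Returns the version to use after the Meta Storage configuration has been updated. */
    MetaStorageVersion next() {
        // addExact throws on overflow instead of silently wrapping around.
        return new MetaStorageVersion(Math.addExact(counter, 1));
    }

    /** Total order: a larger counter means a more recent configuration. */
    @Override
    public int compareTo(MetaStorageVersion other) {
        return Long.compare(counter, other.counter);
    }

    boolean isMoreRecentThan(MetaStorageVersion other) {
        return compareTo(other) > 0;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MetaStorageVersion && ((MetaStorageVersion) o).counter == counter;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(counter);
    }
}
```

An immutable value object is used here so that a version can be sent in a validation message and compared on the receiving side without shared state.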
A cluster tag is a string that uniquely identifies a cluster (e.g. a UUID). It is generated once per cluster and is distributed across the nodes during the "init" phase. The purpose of a cluster tag is to understand whether a joining node used to be a member of another cluster, in which case its Meta Storage version is not comparable and the joining node should be rejected.
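The comparability rule above can be sketched as follows. This is only an illustration: `ClusterTagCheck` and its methods are hypothetical names, and the tag is assumed to be a UUID, which the text gives only as an example. How a node without a tag (an empty node) is handled is not covered here.

```java
import java.util.Objects;
import java.util.UUID;

/** Hypothetical helper illustrating how a cluster tag could be generated and checked. */
final class ClusterTagCheck {
    private ClusterTagCheck() {
    }

    /** Generates a new tag; assumed to happen once per cluster, during the "init" phase. */
    static UUID generateTag() {
        return UUID.randomUUID();
    }

    /**
     * Returns true if both nodes carry the same cluster tag, in which case their
     * Meta Storage versions may be compared. A joining node with a different tag
     * used to be a member of another cluster and should be rejected.
     */
    static boolean versionsComparable(UUID localTag, UUID joiningTag) {
        Objects.requireNonNull(localTag, "localTag");
        Objects.requireNonNull(joiningTag, "joiningTag");
        return localTag.equals(joiningTag);
    }
}
```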
...
As described above, this step is common to all scenarios regarding the state of the Meta Storage and looks like the following:
A joining node tries to enter the topology. It is possible to piggyback on the transport of the membership protocol in order to exchange validation messages before any membership messages are allowed to be sent (similar to a handshake protocol). During this step the joining node sends some information (cluster tag, Meta Storage version, node version) to a random node and gets validated (more details below). After this step is complete, the joining node becomes visible through the Topology Service, which establishes the invariant that the visible topology always consists of nodes that have passed the first validation step. Possible issues: there can be a race condition when multiple conflicting nodes join at the same time, in which case only the first node to join will be valid. This can be considered expected behavior, because such situations can only occur during the initial setup of a cluster, which is a manual process and requires manual intervention anyway.
If an empty node tries to join a cluster, the following process is proposed:
...
If an initialized node tries to join a cluster, the following process is proposed:
The current TopologyService will be renamed to NetworkTopologyService. It is proposed to extend this service to add validation handlers that will validate the joining nodes on the network level.
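One possible shape for such validation handlers is sketched below. Every name here (`JoinRequest`, `JoinValidationHandler`, `ValidatingTopology`, `tryJoin`) is hypothetical and does not reflect the actual NetworkTopologyService API. The sketch only illustrates the idea that a joining node becomes visible in the topology after every registered handler has accepted it.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical description of a joining node; the fields are illustrative only. */
final class JoinRequest {
    final String nodeName;
    final String clusterTag;

    JoinRequest(String nodeName, String clusterTag) {
        this.nodeName = nodeName;
        this.clusterTag = clusterTag;
    }
}

/** Hypothetical network-level validation hook. */
interface JoinValidationHandler {
    /** Returns true if the joining node is allowed to enter the topology. */
    boolean validate(JoinRequest request);
}

/** Hypothetical stand-in for a topology service that supports validation handlers. */
final class ValidatingTopology {
    private final List<JoinValidationHandler> handlers = new CopyOnWriteArrayList<>();
    private final Set<String> visibleNodes = ConcurrentHashMap.newKeySet();

    void addValidationHandler(JoinValidationHandler handler) {
        handlers.add(handler);
    }

    /** Runs every handler; the node becomes visible only if all of them accept it. */
    boolean tryJoin(JoinRequest request) {
        for (JoinValidationHandler handler : handlers) {
            if (!handler.validate(request)) {
                return false; // Rejected nodes never appear in the visible topology.
            }
        }
        visibleNodes.add(request.nodeName);
        return true;
    }

    boolean isVisible(String nodeName) {
        return visibleNodes.contains(nodeName);
    }
}
```

A handler that rejects nodes carrying a foreign cluster tag could then be registered as `topology.addValidationHandler(req -> localTag.equals(req.clusterTag))`, where `localTag` is an assumed local variable holding the tag of the validating node.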
...