Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

discussion needed:  What if we are restarting a cluster and also introducing a new node? What if it is elected as the coordinator?

discussion needed:  Can we actually use Raft and use its leader as the Join Coordinator? Can we use Raft's log to store the Meta Storage Topology Version? This way, it will be possible

TODO: describe coordinator re-election.

...

  1. After the cluster has been established, it remains in the "zombie" state, until the "init" command arrives.;
  2. "Init" command is sent by the administrator either directly to the Join Coordinator, or to any other node, in which case it should be redirected to the Join Coordinator.;
  3. The "init" command should specify the following information:
    1. Human-readable part of the Cluster Tag;
    2. List of addresses of the nodes that should host the Meta Storage Raft group (a.k.a. Meta Storage Configuration).
  4. The Join Coordinator completes the creation of the Cluster Tag by generating the unique part and generates the initial Meta Storage Configuration Topology Version property.;
  5. The Join Coordinator atomically broadcasts the Meta Storage Configuration to all valid nodes in the topology. If this step is successful, then Meta Storage is considered to be initialized and available.;
  6. The Join Coordinator persists the following information into the Meta Storage (therefore propagating it to all nodes):
    1. Cluster Tag;
    2. List of addresses of all nodes that have passed the initial validation;
    3. Meta Storage Configuration Version.

discussion needed: What to do if the Coordinator dies during any step of the initialization process.

...

Adding a new node to a running cluster

This section describes a scenario when a new node wants to join an initialized cluster. Depending on the node configuration, there exist multiple possible scenarios:

...

If an empty node tries to join a cluster the following process is proposed:

  1. It The new node connects to a random node. If this node is not the Join Coordinator, sends the available local validation information and enters the topology, if it gets accepted.The following scenarios can then happen:
  2. The random node is initialized. The joining node should then retrieve the information, that was broadcasted by the "init" command, become initialized and finish the join process.
  3. The random node is empty because the cluster has not yet been initialized. In this case the node finishes the join process and remains in the "zombie" state until the "init" command arrives.
  4. The random node is empty because it hasn't finished the join process itself. In this case the random node should send a corresponding message, and the joining node should choose another random node and repeat the processthe random node redirects the new node to it.
  5. The new node provides its Ignite product version and is validated on the Coordinator (see "Initial cluster setup").
  6. If the validation is successful, the new node is added to the list of validated nodes and receives the Meta Storage Configuration as the response.

In case of errors or a timeout, the whole process should be repeated by the joining node.

Initialized node joins a cluster

If an initialized node tries to join a cluster the following process is proposed:

  1. It The new node connects to a random node and sends the available local validation information (including the cluster tag and the Meta Storage version).The following scenarios can then happen:
  2. The random node is initialized and the cluster tags do not match. The joining node must be rejected.
  3. The random node is initialized, the cluster tags match, local Meta Storage version is "smaller" than the remote. The node joins the topology and updates its Meta Storage configuration, thus ending the joining process.
  4. The random node is initialized, the cluster tags match, local Meta Storage version is "larger" than the remote. The joining node should initiate a process similar to sending the "init" command. Discussion needed: what to do if another "init" command is running in parallel?
  5. The random node is empty because the cluster has not yet been initialized. The joining node should initiate a process similar to sending the "init" command. Discussion needed: what to do if another "init" command is running in parallel?
  6. The random node is empty because it hasn't finished the join process itself. In this case the random node should send a corresponding message, and the joining node should choose another random node and repeat the process. If this node is not the Join Coordinator, the random node redirects the new node to it.
  7. The joining node connect to the Coordinator and provide the following information:
    1. Ignite product version;
    2. Cluster Tag;
    3. Meta Storage Topology Version.
  8. If the Cluster Tag and the Ignite product version do not match with the ones stored on the Coordinator, the node is rejected. If they match, the Meta Storage Topology Version is compared. If the node's Meta Storage Topology Version is smaller or equal to the Version on the Coordinator, it is allowed to enter the topology and the updates its Meta Storage Configuration.

discussion needed: What to do if the joining node's Meta Storage Topology Version is greater than the Version stored on the Coordinator.

Changes in API (WIP)

NetworkTopologyService

...