...

This document uses the notions of "initialized" and "empty" nodes. An initialized node is a node that has received the "init" message at some point in its lifetime and therefore possesses the Cluster Tag and the Meta Storage Topology Version. An empty node is a node that has never received the "init" command and does not possess these properties.

Meta Storage Topology

...

Version

Meta Storage Topology Version is a property that should be used to determine the most recent state of a given Meta Storage configuration. At the time of writing, the Meta Storage configuration consists of a list of names of the cluster nodes that host the Meta Storage Raft group. A possible implementation is a monotonically increasing counter that is incremented each time this list is updated.
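A minimal sketch of the counter-based variant mentioned above. The class name `MetaStorageTopology`, its methods, and the choice of 1 as the initial version are illustrative assumptions, not part of the proposal.

```java
import java.util.List;

/** Hypothetical immutable holder for the Meta Storage node list and its version. */
final class MetaStorageTopology {
    private final List<String> nodeNames;
    private final long version;

    private MetaStorageTopology(List<String> nodeNames, long version) {
        this.nodeNames = List.copyOf(nodeNames);
        this.version = version;
    }

    /** Creates the first topology; starting the counter at 1 is an arbitrary assumption. */
    static MetaStorageTopology initial(List<String> nodeNames) {
        return new MetaStorageTopology(nodeNames, 1);
    }

    /** Returns a new topology with the updated node list and the counter increased by one. */
    MetaStorageTopology withNodes(List<String> newNodeNames) {
        return new MetaStorageTopology(newNodeNames, version + 1);
    }

    List<String> nodeNames() {
        return nodeNames;
    }

    long version() {
        return version;
    }
}
```

Keeping the holder immutable means a version number can never be observed together with a node list from a different update.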

...

A cluster tag is a string that uniquely identifies a cluster. It is generated once per cluster and is distributed across the nodes during the "init" phase. The purpose of a cluster tag is to determine whether a joining node used to be a member of another cluster, in which case its Meta Storage Topology Version is not comparable and the joining node should be rejected. Together with the Meta Storage Topology Version, it defines a partial ordering that allows different configuration versions to be compared.
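The partial ordering can be sketched as follows, assuming the tag is compared as an opaque string and the version is a numeric counter. `TaggedTopologyVersion` and `VersionOrder` are hypothetical names introduced only for this illustration.

```java
import java.util.Objects;

/** Possible outcomes of comparing two configuration versions (hypothetical names). */
enum VersionOrder { OLDER, EQUAL, NEWER, INCOMPARABLE }

/** Hypothetical pair of a cluster tag and a Meta Storage Topology Version. */
final class TaggedTopologyVersion {
    private final String clusterTag;
    private final long topologyVersion;

    TaggedTopologyVersion(String clusterTag, long topologyVersion) {
        this.clusterTag = Objects.requireNonNull(clusterTag, "clusterTag");
        this.topologyVersion = topologyVersion;
    }

    /** Compares this version with another one; versions from different clusters are incomparable. */
    VersionOrder compareWith(TaggedTopologyVersion other) {
        if (!clusterTag.equals(other.clusterTag)) {
            return VersionOrder.INCOMPARABLE;
        }
        int cmp = Long.compare(topologyVersion, other.topologyVersion);
        if (cmp < 0) {
            return VersionOrder.OLDER;
        }
        if (cmp > 0) {
            return VersionOrder.NEWER;
        }
        return VersionOrder.EQUAL;
    }
}
```

Under this sketch, a join request whose comparison yields `INCOMPARABLE` corresponds to the case where the joining node should be rejected.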

...

  1. Human-readable part: a string property that is set by the system administrator. Its purpose is to make debugging and error reporting easier.
  2. Unique part: a generated unique string (e.g. a UUID). Its purpose is to ensure that cluster tags are different between different clusters.
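The two parts listed above could be modelled as shown below. This is only a sketch: the class name `ClusterTag`, the use of a random UUID for the unique part, and the `name:uuid` string form are assumptions made for illustration.

```java
import java.util.Objects;
import java.util.UUID;

/** Hypothetical cluster tag made of a human-readable name and a generated unique part. */
final class ClusterTag {
    private final String name;
    private final UUID id;

    ClusterTag(String name, UUID id) {
        this.name = Objects.requireNonNull(name, "name");
        this.id = Objects.requireNonNull(id, "id");
    }

    /** Generates a new tag for the administrator-supplied name using a random UUID. */
    static ClusterTag generate(String name) {
        return new ClusterTag(name, UUID.randomUUID());
    }

    String name() {
        return name;
    }

    UUID id() {
        return id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ClusterTag)) {
            return false;
        }
        ClusterTag other = (ClusterTag) o;
        return name.equals(other.name) && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, id);
    }

    /** The string form (name, colon, UUID) is an assumed format intended for logs and error messages. */
    @Override
    public String toString() {
        return name + ":" + id;
    }
}
```

Two clusters that happen to be given the same human-readable name still get different tags, because equality also takes the generated part into account.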

DISCUSSION NEEDED: human-readable part can also be generated automatically, similar to Ignite 2.

Implementation details

Join Coordinator election

NOTE: this section is intentionally less thorough than it needs to be, in order to save time; the election protocol will be finalized after a discussion.

Before the nodes can start joining a cluster, a node should be elected as the Join Coordinator. For the sake of simplicity, the following algorithm is proposed, which can later be replaced with something more sophisticated:

  1. Given a list of initial cluster members, choose the "smallest" address (for example, using an alphanumeric order), which will implicitly be considered the Join Coordinator. This requires the IP Finder configuration (used to obtain the initial cluster member list) to be identical on all initial cluster members.
  2. If the "smallest" address is unavailable, all other nodes should fail to start after a timeout and must then be restarted manually.
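Step 1 of the algorithm above can be sketched as follows, assuming addresses are plain strings compared in their natural (lexicographic) order. The class and method names are hypothetical, and the timeout handling from step 2 is not shown.

```java
import java.util.Collection;
import java.util.Collections;

/** Hypothetical helper that picks the Join Coordinator from the initial member list. */
final class JoinCoordinatorElection {
    private JoinCoordinatorElection() {
    }

    /** Returns the "smallest" address according to the natural String ordering. */
    static String electCoordinator(Collection<String> initialMembers) {
        if (initialMembers.isEmpty()) {
            throw new IllegalArgumentException("Initial cluster member list must not be empty");
        }
        return Collections.min(initialMembers);
    }

    /** Tells a node whether it is itself the implicitly elected Join Coordinator. */
    static boolean isCoordinator(String localAddress, Collection<String> initialMembers) {
        return localAddress.equals(electCoordinator(initialMembers));
    }
}
```

Because the comparison is lexicographic, "10.0.0.10:3344" sorts before "10.0.0.2:3344". That is acceptable here, since the algorithm only needs every node to pick the same address from the same list.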

DISCUSSION NEEDED: What should be done when a cluster is constructed from a number of stopped nodes with different Meta Storage configurations? Should these configurations be overridden by the "init" command?

...