...

Other persistence-enabled clusters, such as Cassandra, separate cluster membership from data distribution and require manual acknowledgment when adding or removing nodes. If a node fails, it is considered temporarily offline, and no data movement starts until an administrator confirms that the node is permanently offline.

...

To resolve the issues described above, we introduce the concept of an affinity baseline topology: a target set of nodes intended to keep data for persistence-enabled caches. We will also attach a list of hashes, generated at branching points, to each baseline topology; this will allow us to keep track of activation history and prevent data divergence in the cluster. A branching point is a cluster action that may affect the data integrity of the cluster. One example of a branching point is cluster activation.
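As an illustration only, the following minimal sketch shows how a baseline topology together with its branching history could be modeled. The class, field, and method names are hypothetical and do not reflect the actual Ignite implementation; the hash derivation is likewise illustrative.

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of a baseline topology snapshot: a target set of node
 * consistent IDs plus the branching history accumulated at branching points
 * (e.g. cluster activation).
 */
class BaselineTopologySketch {
    /** Consistent IDs of the nodes that are expected to keep persistent data. */
    private final Set<String> consistentIds;

    /** Hashes recorded at branching points; used to detect divergent histories. */
    private final List<Long> branchingHistory = new ArrayList<>();

    BaselineTopologySketch(Collection<String> consistentIds) {
        this.consistentIds = new TreeSet<>(consistentIds);

        // The initial branching point is the creation of the baseline itself.
        branchingHistory.add((long)Objects.hash(this.consistentIds));
    }

    /** Records a new branching point, e.g. cluster activation. */
    void onBranchingPoint(Object branchingAction) {
        long prev = branchingHistory.get(branchingHistory.size() - 1);

        branchingHistory.add(31L * prev + Objects.hashCode(branchingAction));
    }

    /** @return Immutable view of the accumulated branching history. */
    List<Long> branchingHistory() {
        return List.copyOf(branchingHistory);
    }
}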

Baseline topology and branching history must be stored in a reliable, fail-safe metadata storage which should be available for reading and updating upon node join. This is needed to detect branch divergence and prevent stale nodes from joining the cluster. Each baseline topology change and branching action is saved in the metadata storage.
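A minimal join-validation sketch follows, assuming the histories being compared are the hash lists described above, read from the metadata storage. The class, method, and message names are illustrative and not the actual Ignite API.

import java.util.List;

/** Hypothetical join-time validation against the stored branching history. */
class BranchingHistoryValidatorSketch {
    /** @return {@code null} if the join is allowed, otherwise a rejection reason. */
    String validateJoin(List<Long> clusterHistory, List<Long> joiningHistory) {
        int common = Math.min(clusterHistory.size(), joiningHistory.size());

        // A mismatch in the common prefix means the node was activated in a
        // different branch: its data may have diverged from the cluster's.
        for (int i = 0; i < common; i++) {
            if (!clusterHistory.get(i).equals(joiningHistory.get(i)))
                return "Branching history diverged at point " + i;
        }

        // A node that has missed branching points the cluster has seen, or has
        // seen branching points unknown to the cluster, is stale or incompatible.
        if (joiningHistory.size() != clusterHistory.size())
            return "Branching history length mismatch: node is stale or ahead of the cluster";

        return null; // Histories match: the node may join.
    }
}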

Phase I

During Phase I, cache affinity topology will be decoupled from cluster node topology.

Affinity for persistence-enabled caches is calculated using the baseline topology, and then offline nodes are subtracted from the mapping. When a new node joins the cluster or a node goes offline, the baseline topology does not change; only the affinity mapping is corrected with regard to offline nodes. Since affinity must be calculated for offline nodes, the cluster must be able to create 'phantom' topology nodes and pass them to the affinity function. Since an affinity function may use arbitrary node attributes, we will introduce an interface that declares which node attributes are used in affinity calculation, as sketched below. The required node attributes will be stored in the metadata storage.
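The sketch below illustrates the idea of such an interface and of a 'phantom' node built from stored attributes. All names are hypothetical; the real interface and node abstraction in Ignite may look different.

import java.util.Collection;
import java.util.Map;

/**
 * Hypothetical interface: an affinity function declares which node attributes
 * it relies on, so the cluster can store those attributes in the metadata
 * storage and later reconstruct a 'phantom' node for an offline baseline node.
 */
interface AffinityAttributesAware {
    /** @return Names of node attributes used by the affinity function. */
    Collection<String> affinityAttributeNames();
}

/**
 * A 'phantom' stand-in for an offline baseline node: only the consistent ID
 * and the declared attributes are available, which is enough for affinity
 * calculation.
 */
class PhantomNodeSketch {
    private final Object consistentId;
    private final Map<String, Object> affinityAttributes;

    PhantomNodeSketch(Object consistentId, Map<String, Object> affinityAttributes) {
        this.consistentId = consistentId;
        this.affinityAttributes = affinityAttributes;
    }

    Object consistentId() {
        return consistentId;
    }

    Object attribute(String name) {
        return affinityAttributes.get(name);
    }
}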

...

Functionality that is not related to data affinity (compute grid, services) is not affected by Phase I. Functionality related to in-memory caches should work the same way as it did before the baseline was introduced.

Phase II

During Phase II, cache affinity for in-memory caches and persistence-enabled caches should be merged into one Affinity Topology. In order to improve user experience and keep the old behavior for in-memory caches, affinity topology switch policies are introduced. 

The policy is defined by a single boolean switch (auto-adjust enabled) and two timeouts: a soft timeout, which is extended after each topology change, and a hard timeout, which triggers a baseline change once it elapses after the first topology change event.
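A minimal sketch of this soft/hard timeout interplay is shown below. The class name, the scheduling mechanics, and the callback are all illustrative assumptions, not the actual Ignite implementation.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical auto-adjust policy: each topology change restarts the soft
 * timeout; the hard timeout, armed on the first change, caps how long the
 * baseline adjustment can be postponed.
 */
class BaselineAutoAdjustPolicySketch {
    private final boolean autoAdjustEnabled;
    private final long softTimeoutMs;
    private final long hardTimeoutMs;

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> softTask;
    private ScheduledFuture<?> hardTask;

    BaselineAutoAdjustPolicySketch(boolean autoAdjustEnabled, long softTimeoutMs, long hardTimeoutMs) {
        this.autoAdjustEnabled = autoAdjustEnabled;
        this.softTimeoutMs = softTimeoutMs;
        this.hardTimeoutMs = hardTimeoutMs;
    }

    /** Called on every node join/leave event. */
    synchronized void onTopologyChanged(Runnable adjustBaseline) {
        if (!autoAdjustEnabled)
            return;

        // Soft timeout: restart the countdown on every topology change.
        if (softTask != null)
            softTask.cancel(false);
        softTask = scheduler.schedule(() -> fire(adjustBaseline), softTimeoutMs, TimeUnit.MILLISECONDS);

        // Hard timeout: armed only once, on the first change after the last adjustment.
        if (hardTask == null)
            hardTask = scheduler.schedule(() -> fire(adjustBaseline), hardTimeoutMs, TimeUnit.MILLISECONDS);
    }

    private synchronized void fire(Runnable adjustBaseline) {
        if (softTask != null) softTask.cancel(false);
        if (hardTask != null) hardTask.cancel(false);
        softTask = null;
        hardTask = null;

        adjustBaseline.run(); // Set the baseline to the current server topology.
    }
}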

The policy configuration should be adjustable at runtime via JMX beans or the control.sh utility and should persist its configuration to the node metastore. A joining node should adopt the most recent cluster configuration, which is aligned upon cluster join.
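One possible shape of such a management interface (e.g. registered as a JMX bean and mirrored by control.sh commands) is sketched below; all names are hypothetical.

/**
 * Hypothetical management interface for the auto-adjust policy. Changes are
 * expected to be persisted to the node metastore.
 */
public interface BaselineAutoAdjustManagementSketch {
    /** @return Whether automatic baseline adjustment is enabled. */
    boolean isAutoAdjustEnabled();

    /** Enables or disables automatic baseline adjustment cluster-wide. */
    void setAutoAdjustEnabled(boolean enabled);

    /** @return Soft timeout in milliseconds (extended on each topology change). */
    long getSoftTimeout();

    /** Sets the soft timeout in milliseconds. */
    void setSoftTimeout(long softTimeoutMs);

    /** @return Hard timeout in milliseconds (upper bound since the first change). */
    long getHardTimeout();

    /** Sets the hard timeout in milliseconds. */
    void setHardTimeout(long hardTimeoutMs);
}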

Once baseline adjustment on timeout is introduced, we can change the behavior of in-memory caches to conform to the baseline topology. This should resolve the issues with joins between in-memory and persistent caches and open up more opportunities for PME optimizations.

Finally, we should introduce some form of tracking of which node has seen the latest version of each partition upon node failure, and apply the partition LOSS policy accordingly.

Functionality related to compute grid and services is not affected by Phase II.

Phase III

During Phase III, a procedure of graceful node decommissioning is introduced. This procedure should allow shrinking clusters with 0 backups. During the decommission procedure, the cluster should calculate an intermediate affinity and rebalance the partitions owned by the node being decommissioned. After the rebalance is finished, the node may be excluded from the cluster.
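The flow described above could look roughly like the sketch below. The Cluster interface and every method on it are assumptions introduced for illustration, not an existing Ignite API.

import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical graceful decommissioning flow: compute an intermediate affinity
 * without the departing node, wait for rebalancing to move its partitions to
 * the remaining owners, then drop the node from the baseline.
 */
class DecommissionSketch {
    interface Cluster {
        Set<String> baseline();

        /** Calculates affinity for the given target baseline and starts rebalancing. */
        void rebalanceTowards(Set<String> targetBaseline);

        /** Blocks until partitions owned by excluded nodes are re-owned elsewhere. */
        void awaitRebalanceCompletion();

        void setBaseline(Set<String> newBaseline);

        void stopNode(String consistentId);
    }

    void decommission(Cluster cluster, String nodeToRemove) {
        // 1. Intermediate affinity: the target baseline no longer contains the node.
        Set<String> target = new TreeSet<>(cluster.baseline());
        target.remove(nodeToRemove);

        // 2. Rebalance partitions owned by the departing node, even with 0 backups,
        //    because the node is still online and can supply its data.
        cluster.rebalanceTowards(target);
        cluster.awaitRebalanceCompletion();

        // 3. Only after rebalancing completes may the node leave the baseline and stop.
        cluster.setBaseline(target);
        cluster.stopNode(nodeToRemove);
    }
}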

An open question for Phase III is whether Service Grid and Compute functionality should be allowed on non-affinity nodes.

Usability considerations

It is necessary to add the ability to manage the baseline topology to the command-line utilities (visorcmd, the control.sh script) and to consider adding it to the WebConsole.

...

http://apache-ignite-developers.2346864.n4.nabble.com/Design-proposal-automatic-creation-of-BaselineTopology-td20756.html

http://apache-ignite-developers.2346864.n4.nabble.com/Baseline-auto-adjust-s-discuss-td40330.html

Reference Links

// Links to various reference documents, if applicable.

Open Tickets Phase I

JIRA filter (ASF JIRA): project = Ignite AND labels IN (IEP-4) and labels IN ('Phase-1') and status not in (closed, resolved)

Closed Tickets Phase I

JIRA filter (ASF JIRA): project = Ignite AND labels IN (IEP-4) and labels IN ('Phase-1') and status in (closed, resolved)

Open Tickets

...

Phase II

JIRA filter (ASF JIRA): project = Ignite AND labels IN (IEP-4) and labels IN ('Phase-2') and status not in (closed, resolved)

Closed Tickets Phase II

JIRA filter (ASF JIRA): project = Ignite AND labels IN (IEP-4) and labels IN ('Phase-2') and status in (closed, resolved)
// Links or report with relevant JIRA tickets.