Data consistency (wip)

The document describes principles used for achieving consistency between data copies.

Cluster topology

Cluster topology is a ordered set of client and server nodes at a certain point in time. Each stable topology is assigned a topology version number - monotonically increasing pair of counters (majorVer, minorVer). Each node maintains a list of nodes in the topology and it is the same on all nodes for given version. It is important to note that each node receives information about the topology change eventulally, that is, independently of the other node at the different time.

Affinity function

A set of all possible cache keys is divided between partitions. A partition is an indivisible data set in a grid, a subset of a total set of keys. The key uniquely determines the partition in which it will be stored.

Every change in topology causes new topology version. For each new version of the topology it is necessary to calculate how the data is distributed among the nodes in topology. Those nodes that are responsible for storing data are called affinity nodes. Each node owns a subset of partitions. If a new affinity node has joined then it is necessary to rebalance part of the data to the new node.

The distribution (sharding) of data on the nodes of the grid is defined by affinity function.

First, affinity function uniquely defines a partition for key: P(key) = partition

This is a static property and cannot change in time.

Second, affinity function defines which partitions belongs to the node: F(topology, part) = list(nodes)

At different points in time a partition can belong to different nodes. This is a dynamic property of a partition.

The number of nodes in this list is equal to the number of copies of the partition. Recall that one of the fundamental properties of a grid is its high availability. With the loss of a certain number of nodes, complete data loss does not occur and the load is not interrupted. The need to store multiple copies is due to the implementation of this property.

The first node in the list is the so-called primary node, has a special coordinating role for all operations on the partition. The remaining are backup nodes. In the event of a loss of a primary node, one of the backup nodes becomes the new primary.

Todo PIC for some AF calculated state.

An important property of affinity function is the uniform distribution of partitions over nodes.

The default implementation uses the hash rendezvous algorithm https://en.wikipedia.org/wiki/Rendezvous_hashing

Due to implementation of these propertoes linear scalability is achieved when adding new nodes to the grid.

Page tree

Data consistency (wip)