Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

The document describes principles used for achieving consistency between data copies for transactional caches.

Cluster topology

Cluster topology is a ordered set of client and server nodes at a certain point in time. Each stable topology is assigned a topology version number - monotonically increasing pair of counters (majorVer, minorVer). Each node maintains a list of nodes in the topology and it is the same on all nodes for given version. It is important to note that each node receives information about the topology change eventulally, that is, independently of the other node at the different time.

...

This field is used for recording updates applied out of order. This is possible due to a fact what updates this higher reservation counter could be applied to WAL before updates with lower reservation counter causing gaps in update sequence.

...

A range represents a sequence of updates, for example (5, 3) means three updates with number 6, 7, 8. We will use this notation again later.

Out of order updates range is held in the sequence only if an update with lower range is missing. For example, LWM=5, HWM=9, seq=(6,2) means updates 7, 8 were applied out of order and updates 6, 9 are still not applied.

...

The flow is closely bound to transactions protocol. Counters are modified during stages of two-phase commit (2PC) protocol used by Ignite to ensure distributed consistency.

2PC protocol variation used in Ignite consists of map, optional remap, lock, prepare and commit phases.

...

Near map is a mapping of partitions to primary nodes. In the example each transaction will have only one node s1 in the near map.

When near map is ready, for each key enlisted in transaction a lock request is issued , which aquires exclusive lock on each key (this is true for pessimistic transactionsoptimistic transactions locks are acquired during commit but principles are the same).

It may happen that on the primary node at the time of request processing the topology version N + 1 is already relevant or in the process of changing to what version.

...

If the lock is successfully acquired the prepare phase begins. This also globally locks a topology version. New topology cannot be set until all transactions started on current topology are finished. This is handled by PME protocol.

Each locked First, each enlisted key is assigned the update sequence number generated by using next value of HWM counter field. If multiple keys are prepared the transaction reserves locked in the same partition a range will be reserved.

Next, When for each primary partition a DHT map is calculated. DHT map is a mapping of partitions to backup nodes. In the example each transaction will have 2 nodes in DHT map: s2 and s3.

When DHT map is ready primary node sends prepare requests to corresponding the backup nodes , if any, containing enlisted keys and their update numberscounters.

Thus, update numbers counters are synchronized between primaries and backups - assigned to primaries, then transferred to backups.

If all backup nodes are answered positively, then the transaction goes into the PREPARED state. In this state, it is ready for commit.

...

Because transaction commits can reorder it's possible to have some updates applied out of order, as illustrated on the pictureshow by the example above.

The out of order updates sequence is used for tracking such updates.

...