Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The flow is closely bound to transactions protocol. Counters are modified during stages of two-phase commit (2PC) protocol used by Ignite to ensure distributed consistency.

2PC protocol consists of maplock, prepare and commit phases.

...

The figure below shows the flow with the case of backup reordering, where tx with higher update counters is applied before tx with lower update counters on backup.

The first step (near map) on the near(originating) node is to calculate near map of this transaction - for each partition p a set of involved primary nodes using the affinity function F(p) . In our case same and only one primary node for both transactions.

Near and DHT map phases are omitted on the picture.

Near map is a mapping of partitions to primary nodes. In the example each transaction will have only one node s1 in the map.

When near map is ready, for each key enlisted in transaction a lock request is issued, which aquires exclusive lock on each key (For this partition on this topology version a lock request is sent that executes the lock (k) operation, the semantics of which is similar to acquiring exclusuve lock on given key (footnote: this is true for pessimistic transactions).

...

First, each enlisted key is assigned the update sequence number,  generated by next value of HWM counter field. If multiple keys are prepared the transaction reserves a range.

Next, DHT map is calculated.

DHT map is a mapping of partitions to backup nodes. In the example each transaction will have 2 nodes in DHT map: s2 and s3.

When DHT map is ready primary node Then primary node builds a backups map (dht map) and sends prepare requests to the backup nodes, if any, containing enlisted keys and their update numbers.

...

Because transaction commits can reorder it's possible to have some updates applied out of order, as illustrated on the picture. The field oomQueue is

The out of order updates sequence is used for tracking such updates.HWM

is increment during tx prepare phase on the primary node to ensure the monotonicity of the counter and invariant HWM >= LWM holdingLet's imaging out of order updates are not tracked and LWM doesn't exists. If primary node fails before applying update from Tx1 we end up with update counter=5 and will lose missed updates after node rejoins to the topology, because rebalancing will not start due to equal counters.

Restoring consistency

Let us consider in more detail how counters are used to restore consistency (counters rebalance [2]) when outdated node joins topology.

...