...

[2] Counter-based rebalancing.
In the second case, which is relevant only for the persistence mode, rebalancing is used to restore data consistency when a node was absent from the topology for some time while the cluster was under load.
When the node re-joins the topology, the partition update counter is analyzed to determine how far the partition has "lagged" behind other copies.
Depending on how far the partition lags behind, the second type of rebalance can use the write-ahead log as a data source. In this case the rebalance can be performed without completely reloading the partition and is called "historical".
Another scenario for the occurrence of [2] is the activation of the grid, where some nodes may have stale data.
Re-balance is an exchange of supply and demand messages between nodes. A demand message is sent from the node whose partition has stale data to a node with actual data. A supply message is sent in response to the demand message; after processing a supply message, the next demand message is sent, and so on until there is nothing more to supply.
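The demand/supply loop above can be sketched as follows. This is a minimal illustration of the message flow, not Ignite's actual API; all class and method names here are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Illustrative sketch of the demand/supply exchange; names are hypothetical. */
class RebalanceExchangeSketch {
    /** A supplier node that hands out missed updates in fixed-size batches. */
    static class Supplier {
        private final Queue<Long> pending;

        Supplier(List<Long> missedUpdates) {
            this.pending = new ArrayDeque<>(missedUpdates);
        }

        /** Supply message: up to batchSize updates; an empty list means "nothing more to supply". */
        List<Long> supply(int batchSize) {
            List<Long> batch = new ArrayList<>();
            while (batch.size() < batchSize && !pending.isEmpty())
                batch.add(pending.poll());
            return batch;
        }
    }

    /** Demander loop: send a demand, apply the supply, repeat until the supplier is drained. */
    static List<Long> rebalance(Supplier supplier, int batchSize) {
        List<Long> applied = new ArrayList<>();
        while (true) {
            List<Long> batch = supplier.supply(batchSize); // demand message -> supply message
            if (batch.isEmpty())
                return applied; // nothing more to supply: rebalance is finished
            applied.addAll(batch); // apply the batch, then send the next demand
        }
    }
}
```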

Let us consider in more detail how counters are used to restore consistency (rebalance [2]).

When a node joins, during PME it sends, for each partition, its lwm as well as the depth of the history it keeps, to determine whether historical rebalance is possible. The history is described by a pair of counters (earlies_checkpoint_counter, lwm). The counters guarantee that there are no missed updates inside this range.
At the same time, the node may have missed updates with higher numbers.
On the coordinator, the lwm values are compared and the node (or nodes) with the highest lwm is selected; this information is sent to the joining node, which starts the rebalance procedure if it lags behind. The rebalance is either full, with preliminary clearing of the partition, or historical, if some node keeps the necessary update history in the WAL.
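The choice between historical and full rebalance described above can be sketched as a single predicate. This is a simplified illustration under the stated counter semantics; the class and method names are assumptions, not Ignite's real code.

```java
/** Illustrative sketch of the historical-vs-full rebalance decision; names are hypothetical. */
class RebalanceModeSketch {
    /**
     * Historical rebalance is possible only when the supplier still has WAL history
     * covering the demander's counter: the supplier's earliest checkpoint counter
     * must not exceed the demander's lwm. Otherwise part of the needed history is
     * gone and the partition must be cleared and fully reloaded.
     */
    static boolean canUseHistorical(long supplierEarliestCheckpointCntr, long demanderLwm) {
        return supplierEarliestCheckpointCntr <= demanderLwm;
    }
}
```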

TODO rebalance flow pic (fig.2).

Update counter structure and flow

...

One more important observation: LWM can safely be incremented only after the update has been written to the WAL. Otherwise a situation is possible where the counter has increased but there is no corresponding update.
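The ordering constraint above can be made concrete in a small sketch: the WAL write must happen before the counter is advanced, so a crash between the two steps can never leave a counter value without a corresponding logged update. The class is illustrative, not Ignite's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative sketch of the WAL-before-LWM ordering; names are hypothetical. */
class LwmOrderingSketch {
    final List<String> wal = new ArrayList<>(); // stand-in for the write-ahead log
    final AtomicLong lwm = new AtomicLong();    // low-water mark for the partition

    void applyUpdate(String update) {
        wal.add(update);       // 1. make the update durable first
        lwm.incrementAndGet(); // 2. only then advance the counter
    }
}
```

With the opposite order, a crash after step 2 but before step 1 would produce exactly the inconsistency the text warns about: an incremented counter with no update behind it.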

Let us consider in more detail how counters are used to restore consistency (counters rebalance [2]) when an outdated node joins the topology.

On join, a node sends its LWM for each partition as part of the partition map exchange protocol. The coordinator collects all LWMs and determines the nodes with the highest LWM, say maxLWM. These nodes are elected as data suppliers for the given partition.

If the coordinator detects a lagging node, it sends a partition map with the partition state changed to MOVING, triggering a rebalance on the corresponding node. The updates missing on each such node are those between its LWM and maxLWM.
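The coordinator-side election above can be sketched as follows: collect per-node LWMs for a partition, find maxLWM, and mark every node below it as MOVING. This is a simplified illustration; the names and the two-state enum are assumptions for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch of supplier election on the coordinator; names are hypothetical. */
class SupplierElectionSketch {
    enum PartState { OWNING, MOVING }

    /** Nodes at maxLWM stay OWNING (potential suppliers); lagging nodes become MOVING. */
    static Map<String, PartState> elect(Map<String, Long> lwmByNode) {
        long maxLwm = lwmByNode.values().stream().mapToLong(Long::longValue).max().orElse(0L);
        Map<String, PartState> states = new HashMap<>();
        for (Map.Entry<String, Long> e : lwmByNode.entrySet())
            states.put(e.getKey(), e.getValue() < maxLwm ? PartState.MOVING : PartState.OWNING);
        return states;
    }
}
```

A node marked MOVING then misses exactly the updates in the range (its LWM, maxLWM], which is what the historical rebalance has to replay.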

The simplified rebalancing flow is illustrated in figure 3.

TODO fig. 3

Crash recovery

There is no guarantee that an update will be completed: for example, if some nodes exit during the DHT prepare phase of a transaction, the transaction may be rolled back.
When a primary node crashes under load, permanent gaps may appear in the updates on backup nodes due to transaction reordering.

...