
Notes: 

  • The following notation is used: words written in italics and without spaces are class names with the package name omitted, for example, GridCacheMapEntry


 

(Partition Map) Exchange

PME (Partition Map Exchange) is the process of exchanging partition information between nodes. Its goal is to establish the actual state of partitions across all cluster nodes.

Triggers

Events which cause an exchange (a listener sketch follows the list):

  • Node Join (EVT_NODE_JOINED) - a new node was discovered and joined the topology (the exchange is performed after the node is included into the ring)
  • Node Left (EVT_NODE_LEFT) - a node was shut down correctly by calling ignite.close()
  • Node Failed (EVT_NODE_FAILED) - an unresponsive node was detected; it probably crashed and is considered failed
  • Custom discovery events with the exchange Needed flag
  • Dynamic cache start / cache stop
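
As an illustration only (a minimal sketch, not Ignite internals; the cache name is made up), the code below subscribes to the discovery events listed above via the public events API and starts a dynamic cache, each of which leads to an exchange:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.events.DiscoveryEvent;
import org.apache.ignite.lang.IgnitePredicate;

import static org.apache.ignite.events.EventType.EVT_NODE_FAILED;
import static org.apache.ignite.events.EventType.EVT_NODE_JOINED;
import static org.apache.ignite.events.EventType.EVT_NODE_LEFT;

public class ExchangeTriggersExample {
    public static void main(String[] args) {
        // Assumes a locally started node; in a real setup the event types below
        // must also be enabled via IgniteConfiguration#setIncludeEventTypes.
        try (Ignite ignite = Ignition.start()) {
            IgnitePredicate<DiscoveryEvent> lsnr = evt -> {
                System.out.println("Discovery event " + evt.name()
                    + ", topVer=" + evt.topologyVersion());
                return true; // Keep listening.
            };

            // Node join/left/failed: each causes a major topology change and an exchange.
            ignite.events().localListen(lsnr, EVT_NODE_JOINED, EVT_NODE_LEFT, EVT_NODE_FAILED);

            // Dynamic cache start: causes a minor topology change and an exchange as well.
            ignite.getOrCreateCache("my-dynamic-cache");
        }
    }
}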

Exchange process

Phase 1. Accumulation of partition state

A change in cluster membership causes a major topology version update (e.g. 5.0→6.0); cache creation or destruction causes a minor one (e.g. 6.0→6.1).
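
As an illustration only, a topology version can be modeled as a comparable (major, minor) pair; the real class in Ignite is AffinityTopologyVersion, and the names below are simplified assumptions:

final class TopVer implements Comparable<TopVer> {
    final long major; // Bumped when a node joins, leaves or fails (e.g. 5.0 -> 6.0).
    final int minor;  // Bumped when a cache is started or stopped (e.g. 6.0 -> 6.1).

    TopVer(long major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    TopVer nextMajor() { return new TopVer(major + 1, 0); }

    TopVer nextMinor() { return new TopVer(major, minor + 1); }

    @Override public int compareTo(TopVer o) {
        int cmp = Long.compare(major, o.major);
        return cmp != 0 ? cmp : Integer.compare(minor, o.minor);
    }

    @Override public String toString() { return major + "." + minor; }
}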

Initialization of the exchange means adding a GridDhtPartitionsExchangeFuture to a queue on each node.

The future is put into this queue (GridCachePartitionExchangeManager.ExchangeWorker#futQ) from the discovery thread.

The 'exchange worker' thread manages this queue. Using only one exchange worker thread per node guarantees a strict processing order for the futures.
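
A minimal sketch of this single-worker pattern in plain Java (not Ignite code): all exchange tasks go through one queue consumed by one thread, so they are processed strictly in the order they were enqueued.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ExchangeWorkerSketch {
    // Analogue of ExchangeWorker#futQ: tasks are enqueued from the discovery thread.
    private final BlockingQueue<Runnable> futQ = new LinkedBlockingQueue<>();

    // Called from the discovery thread when a trigger event arrives.
    public void onDiscoveryEvent(Runnable exchangeTask) {
        futQ.offer(exchangeTask);
    }

    // The single exchange worker: one consumer thread -> strict FIFO processing.
    public void startWorker() {
        Thread worker = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted())
                    futQ.take().run(); // Process one exchange at a time, in order.
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "exchange-worker");

        worker.setDaemon(true);
        worker.start();
    }
}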

Step 1) First of all, the exchange worker thread runs init() on this exchange future. This method sends a single map (GridDhtPartitionsSingleMessage). The single map is sent using the communication SPI (in a peer-to-peer manner).

This single message contains a partition map for each cache (GridDhtPartitionMap). It describes the owners of each partition for each cache group (the cache group id is mapped to a partition map). The following is an example from a log for 2 caches (a simplified data-shape sketch follows the example):

msg=GridDhtPartitionsSingleMessage 
[parts=
{1544803905=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=9, minorTopVer=3], updateSeq=2, size=406], 
-2100569601=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=9, minorTopVer=3], updateSeq=161, size=100]}
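
Conceptually, the payload of a single message can be thought of as a nested map keyed by cache group id. The sketch below is a simplification for illustration, not the actual wire format; the state names are assumptions roughly matching Ignite's partition states.

import java.util.HashMap;
import java.util.Map;

public class SingleMapSketch {
    // Simplified partition states for illustration.
    enum PartState { OWNING, MOVING, RENTING, EVICTED }

    public static void main(String[] args) {
        // Cache group id -> (partition id -> state), as reported by one node.
        Map<Integer, Map<Integer, PartState>> singleMap = new HashMap<>();

        Map<Integer, PartState> group = new HashMap<>();
        group.put(0, PartState.OWNING);
        group.put(1, PartState.MOVING); // Still being rebalanced to this node.

        singleMap.put(1544803905, group); // Group id as in the log excerpt above.

        System.out.println(singleMap);
    }
}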

Phase 2. Building and sharing full state

Step 2) The coordinator maintains a list of nodes to wait for responses from (for example, N1, N2 and N3 at pic. 1).

Based on the single maps received and on its own state, the coordinator builds the full map.

When all expected responses have been received, the coordinator sends the full map to the other nodes.
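
A hypothetical sketch of this merge step: from each node's single map the coordinator derives, per partition, which nodes report ownership. The real GridDhtPartitionsFullMessage carries more information than this; the method and parameter names are made up for illustration.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

public class FullMapMergeSketch {
    /**
     * Merges per-node single maps into a cluster-wide view.
     *
     * @param singleMaps Node id -> (partition id -> owned flag), one entry per responding node.
     * @return Partition id -> set of node ids that report ownership of that partition.
     */
    static Map<Integer, Set<UUID>> buildFullMap(Map<UUID, Map<Integer, Boolean>> singleMaps) {
        Map<Integer, Set<UUID>> fullMap = new HashMap<>();

        singleMaps.forEach((nodeId, parts) ->
            parts.forEach((partId, owned) -> {
                if (owned)
                    fullMap.computeIfAbsent(partId, p -> new HashSet<>()).add(nodeId);
            }));

        return fullMap;
    }
}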

The following is an example of the 'Sending full partition map' log entry for 2 caches:

msg=GridDhtPartitionsFullMessage [
 parts={
  1544803905=GridDhtPartitionFullMap [nodeId=1c6a3487-0ad2-4dc1-a69d-7cd84db00000, nodeOrder=1, updateSeq=6, size=5], 
  -2100569601=GridDhtPartitionFullMap [nodeId=1c6a3487-0ad2-4dc1-a69d-7cd84db00000, nodeOrder=1, updateSeq=102, size=5]}, 
 topVer=AffinityTopologyVersion [topVer=9, minorTopVer=3]

Partition states in the full and single maps are encoded using bit sets for compression.
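
An illustrative sketch of one possible packing, using 2 bits per partition; the actual encoding used by Ignite's messages is an internal detail and may differ.

import java.util.BitSet;

public class PartitionStateBitSetSketch {
    // 2 bits per partition are enough for 4 states: 0=OWNING, 1=MOVING, 2=RENTING, 3=EVICTED.
    static BitSet encode(int[] states) {
        BitSet bits = new BitSet(states.length * 2);

        for (int p = 0; p < states.length; p++) {
            if ((states[p] & 1) != 0) bits.set(p * 2);
            if ((states[p] & 2) != 0) bits.set(p * 2 + 1);
        }

        return bits;
    }

    static int decode(BitSet bits, int partId) {
        return (bits.get(partId * 2) ? 1 : 0) | (bits.get(partId * 2 + 1) ? 2 : 0);
    }

    public static void main(String[] args) {
        BitSet encoded = encode(new int[] {0, 1, 0, 3}); // OWNING, MOVING, OWNING, EVICTED

        System.out.println(decode(encoded, 1)); // Prints 1 (MOVING).
    }
}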

On receiving the full map for the current exchange, each node calls onDone() for this future. Listeners registered on the future are then invoked (their order is not guaranteed).
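
A minimal sketch of the "complete the future, then notify listeners" idea using a plain CompletableFuture; Ignite uses its own future class (GridDhtPartitionsExchangeFuture), but the pattern is analogous.

import java.util.concurrent.CompletableFuture;

public class ExchangeDoneSketch {
    public static void main(String[] args) {
        CompletableFuture<String> exchangeFut = new CompletableFuture<>();

        // Listeners registered before completion; their invocation order is not guaranteed.
        exchangeFut.thenAccept(topVer -> System.out.println("Listener A sees " + topVer));
        exchangeFut.thenAccept(topVer -> System.out.println("Listener B sees " + topVer));

        // Analogue of onDone(): invoked when the full map for this exchange is received.
        exchangeFut.complete("AffinityTopologyVersion [topVer=9, minorTopVer=3]");
    }
}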

 

 
