...

  • Although all cache data is sent between peers in batches (GridDhtPartitionSupplyMessage is used), the demander still processes entries one by one. Such processing has a low impact in a pure in-memory Apache Ignite deployment, but with the native persistence enabled it leads to additional fsyncs and WAL record logging. 

    By default, setRebalanceThreadPoolSize is set to 1 and setRebalanceBatchSize to 512K, which means that thousands of key-value pairs are processed by a single thread, one at a time (see the configuration sketch after the graphs below). Such an approach leads to: 
    • Extra, unnecessary changes to keep node data structures up to date. Adding each entry record into the CacheDataStore traverses and modifies each index tree N times, allocates space N times within the FreeList structure, and additionally stores WAL page delta records; since each of the N insertions touches O(log N) tree pages, the approximate complexity is ~ O(N*log(N));
    • A batch with N entries produces N records in the WAL, which may end up in N fsyncs (assuming the FSYNC WAL mode is enabled);
    • An increased chance of long JVM pauses. The more temporary objects we produce while applying changes, the more often GC happens and the greater the chance of a JVM pause;

  • The rebalancing procedure doesn't utilize the network and storage device throughput to the full extent, even with reasonably large values of setRebalanceThreadPoolSize. For instance, following the common recommendation of N+1 threads (N being the number of available CPU cores) to increase the rebalance speed dramatically slows down computation performance on the demander node. This is easily seen on the graphs below.

    CPU utilization (supplier, demander)

    [Image: setRebalanceThreadPoolSize – 9; setRebalanceBatchSize – 512K]

    [Image: setRebalanceThreadPoolSize – 1; setRebalanceBatchSize – 512K]
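
To make these knobs concrete, below is a minimal configuration sketch wiring together the settings discussed in this section: the rebalance thread pool size, the rebalance batch size, and native persistence with the FSYNC WAL mode. The exact placement of these setters varies across Ignite 2.x releases (for example, the rebalance pool size also lived on CacheConfiguration in older versions), so treat it as illustrative rather than canonical.

Code Block
// Illustrative sketch only; setter placement may differ between Ignite versions.
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.WALMode;

public class RebalanceDefaults {
    public static void main(String[] args) {
        // Native persistence with an fsync on every WAL write -- the mode in
        // which per-entry processing hurts the most.
        DataStorageConfiguration storageCfg = new DataStorageConfiguration()
            .setWalMode(WALMode.FSYNC);

        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        // The 512K supply message batch size mentioned above.
        CacheConfiguration<Integer, byte[]> cacheCfg =
            new CacheConfiguration<Integer, byte[]>("test")
                .setRebalanceBatchSize(512 * 1024);

        IgniteConfiguration cfg = new IgniteConfiguration()
            // A single thread processes all rebalance batches by default.
            .setRebalanceThreadPoolSize(1)
            .setDataStorageConfiguration(storageCfg)
            .setCacheConfiguration(cacheCfg);

        Ignition.start(cfg);
    }
}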


Advantages of peer-2-peer balancing

...

Code Block
batches            : 150701    
rows               : 79355844  
rows per batch     : 526       

time (total)       : 5.5 min   
cache size         : 79852 MB  
rebalance speed    : 234 MB/sec
rows per sec       : 232715 rows

+ cache rebalance total                          : 341524 ms : 100.00
+ + preload on demander                          : 306950 ms : 89.88 
+ + + offheap().invoke(..)                       : 228015 ms : 66.76 
+ + + + dataTree.invoke(..)                      : 195239 ms : 57.17 
+ + + + + BPlusTree.invokeDown(..)               : 71207 ms  : 20.85 
+ + + + + FreeList.insertDataRow(..)             : 121611 ms : 35.61 
+ + + + CacheDataStoreImpl.finishUpdate(..)      : 9988 ms   : 2.92  
+ + + ttl().addTrackedEntry(..)                  : 10523 ms  : 3.08  
+ + + continuousQueries().onEntryUpdated(..)     : 9665 ms   : 2.83  
+ message serialization                          : 1307 ms   : 0.38  
+ network delay between nodes                    : 23409 ms  : 6.85  
+ make batch on supplier handleDemandMessage(..) : 90102 ms  : 26.39 
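
A breakdown like the one above can be collected with simple per-phase wall-time accumulators around the calls of interest (for reference, 79852 MB over the 341524 ms total gives the reported ~234 MB/sec). The sketch below is hypothetical: the PhaseProfiler class and its names are illustrative and not part of Ignite.

Code Block
// Hypothetical helper, not part of Ignite: accumulates per-phase wall time
// and prints each phase as milliseconds and as a share of the total.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class PhaseProfiler {
    private final Map<String, LongAdder> phases = new ConcurrentHashMap<>();

    /** Runs the action and charges its duration to the named phase. */
    public void time(String phase, Runnable action) {
        long start = System.nanoTime();
        try {
            action.run();
        }
        finally {
            phases.computeIfAbsent(phase, k -> new LongAdder())
                .add(System.nanoTime() - start);
        }
    }

    /** Prints "phase : ms : percent" lines similar to the report above. */
    public void report(long totalNanos) {
        phases.forEach((name, nanos) -> System.out.printf(
            "%-45s : %d ms : %.2f%n",
            name, nanos.sum() / 1_000_000L, 100.0 * nanos.sum() / totalNanos));
    }
}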

Hardware


CPU utilization

[Image]


CPU user time

[Image]


CPU io wait time

[Image]


SSD utilization

[Image]


Network utilization

[Image]

