Rebalancing an Apache Ignite cluster redistributes the primary and backup copies of data across the new set of peer nodes according to the applied affinity function. Imbalanced data increases the likelihood of data loss and can skew peer utilization during data requests. A balanced set of data copies, on the other hand, optimizes the request load and the disk resource consumption of each Ignite peer.
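To make the role of an affinity function more concrete, here is a toy sketch in the spirit of rendezvous (highest-random-weight) hashing. It is not Ignite code: the class `RendezvousSketch`, its methods, and the scoring function are all hypothetical and only illustrate how primary and backup owners of a partition can be derived from the current set of nodes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy affinity sketch (hypothetical, not an Ignite class). */
final class RendezvousSketch {
    /** Mixes a node id and a partition number into a pseudo-random score. */
    public static long score(String node, int part) {
        long z = node.hashCode() * 0x9E3779B97F4A7C15L + part;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /**
     * Returns the owners of a partition: index 0 is the primary copy,
     * the remaining elements are backup copies.
     */
    public static List<String> owners(List<String> nodes, int part, int copies) {
        List<String> sorted = new ArrayList<>(nodes);

        // Highest score first; ties are broken by node id to keep the order total.
        sorted.sort((a, b) -> {
            int c = Long.compare(score(b, part), score(a, part));
            return c != 0 ? c : a.compareTo(b);
        });

        return new ArrayList<>(sorted.subList(0, Math.min(copies, sorted.size())));
    }
}
```

With this scheme, adding a node changes the owners of a partition only when the new node itself becomes one of its owners, so only those partitions would have to be moved during rebalancing.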
Currently, there are two types of Apache Ignite cluster rebalancing:
Regardless of which rebalance mode is used, `SYNC` or `ASYNC` (defined in the `CacheRebalanceMode` enum), the Apache Ignite rebalance implementation has a number of limitations caused by its memory-centric design architecture:
- Although cache data is sent between peers in batches (`GridDhtPartitionSupplyMessage` is used), the rebalance procedure still processes entries one by one. This approach has a low impact in a pure in-memory use case, but with Apache Ignite native persistence enabled it leads to additional fsyncs and additional WAL records being logged. `setRebalanceThreadPoolSize` is set to `1` and `setRebalanceBatchSize` to `512K`, which means that thousands of key-value pairs will be processed individually by a single thread. This approach has the following impact:
  - For a batch of N entries, `CacheDataStore` will traverse and modify each index tree N times. It will allocate space N times within the `FreeList` structure and will additionally have to store WAL page delta records, with an approximate complexity of ~O(N*log(N)).
- The rebalancing procedure doesn't utilize the network and storage device throughput to the full extent, even with sufficiently large values of `setRebalanceThreadPoolSize`. For instance, following the common recommendation of `N+1` threads (`N` being the number of CPU cores available) to increase rebalance speed will dramatically slow down computation performance on demander nodes. This can be easily seen on the graphs below.
CPU utilization (supplier, demander) | |
---|---|
setRebalanceThreadPoolSize = 9, setRebalanceBatchSize = 512K | setRebalanceThreadPoolSize = 1, setRebalanceBatchSize = 512K |
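The "thousands of key-value pairs" per batch and the approximate O(N*log(N)) cost mentioned above can be put into rough numbers. The sketch below is a back-of-the-envelope estimate only: the class `RebalanceBatchEstimate` is hypothetical, the average entry size of 128 bytes is an assumption chosen for illustration, 512K is taken as 512 * 1024 bytes, and the logarithm is taken base 2.

```java
/** Back-of-the-envelope estimate (hypothetical helper, not an Ignite class). */
final class RebalanceBatchEstimate {
    /** Number of whole entries that fit into one batch of the given size. */
    public static long entriesPerBatch(long batchBytes, long avgEntryBytes) {
        if (avgEntryBytes <= 0)
            throw new IllegalArgumentException("avgEntryBytes must be positive");

        return batchBytes / avgEntryBytes;
    }

    /** Rough N * floor(log2(N)) estimate for N individual index tree updates. */
    public static long perEntryTreeCost(long n) {
        if (n <= 0)
            return 0;

        int log2 = 63 - Long.numberOfLeadingZeros(n); // floor(log2(n))

        return n * log2;
    }
}
```

Under these assumptions, `entriesPerBatch(512 * 1024, 128)` gives 4096 entries per batch, and `perEntryTreeCost(4096)` gives 49152 tree steps for a single batch when every entry is applied individually.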
Apache Ignite needs to support peer-2-peer cache partition transfer with a zero-copy algorithm based on an extension of the communication SPI.
```java
/**
 * @return {@code True} if new type of direct connections supported.
 */
public default boolean pipeConnectionSupported() {
    return false;
}

/**
 * @param src Source cluster node to initiate connection with.
 * @return Channel to listen.
 * @throws IgniteSpiException If fails.
 */
public default ReadableByteChannel getRemotePipe(ClusterNode src) throws IgniteSpiException {
    throw new UnsupportedOperationException();
}

/**
 * @param dest Destination cluster node to communicate with.
 * @param out Channel to write data.
 * @throws IgniteSpiException If fails.
 */
public default void sendOnPipe(ClusterNode dest, WritableByteChannel out) throws IgniteSpiException {
    throw new UnsupportedOperationException();
}
```
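Since `sendOnPipe` works with a `WritableByteChannel`, one possible way to feed a partition file into such a channel is the standard `FileChannel.transferTo` call, which lets the operating system move bytes from the file to the target channel without copying them through a user-space buffer when the platform and target channel support it. The sketch below is only an illustration under that assumption: the class `PartitionFileSender` and the method `sendFile` are hypothetical and not part of the proposed SPI, and the target channel is assumed to be blocking.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper that streams a file into a channel. */
final class PartitionFileSender {
    /**
     * Streams the whole file into the given channel.
     *
     * @return Number of bytes transferred.
     */
    public static long sendFile(Path file, WritableByteChannel out) throws IOException {
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = src.size();
            long pos = 0;

            // transferTo may move fewer bytes than requested, so loop until done.
            while (pos < size) {
                long n = src.transferTo(pos, size - pos, out);

                // Assumes a blocking target; stop if no progress is made
                // (for example, the file was truncated concurrently).
                if (n <= 0)
                    break;

                pos += n;
            }

            return pos;
        }
    }
}
```

Whether the transfer is truly zero-copy depends on the target: with a socket channel the JVM can typically delegate to the operating system, while wrapping an in-memory stream with `Channels.newChannel` still works but copies the data.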