
Overview

Rebalancing an Apache Ignite cluster redistributes primary and backup data copies across the new set of peer nodes according to the applied affinity function. Imbalanced data increases the likelihood of data loss and can skew peer utilization during data requests. A balanced set of data copies, on the other hand, evens out each Ignite peer's request load and disk space consumption.
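For illustration, a minimal sketch of how the affinity function and the number of backup copies are configured for a cache; the cache name, partition count and backup count below are arbitrary example values:

    import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class AffinityExample {
        public static CacheConfiguration<Integer, String> cacheConfig() {
            CacheConfiguration<Integer, String> cacheCfg = new CacheConfiguration<>("example-cache");

            // One backup copy per partition; the rendezvous affinity function maps
            // the 1024 partitions (primaries and backups) to the current set of peer nodes.
            cacheCfg.setBackups(1);
            cacheCfg.setAffinity(new RendezvousAffinityFunction(false, 1024));

            return cacheCfg;
        }
    }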

Currently, there are two types of Apache Ignite cluster rebalancing:

  • In-memory rebalancing;
  • Historical rebalancing (native persistence enabled);

Limitations of data rebalancing

Regardless of which rebalance mode is used, SYNC or ASYNC (defined in the CacheRebalanceMode enum), the Apache Ignite rebalance implementation has a number of limitations caused by its memory-centric design:

  • Although all cache data is sent between peers in batches (GridDhtPartitionSupplyMessage is used), entries are still processed one by one. Such an approach has little impact in a pure in-memory Apache Ignite use case, but with native persistence enabled it leads to additional fsyncs and WAL record logging. 

    By default, setRebalanceThreadPoolSize is set to 1 and setRebalanceBatchSize to 512 KB, which means that thousands of key-value pairs are processed single-threaded and individually (see the configuration sketch after this list). In addition, this also impacts: 
    • Extra unnecessary changes required to keep Ignite node data structures up to date. Adding each entry record into CacheDataStore traverses and modifies each index tree N times; space is allocated N times within the FreeList, and WAL page delta records must additionally be stored, roughly O(N*log(N));
    • A batch with N entries produces N WAL records, which might end up with N fsyncs (with the fsync WAL mode);
    • An increased chance of JVM pauses. The more auxiliary objects we produce while applying changes, the more GC happens and the greater the chance of JVM pauses;

  • The rebalancing procedure doesn't utilize network and storage device throughput to the full extent, even with sufficiently large values of the setRebalanceThreadPoolSize parameter.
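To make the defaults mentioned above concrete, here is a minimal configuration sketch. The cache name and the chosen values are illustrative only, and the FSYNC WAL mode is shown solely because it is the mode in which the per-record fsync cost described above applies:

    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.CacheRebalanceMode;
    import org.apache.ignite.configuration.CacheConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.configuration.WALMode;

    public class RebalanceDefaultsExample {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();

            // Rebalance thread pool; the default of 1 means supply batches are processed single-threaded.
            cfg.setRebalanceThreadPoolSize(1);

            // Native persistence with the FSYNC WAL mode, where each logged WAL record forces an fsync.
            DataStorageConfiguration storageCfg = new DataStorageConfiguration();
            storageCfg.setWalMode(WALMode.FSYNC);
            storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
            cfg.setDataStorageConfiguration(storageCfg);

            CacheConfiguration<Integer, byte[]> cacheCfg = new CacheConfiguration<>("example-cache");

            // SYNC or ASYNC rebalancing, from the CacheRebalanceMode enum.
            cacheCfg.setRebalanceMode(CacheRebalanceMode.ASYNC);

            // Supply message batch size, 512 KB by default.
            cacheCfg.setRebalanceBatchSize(512 * 1024);

            cfg.setCacheConfiguration(cacheCfg);

            Ignition.start(cfg);
        }
    }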

Design

Objective

Streaming via CommunicationSpi

Handshake message

Extended CommunicationSpi

TCP connection listener for TcpCommunicationSpi

p2p-type connection support by GridNioServer

Rebalance checkpointing on supplier

Recovery from temporary WAL on demander

Questions

  1. How many streaming connections will be supported by CommunicationSpi?
    A single connection between each pair of nodes.
  2. Will the bandwidth of a CommunicationSpi connection be controlled at runtime?
    Yes.
  3. Create Standalone
  4. How to choose the best rebalance type when a new node joins the topology?
  5. ASYNC and SYNC cache rebalancing via p2p?

