ID	IEP-16
Author	Ilya Lantukh
Sponsor
Created	Mar 28 2018
Status	DRAFT

Motivation

The rebalancing procedure doesn't utilize network and storage device throughput to the full extent.

Description

Our current implementation has a number of issues caused by a single fundamental problem.

During the rebalance process, data is sent in batches (called GridDhtPartitionSupplyMessages), but the entries in each batch are processed one by one.

As a result, we don't take any advantage of batch processing:

- checkpointReadLock is acquired multiple times for every entry, leading to unnecessary contention - this is clearly a bug;

- for each entry we write (and fsync, if the configuration requires it) a separate WAL record, so if a batch contains N entries, we might end up doing N fsyncs;

- every entry is also added to CacheDataStore completely independently. This means that we traverse and modify each index tree N times, allocate space in FreeList N times, and additionally have to store O(N*log(N)) page delta records in the WAL.
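The difference between the two processing styles can be sketched with a small counting model. This is not Ignite code: the class `RebalanceCostSketch`, its methods, and the `Cost` counters are hypothetical names introduced for illustration, a plain `ReentrantReadWriteLock` stands in for checkpointReadLock, and the batched variant assumes that one lock acquisition and one WAL record with one fsync per batch would be enough. The per-entry variant counts a single lock acquisition per entry, which is a lower bound on what is described above.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical counting model; it does not use or mirror real Ignite classes. */
class RebalanceCostSketch {
    /** Counters for the costly operations listed above. */
    static final class Cost {
        int lockAcquisitions;
        int walRecords;
        int fsyncs;
    }

    /** Stand-in for checkpointReadLock (assumption: a plain read-write lock). */
    private static final ReentrantReadWriteLock CHECKPOINT_LOCK = new ReentrantReadWriteLock();

    /** Per-entry processing: every entry takes the lock and writes its own WAL record. */
    static Cost applyOneByOne(List<String> entries) {
        Cost cost = new Cost();
        for (int i = 0; i < entries.size(); i++) {
            CHECKPOINT_LOCK.readLock().lock();
            cost.lockAcquisitions++;
            try {
                cost.walRecords++; // one WAL record per entry
                cost.fsyncs++;     // worst case: fsync after every record
            } finally {
                CHECKPOINT_LOCK.readLock().unlock();
            }
        }
        return cost;
    }

    /** Batched processing (assumed design): one lock acquisition and one WAL record per batch. */
    static Cost applyAsBatch(List<String> entries) {
        Cost cost = new Cost();
        if (entries.isEmpty())
            return cost;
        CHECKPOINT_LOCK.readLock().lock();
        cost.lockAcquisitions++;
        try {
            cost.walRecords++; // a single record covering all entries
            cost.fsyncs++;     // a single fsync for the whole batch
        } finally {
            CHECKPOINT_LOCK.readLock().unlock();
        }
        return cost;
    }

    public static void main(String[] args) {
        List<String> batch = Collections.nCopies(1000, "entry");
        Cost single = applyOneByOne(batch);
        Cost batched = applyAsBatch(batch);
        System.out.println("one by one: locks=" + single.lockAcquisitions + ", fsyncs=" + single.fsyncs);
        System.out.println("batched:    locks=" + batched.lockAcquisitions + ", fsyncs=" + batched.fsyncs);
    }
}
```

In this model the per-entry cost grows linearly with the batch size, while the batched cost stays constant. Index tree and FreeList updates are left out of the sketch.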

The default batch size is 512KB, which means that thousands of key-value pairs are received at once but processed individually.
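A rough back-of-the-envelope estimate shows where "thousands" comes from. Only the 512KB batch size is taken from the text. The average entry sizes, the class `BatchSizeEstimate`, and its methods are assumptions made for illustration. The N*log2(N) figure only illustrates the order of growth of page delta records: the real logarithm base and constants depend on the tree fanout and are not known here.

```java
/** Hypothetical estimate; only the 512KB batch size comes from the text. */
class BatchSizeEstimate {
    /** Default batch size stated in the text. */
    static final long BATCH_SIZE_BYTES = 512L * 1024;

    /** Number of entries that fit into one batch for an assumed average entry size. */
    static long entriesPerBatch(long avgEntryBytes) {
        if (avgEntryBytes <= 0)
            throw new IllegalArgumentException("avgEntryBytes must be positive");
        return BATCH_SIZE_BYTES / avgEntryBytes;
    }

    /** Order-of-growth illustration only: n * floor(log2(n)), constants ignored. */
    static long pageDeltaGrowth(long n) {
        if (n <= 1)
            return 0;
        long log2 = 63 - Long.numberOfLeadingZeros(n); // floor(log2(n)) for n > 0
        return n * log2;
    }

    public static void main(String[] args) {
        for (long size : new long[] {64, 128, 256}) { // assumed average entry sizes in bytes
            long n = entriesPerBatch(size);
            System.out.println(size + "-byte entries: N=" + n
                + ", up to " + n + " fsyncs, ~" + pageDeltaGrowth(n) + " (N*log2(N))");
        }
    }
}
```

With an assumed average entry size of 128 bytes, one batch carries 4096 entries, so the per-entry path may issue up to 4096 fsyncs for a single supply message.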

We propose a two-step approach to fix the issue:

- Remove inefficiencies from the current implementation: avoid any unnecessary but costly operations while still handling each cache entry independently.
- Redesign the rebalance process to handle entries in batches.

Risks and Assumptions

// Describe project risks, such as API or binary compatibility issues, major protocol changes, etc.

Discussion Links

http://apache-ignite-developers.2346864.n4.nabble.com/Rebalancing-how-to-make-it-faster-td28457.html

Reference Links

// Links to various reference documents, if applicable.

Tickets

Ways to reduce the impact of independent processing:
- https://issues.apache.org/jira/browse/IGNITE-8019 - aforementioned bug, causing contention on checkpointReadLock;
- https://issues.apache.org/jira/browse/IGNITE-8018 - inefficiency in GridCacheMapEntry implementation;
- https://issues.apache.org/jira/browse/IGNITE-8017 - automatically disable WAL during preloading.

Ways to solve the problem on a more global level:
- https://issues.apache.org/jira/browse/IGNITE-7935 - a ticket to introduce batch modification;
- https://issues.apache.org/jira/browse/IGNITE-8020 - complete redesign of rebalancing process for persistent caches, based on file transfer.
