Motivation
The rebalancing procedure does not utilize network and storage device throughput to its full extent.
Description
Our current implementation has a number of issues caused by a single fundamental problem.
During the rebalance process, data is sent in batches (GridDhtPartitionSupplyMessages), but the entries within each batch are processed one by one.
As a result, we take no advantage of batch processing:
- checkpointReadLock is acquired separately for every entry, leading to unnecessary contention - this is clearly a bug;
- for each entry we write (and fsync, if the configuration requires it) a separate WAL record - so, if a batch contains N entries, we might end up doing N fsyncs;
- adding every entry into the CacheDataStore also happens completely independently. This means we traverse and modify each index tree N times, allocate space in the FreeList N times, and additionally store O(N*log(N)) page delta records in the WAL.
The default batch size is 512 KB, which means thousands of key-value pairs are received at once but processed individually.
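To put the batch size in perspective, here is a back-of-the-envelope estimate of the per-batch overhead. The 512 KB batch size comes from the text above; the average entry size used here is a hypothetical assumption, not a measured value:

```java
// Rough estimate of how many per-entry operations a single supply
// message triggers under the current per-entry processing model.
// The ~100-byte average entry size is an illustrative assumption.
public class BatchCostEstimate {
    public static void main(String[] args) {
        int batchBytes = 512 * 1024; // default supply message size (512 KB)
        int avgEntryBytes = 100;     // assumed average key-value pair size

        int entriesPerBatch = batchBytes / avgEntryBytes;

        // Per-entry processing means each of these entries pays for a
        // separate checkpointReadLock acquisition, a separate WAL record,
        // and potentially a separate fsync.
        System.out.println("Entries per batch: " + entriesPerBatch);
    }
}
```

Even with this conservative entry size, a single batch translates into thousands of lock acquisitions and WAL records.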
We propose a two-step approach to fix the issue:
- Remove inefficiencies from the current implementation: avoid unnecessary but costly operations while still handling each cache entry independently.
- Redesign the rebalance process to handle entries in batches.
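The difference between the two processing models can be sketched with a simple cost model. All names below (the lock and WAL counters, the method names) are illustrative simulations, not the actual Ignite internals:

```java
import java.util.List;

// Simulated cost model: counts the expensive operations needed to apply
// one supply message under per-entry vs. batched processing. This is a
// sketch of the idea, not the real implementation.
public class RebalanceCostModel {
    int lockAcquisitions;
    int walRecords;

    // Current approach: every entry pays for a lock acquisition and a
    // separate WAL record (and possibly an fsync).
    void processPerEntry(List<byte[]> entries) {
        for (byte[] entry : entries) {
            lockAcquisitions++; // checkpointReadLock per entry
            walRecords++;       // separate WAL record per entry
        }
    }

    // Proposed approach: one lock acquisition and one logical WAL record
    // cover the whole batch; entries would then be inserted into the
    // CacheDataStore in bulk.
    void processBatched(List<byte[]> entries) {
        lockAcquisitions++; // checkpointReadLock once per batch
        walRecords++;       // single batched WAL record
    }
}
```

For a batch of N entries, the per-entry model performs N lock acquisitions and N WAL records, while the batched model performs one of each; the index-tree and FreeList work can be amortized similarly.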
Risks and Assumptions
// Describe project risks, such as API or binary compatibility issues, major protocol changes, etc.
Discussion Links
http://apache-ignite-developers.2346864.n4.nabble.com/Rebalancing-how-to-make-it-faster-td28457.html
Reference Links
// Links to various reference documents, if applicable.
Tickets