Motivation
The rebalancing procedure does not utilize network and storage device throughput to its full extent.
Description
Our current implementation has a number of issues caused by a single fundamental problem.
During the rebalance process, data is sent in batches (GridDhtPartitionSupplyMessages), but the entries within each batch are processed one by one.
As a result, we take no advantage of batch processing:
- checkpointReadLock is acquired separately for every entry, leading to unnecessary contention - this is clearly a bug;
- for each entry we write (and fsync, if the configuration requires it) a separate WAL record - so, if a batch contains N entries, we might end up doing N fsyncs;
- adding every entry into the CacheDataStore also happens completely independently. This means we traverse and modify each index tree N times, allocate space in the FreeList N times, and additionally store O(N*log(N)) page delta records in the WAL.
The default batch size is 512 KB, which means thousands of key-value pairs are received at once but processed individually.
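To put the batch size in perspective, here is a back-of-the-envelope estimate of the per-batch overhead. The 512 KB batch size comes from the text above; the average entry size used here is a hypothetical assumption, not a measured value:

```java
// Rough estimate of how many per-entry operations a single supply
// message triggers under the current per-entry processing model.
// The ~100-byte average entry size is an illustrative assumption.
public class BatchCostEstimate {
    public static void main(String[] args) {
        int batchBytes = 512 * 1024; // default supply message size (512 KB)
        int avgEntryBytes = 100;     // assumed average key-value pair size

        int entriesPerBatch = batchBytes / avgEntryBytes;

        // Per-entry processing means each of these entries pays for a
        // separate checkpointReadLock acquisition, a separate WAL record,
        // and potentially a separate fsync.
        System.out.println("Entries per batch: " + entriesPerBatch);
    }
}
```

Even with this conservative entry size, a single batch translates into thousands of lock acquisitions and WAL records.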
We propose a two-step approach to fix the issue:
- Remove inefficiencies from the current implementation: avoid unnecessary but costly operations while still handling each cache entry independently.
- Redesign the rebalance process to handle entries in batches.
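The difference between the two processing models can be sketched with a simple cost model. All names below (the lock and WAL counters, the method names) are illustrative simulations, not the actual Ignite internals:

```java
import java.util.List;

// Simulated cost model: counts the expensive operations needed to apply
// one supply message under per-entry vs. batched processing. This is a
// sketch of the idea, not the real implementation.
public class RebalanceCostModel {
    int lockAcquisitions;
    int walRecords;

    // Current approach: every entry pays for a lock acquisition and a
    // separate WAL record (and possibly an fsync).
    void processPerEntry(List<byte[]> entries) {
        for (byte[] entry : entries) {
            lockAcquisitions++; // checkpointReadLock per entry
            walRecords++;       // separate WAL record per entry
        }
    }

    // Proposed approach: one lock acquisition and one logical WAL record
    // cover the whole batch; entries would then be inserted into the
    // CacheDataStore in bulk.
    void processBatched(List<byte[]> entries) {
        lockAcquisitions++; // checkpointReadLock once per batch
        walRecords++;       // single batched WAL record
    }
}
```

For a batch of N entries, the per-entry model performs N lock acquisitions and N WAL records, while the batched model performs one of each; the index-tree and FreeList work can be amortized similarly.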
Risks and Assumptions
// Describe project risks, such as API or binary compatibility issues, major protocol changes, etc.
Discussion Links
http://apache-ignite-developers.2346864.n4.nabble.com/Rebalancing-how-to-make-it-faster-td28457.html
Reference Links
// Links to various reference documents, if applicable.
Tickets