...
During the initial data load it is sometimes acceptable to relax crash-recovery guarantees. We can disable WAL for a particular cache, cache group or data region, load the data, and then enable WAL again. This mode could speed up data loading by a factor of 2-4.
Secondary indexes negatively affect write performance. A common pattern is to drop the indexes, load the data and then create the indexes again. This doesn't work for Ignite at the moment because index creation is slow.
All proposed changes can be split into two groups: infrastructure improvements and index improvements. Note that some proposals conflict with each other, so careful evaluation is a must.
Infrastructure improvements:
First, we currently build an index by adding entries one by one. Every addition requires walking the B+Tree from the root. Instead, we can create sorted batches of entries and add multiple entries to the index in one hop. Second, the index is built by iterating over the primary index. This is less than efficient, especially for persistent caches, because of the additional jumps from the primary index to data pages. Instead, we can try iterating over data pages directly rather than over the primary index. Last, we can try building the index from multiple threads, where each thread processes a predefined set of partitions.
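The batching and multi-threading ideas can be illustrated with a toy model. The sketch below is not Ignite code: {{Row}}, {{IndexKey}} and {{build}} are hypothetical names, plain in-memory lists stand in for partitions, and a {{ConcurrentSkipListMap}} stands in for the B+Tree. Each worker thread owns a fixed set of partitions and sorts the entries of a partition into a batch before adding them. The skip list still inserts entries one by one, so the sorted batch only marks the place where a real bulk-insert routine would take over.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelIndexBuild {
    /** Hypothetical row: primary key plus the value of the indexed column. */
    static final class Row {
        public final long key;
        public final int indexedValue;

        Row(long key, int indexedValue) {
            this.key = key;
            this.indexedValue = indexedValue;
        }
    }

    /** Hypothetical secondary-index entry, ordered by (indexedValue, key). */
    static final class IndexKey implements Comparable<IndexKey> {
        public final int indexedValue;
        public final long key;

        IndexKey(int indexedValue, long key) {
            this.indexedValue = indexedValue;
            this.key = key;
        }

        @Override public int compareTo(IndexKey o) {
            int c = Integer.compare(indexedValue, o.indexedValue);
            return c != 0 ? c : Long.compare(key, o.key);
        }
    }

    /** Builds the stand-in index; worker t handles partitions t, t + threads, ... */
    static ConcurrentSkipListMap<IndexKey, Long> build(List<List<Row>> partitions, int threads) {
        ConcurrentSkipListMap<IndexKey, Long> index = new ConcurrentSkipListMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int worker = t;
                futures.add(pool.submit(() -> {
                    for (int p = worker; p < partitions.size(); p += threads) {
                        // Collect one partition into a batch and sort it by index key.
                        List<IndexKey> batch = new ArrayList<>();
                        for (Row r : partitions.get(p))
                            batch.add(new IndexKey(r.indexedValue, r.key));
                        batch.sort(Comparator.naturalOrder());
                        // A real B+Tree could consume the sorted batch in bulk;
                        // the stand-in map is filled entry by entry.
                        for (IndexKey k : batch)
                            index.put(k, k.key);
                    }
                }));
            }
            // Wait for all workers and surface any failure.
            for (Future<?> f : futures)
                f.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Index build interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Index build failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return index;
    }
}
```

Assigning partitions statically means workers never compete for input, but they still contend on the shared index structure, which is one of the trade-offs a real implementation would have to evaluate.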
The data streamer is the main tool for fast data loading into Ignite. Currently it is not very efficient because every call to the {{igniteDataStreamer.addData(K, V)}} method requires a lot of work. As a result, a single thread cannot feed the data streamer fast enough, and users have to create many threads to compensate. We should optimize this and make the data streamer fast out of the box.
Index improvements:
Binary compatibility should be preserved to allow startup with persistent data created by previous versions. The page format should either be left unchanged, or be changed in a way that allows disabling the new optimizations and rolling back to the previous format.
...