ID: IEP-1
Author: Vladimir Ozerov
Sponsor: Vladimir Ozerov
Created: 16 Sep 2017
Status: DRAFT


Motivation

A frequent usage pattern for Ignite is bulk data loading. Users need to be able to load data into Ignite from external sources as fast as possible. Ignite is not optimized for this use case at the moment, because the bulk data loading process goes through the same code paths as normal cache updates. This IEP aims to improve bulk data loading performance.

Description

WAL optimization

During the initial data load it is sometimes acceptable to relax crash-recovery guarantees. We can disable the WAL (write-ahead log) for a particular cache, cache group or data region, load the data, and then enable it again. This mode could reduce data loading time by a factor of 2-4.
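The disable/load/enable sequence can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyCache`, `setWalEnabled` and `bulkLoad` are hypothetical names invented for this sketch and are not Ignite APIs, and the "WAL" here is just an in-memory list of records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration of "disable WAL, load, enable WAL". All names are hypothetical. */
final class WalBulkLoadSketch {
    /** In-memory stand-in for a cache with a switchable write-ahead log. */
    static final class ToyCache {
        private final Map<Integer, String> data = new HashMap<>();
        private final List<String> wal = new ArrayList<>();
        private boolean walEnabled = true;

        void put(int key, String value) {
            if (walEnabled)
                wal.add("PUT " + key); // record is logged before the update is applied
            data.put(key, value);
        }

        void setWalEnabled(boolean enabled) { walEnabled = enabled; }
        boolean isWalEnabled() { return walEnabled; }
        int size() { return data.size(); }
        int walRecordCount() { return wal.size(); }
    }

    /** Loads all rows with logging switched off, then restores logging. */
    static void bulkLoad(ToyCache cache, Map<Integer, String> rows) {
        cache.setWalEnabled(false);
        try {
            rows.forEach(cache::put); // no WAL records are written here
        } finally {
            // Restore logging even if the load fails half-way.
            cache.setWalEnabled(true);
        }
    }

    public static void main(String[] args) {
        ToyCache cache = new ToyCache();
        cache.put(0, "logged");

        Map<Integer, String> rows = new LinkedHashMap<>();
        rows.put(1, "a");
        rows.put(2, "b");
        bulkLoad(cache, rows);

        System.out.println("entries=" + cache.size() + ", walRecords=" + cache.walRecordCount());
    }
}
```

The `finally` block reflects the relaxed guarantee: data loaded while logging is off is not covered by crash recovery, so logging should be switched back on as soon as the load ends, whether it succeeded or not.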

Optimize CREATE INDEX

Secondary indexes negatively affect write performance. A common pattern is to drop the indexes, load the data and then create the indexes again. This doesn't work for Ignite at the moment because index creation is slow.

All proposed changes can be split into two groups: infrastructure improvements and index improvements. Note that some proposals conflict with each other, so careful evaluation is a must.

Infrastructure improvements:

  • Disable WAL for some caches when doing bulk data load
  • Bypass GridCacheMapEntry
  • Add data to new pages rather than to existing pages to minimize free-list overhead
  • Perform cache scan through data pages rather than primary index pages

Index improvements:

  • Bypass H2 engine when updating index
  • Fill index pages in batches
  • Build index using bottom-up approach
  • Build index from multiple threads
  • Optimize index comparisons for date/time data types
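As an illustration of two items from the list above, filling pages in batches and building the tree from the leaf level upward, here is a minimal in-memory sketch. It is not Ignite's index implementation: `BottomUpIndexSketch`, `Page`, `build`, `contains` and `height` are hypothetical names, the keys are plain `long` values, and `pageCapacity` is an assumed tuning parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy bottom-to-top index build over sorted keys. All names are hypothetical. */
final class BottomUpIndexSketch {
    /** A leaf holds keys; an inner page holds children plus the minimum key of each child. */
    static final class Page {
        final List<Long> keys = new ArrayList<>();
        final List<Page> children = new ArrayList<>(); // stays empty for leaves

        boolean isLeaf() { return children.isEmpty(); }
        long minKey() { return keys.get(0); }
    }

    static Page build(long[] keys, int pageCapacity) {
        if (keys.length == 0 || pageCapacity < 2)
            throw new IllegalArgumentException("need at least one key and capacity >= 2");

        long[] sorted = keys.clone();
        Arrays.sort(sorted); // sort once instead of doing one tree descent per key

        // Leaf level: fill pages left to right, one full batch at a time.
        List<Page> level = new ArrayList<>();
        for (int i = 0; i < sorted.length; i += pageCapacity) {
            Page leaf = new Page();
            for (int j = i; j < Math.min(i + pageCapacity, sorted.length); j++)
                leaf.keys.add(sorted[j]);
            level.add(leaf);
        }

        // Upper levels: group pages of the level below until a single root remains.
        while (level.size() > 1) {
            List<Page> parents = new ArrayList<>();
            for (int i = 0; i < level.size(); i += pageCapacity) {
                Page parent = new Page();
                for (int j = i; j < Math.min(i + pageCapacity, level.size()); j++) {
                    Page child = level.get(j);
                    parent.children.add(child);
                    parent.keys.add(child.minKey());
                }
                parents.add(parent);
            }
            level = parents;
        }
        return level.get(0);
    }

    /** Descends to the last child whose minimum key is <= key, then checks the leaf. */
    static boolean contains(Page root, long key) {
        Page page = root;
        while (!page.isLeaf()) {
            int idx = 0;
            for (int i = 1; i < page.keys.size(); i++)
                if (page.keys.get(i) <= key)
                    idx = i;
            page = page.children.get(idx);
        }
        return page.keys.contains(key);
    }

    /** Number of levels from the root down to the leaves. */
    static int height(Page root) {
        int levels = 1;
        for (Page p = root; !p.isLeaf(); p = p.children.get(0))
            levels++;
        return levels;
    }

    public static void main(String[] args) {
        Page root = build(new long[] {42, 7, 19, 3, 88, 61, 25, 14, 70, 5}, 3);
        System.out.println("height=" + height(root) + ", contains(61)=" + contains(root, 61));
    }
}
```

Because every page is written exactly once and in key order, a build of this shape avoids page splits entirely. A multi-threaded variant could, for example, sort and fill disjoint key ranges in parallel before the upper levels are assembled, though that is outside this sketch.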

Risks and Assumptions

Binary compatibility should be preserved to allow startup with persistent data created by previous versions. The page format should either be left unchanged, or be changed with the ability to disable the new optimizations and roll back to the previous format.

Discussion Links

N/A

Reference Links

N/A

Tickets
