...

Let's describe the B+ Tree in more detail to understand the need for the invoke operation.

The keys of the tree (hashes) are stored on the B+ Tree pages (index pages), while the cache key-value itself is stored on data pages. Each item on an index page includes a link to the data page item. In general, a B+ Tree supports find, put and remove operations. For put and remove, you must first find the point of insertion/update/removal. So, a cache entry update without the invoke operation can look like this:

  • Search the B+ Tree for the link to the old key-value (find)
  • If the new value has the same length as the old one - simply update the key-value in place on the data page
  • If the new value differs in length - the link to it changes:
    • Store the new key-value into a data page
    • Put the B+ Tree key (with a "secondary" find) to update the link to the data page item
    • Remove the old key-value from the data page
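
The steps above can be sketched with a toy model. This is only an illustration, not the actual Ignite code: the class UpdateWithoutInvokeSketch and all of its members are hypothetical, a TreeMap stands in for the B+ Tree, a HashMap stands in for the data pages, and only the value bytes are stored. The treeSearches counter shows that an update which changes the value length costs two tree traversals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (hypothetical names): a sorted map plays the B+ Tree, a hash map plays the data pages. */
class UpdateWithoutInvokeSketch {
    final TreeMap<Integer, Long> index = new TreeMap<>(); // key hash -> link to data page item
    final Map<Long, byte[]> dataPages = new HashMap<>();  // link -> stored bytes
    long nextLink = 1;
    int treeSearches;                                     // number of tree traversals so far

    /** find: one traversal of the tree. */
    Long find(int hash) {
        treeSearches++;
        return index.get(hash);
    }

    /** put: has to locate the insertion point again, so it is another traversal. */
    void putLink(int hash, long link) {
        treeSearches++;
        index.put(hash, link);
    }

    void update(int hash, byte[] newVal) {
        Long oldLink = find(hash);

        if (oldLink == null) {
            // No old entry: store the row and insert the link.
            long link = nextLink++;
            dataPages.put(link, newVal);
            putLink(hash, link);
            return;
        }

        if (dataPages.get(oldLink).length == newVal.length) {
            // Same length: overwrite in place, the link stays valid.
            dataPages.put(oldLink, newVal);
            return;
        }

        // Different length: the link changes.
        long newLink = nextLink++;
        dataPages.put(newLink, newVal); // store the new key-value
        putLink(hash, newLink);         // "secondary" find + link update
        dataPages.remove(oldLink);      // remove the old key-value
    }

    public static void main(String[] args) {
        UpdateWithoutInvokeSketch s = new UpdateWithoutInvokeSketch();
        s.update(42, new byte[] {1, 2});    // initial insert
        s.update(42, new byte[] {3, 4});    // same length
        s.update(42, new byte[] {5, 6, 7}); // different length
        System.out.println("tree searches: " + s.treeSearches);
    }
}
```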

The invoke operation uses an in-place update and has the following execution scheme:

...

  1. Batch writing to data pages
  2. Batch updates in B+ Tree

...

TBD

Batch

...

writing to data pages

Divide the input data rows into 2 lists:

...

Sequentially write objects and fragments that occupy a whole page. Each data page is taken from the "reuse" bucket; if the reuse bucket is empty, a new page is allocated.

For the remaining (regular) objects, including the remainders ("heads") of large objects, find the page in the FreeList that has the most free space and enough room for the object (allocate a new one if there is no such page) and fill it up to the end.
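
A minimal sketch of this page selection, under stated assumptions: the class BatchPageWriteSketch, the PAGE_SIZE value and the in-memory reuseBucket and freeList structures are all hypothetical and only model page accounting (row sizes in, free bytes out), not the real FreeList.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of batch page selection; tracks only sizes, not row contents. */
class BatchPageWriteSketch {
    static final int PAGE_SIZE = 4096;                     // assumed usable bytes per data page

    final Deque<Integer> reuseBucket = new ArrayDeque<>(); // ids of empty pages ready for reuse
    final List<int[]> freeList = new ArrayList<>();        // {pageId, freeBytes} of partly filled pages
    int nextPageId;
    int allocated;                                         // number of newly allocated pages

    int allocatePage() {
        allocated++;
        return nextPageId++;
    }

    /** Whole-page fragments: take a page from the reuse bucket, allocate if it is empty. */
    int takeWholePage() {
        Integer id = reuseBucket.poll();
        return id != null ? id : allocatePage();
    }

    void writeBatch(int[] rowSizes) {
        List<Integer> regular = new ArrayList<>();

        // Pass 1: sequentially write fragments that occupy a whole page.
        for (int size : rowSizes) {
            for (int i = 0; i < size / PAGE_SIZE; i++)
                takeWholePage(); // fully occupied, so it never enters the free list

            int rest = size % PAGE_SIZE; // small object, or the "head" of a large one
            if (rest > 0)
                regular.add(rest);
        }

        // Pass 2: place regular objects on the page with the most free space that fits.
        for (int size : regular) {
            int[] best = null;
            for (int[] page : freeList)
                if (page[1] >= size && (best == null || page[1] > best[1]))
                    best = page;

            if (best == null) {
                best = new int[] {allocatePage(), PAGE_SIZE};
                freeList.add(best);
            }

            best[1] -= size;
            if (best[1] == 0)
                freeList.remove(best); // page filled up to the end
        }
    }

    public static void main(String[] args) {
        BatchPageWriteSketch w = new BatchPageWriteSketch();
        w.writeBatch(new int[] {100, 5000, 8192, 200});
        System.out.println("allocated pages: " + w.allocated);
    }
}
```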

...

  • Implement insertDataRows operation in FreeList - insert several data rows at once.
  • The preloader should insert a batch of data rows before initializing cache entries and should remove the extra rows in case of fallback (if a cache entry was not initialized). If a cache entry is initialized incorrectly, the preloader should roll back the changes and remove the pre-created data row.

Phase 2: DataStreamer support

...

  • Add support for MVCC (TRANSACTIONAL_SNAPSHOT) cache mode.

Risks and Assumptions

  1. The in-memory eviction policy can be configured in a way that leads to OOM when using batch writing to data pages, so in some degenerate cases batch writing must be disabled.
  2. For BPlusTree batch operations, ordered keys are required. Moreover, an attempt to simultaneously lock the same keys in a different order leads to a deadlock, so batch insertion into the page memory must be performed on unlocked entries. Alternatively, keys passed in batches from different components (preloader, datastreamer, putAll) should be locked in the same order.
  3. Heap usage/GC pressure.
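
The alternative mentioned in the locking risk (every component locks batch keys in the same order) can be illustrated with a small sketch. It assumes integer keys and plain ReentrantLock instances; the class OrderedBatchLockSketch and its methods are hypothetical and do not reflect how Ignite locks cache entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper: locks a batch of keys in one global (sorted) order. */
class OrderedBatchLockSketch {
    final Map<Integer, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Sorted, de-duplicated keys: the single order every caller uses. */
    static List<Integer> lockOrder(List<Integer> keys) {
        return new ArrayList<>(new TreeSet<>(keys));
    }

    void withLockedBatch(List<Integer> keys, Runnable action) {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            // Two callers with overlapping batches acquire the shared keys
            // in the same order, so they cannot wait on each other in a cycle.
            for (Integer key : lockOrder(keys)) {
                ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            action.run();
        } finally {
            // Release in reverse acquisition order.
            for (int i = held.size() - 1; i >= 0; i--)
                held.get(i).unlock();
        }
    }

    public static void main(String[] args) {
        OrderedBatchLockSketch s = new OrderedBatchLockSketch();
        // The caller passes keys in arbitrary order; locking still happens in sorted order.
        s.withLockedBatch(List.of(3, 1, 2), () -> System.out.println("batch written"));
    }
}
```

Sorting also gives the ordered keys that BPlusTree batch operations require, so one sort can serve both purposes.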

Prototype testing results

...

Jira (ASF JIRA): project = Ignite AND labels IN (iep-32) order by key

The improvement in rebalancing time when using batch insertion is most noticeable when writing small objects and decreases as object size grows. The improvement in total rebalancing time is also smaller when the demander sits idle waiting for the next messages from the supplier.