Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Profiling the current rebalancing process shows that most of the time spent on working with FreeList and BPlusTree B+ Tree (details).

Process overview

Currently, when updating the batch of cache entries in off-heap, each of them is processed separately. Such update in PageMemory can be divided into consists of the following steps:

  1. Search for key in B+ tree.Tree
  2. Store entry key-value (to data row) into data page.
  3. Insert/update key in B+ tree page.

To prevent a redundant secondary search in B+ Tree the invoke operation was introduced.

Invoke in B+ Tree

Let's describe the B+ Tree in more detail to understand the need for invoke operationSearch and update of key in tree implemented through the invoke operation, to understand the needs for this operation, it is necessary to describe B+ tree in more detail.

The keys of the tree (hashes) are stored on the B+ tree Tree pages (index pages), the cache key-value (data row) itself is stored on data pages. Each item on the index page includes a link to the data page item. In general, a B+ tree Tree supports find, put and remove operations. For put andremove, you must first find the point of insertion/update/removal. So, cache entry update without invoke operation should look like this:

  • Search B+ tree Tree for link to old data row old key-value (find).
  • The old value does not differ in length - a simple value update of data row of key-value on the data page.
  • The old value differs in length - the link to it changes:
    • Store new data row key-value into data page.
    • Put B+ tree Tree key (with "secondary" find) to update link to data page item.
    • Remove old data row old key-value from data page.

The invoke To prevent a redundant secondary search in the case described above the invoke operation was introduced. This operation has the following execution scheme of execution:

Code Block
OffHeapManager.invoke
                |-> BPlusTree.invoke
                |               |
                |               |
                |               |
                |               |-> Invoke.invokeClosure()
                |               |           |-> UpdateClosure.call(foundRow)
                |               |           |                   |- compare entry version
                |               |           |                   |- newRow = create newRow on data page
                |               |           |                   |- treeOp = PUT/REMOVE/NOOP
                |               |           |                  
                |               |           \-> read treeOp -> create operation closure (Put(newRow)/Remove(row))
                |               |
                |               \-> Invoke.found()/notFound() -> exec operation closure
                |
                |
                \-> finishUpdate/finishRemove

Store key-value

Saving a data row key-value on data page consists of the following operations:

  1. Remove a page from FreeList, which has enough storage space (if there is no such page - allocate a new one).
  2. Lock page.
  3. Write cache entry data.
  4. Update page counters.
  5. Unlock page.
  6. Based on the remaining free space insert page into FreeListto FreeList

Advantages

The batch will improve the average insertion time by reducing count of FreeList actions on each data row (insert/remove pages, locking and counters updates) and the time of searching and inserting a range of keys in B+ Tree.

Proposed changes

Implementation should consists of two major related improvements:

  1. Batch writing to data pages
  2. Batch updates in B+ Tree

Diagram overview

draw.io Diagram
bordertrue
viewerToolbartrue
fitWindowfalse
diagramNamebatch updates
simpleViewerfalse
width600
diagramWidth1061
revision5
Implementation should consists of two major related improvements:

Batch

...

write to data pages

...

Batch writing to data pages.

Find the most free page with enough space for data row in FreeList (if there is no such page - allocate new one) and fill it up to the end.

Improves average insertion time by reducing count of FreeList actions on each data row (insert/remove pages, locking and counters updates).

Batch updates in BPlusTree.

Batch update in B+ Tree

TBD: describe the implementationImproves the time of searching and inserting a range of keys.

Proposed plan

Overall changes to support batch updates in PageMemory can be divided into following phases.

Phase 1: Batch insert new keys

...

  • Implement insertDataRows operation in FreeList - insert several data rows at once.
  • Implement invokeAll operation in BPlusTree: support searching and inserting range of keys.
  • Enable batch insertion in preloader (enabled by a special system property).

Phase 2: Batch update existing keys

...

  • InvokeAll operation should support batch update of existing keys.

Phase 3: DataStreamer support

...

  • Add support for batch updates in DataStreamer (isolated updater).

Phase 4: putAll support

...

  • Add support for batch updates in IgniteCache putAll.

Phase 5: MVCC support

...

  • Add support for MVCC (TRANSACTIONAL_SNAPSHOT) cache mode.

...