You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 3 Next »

IDIEP-32
Author
Sponsor

 

Created 
Status
DRAFT


Motivation

Currently batch updates on page memory level are not implemented in Ignite. Internal structures, such as BPlusTree and FreeList, do not support them. The performance of batch operations (putAll, datastreamer, preloader) can be improved by implementing batch updates on page memory level.

Competitive analysis

Profiling the current rebalancing process shows that most of the time is spent on working with FreeList and BPlusTree (details).

Process overview

Currently, when updating the batch of cache entries in off heap, each of them is processed separately.

Such update in PageMemory can be divided into the following steps:

  1. Search for key in B+ tree.
  2. Store entry key-value (data row) into data page.
  3. Insert/Update key in B+ tree page.

In Ignite search and update of key in tree implemented through the "invoke" operation, to understand the needs for this operation, it is necessary to describe B+ tree in more detail.

The keys of the tree (hashes) are stored on the B+ tree pages (index pages), the cache key-value (data row) itself is stored on data pages. Each item on the index page includes a link to the data page item (pageId + itemId).

In general, a B-tree supports find, put and remove operations. For put/remove, you must firstly find the point of insertion/update/removal.

So, cache entry update without invoke operation should look like this:

  • Search B tree for link to old data row (find).
  • The old value does not differ in length - a simple value update of data row on the data page.
  • The old value differs in length - the link to it changes:
    • Store new key-value into data page.
    • Put B tree key (with find) to update link to data page item.
    • Remove old key-value from data page.

To prevent a redundant secondary search in the case described above the "invoke" operation was introduced. This operation has the following scheme of execution:

OffHeapManager.invoke
                |-> BPlusTree.invoke
                |               |
                |               |
                |               |
                |               |-> Invoke.invokeClosure()
                |               |           |-> UpdateClosure.call(foundRow)
                |               |           |                   |- compare entry version
                |               |           |                   |- newRow = create newRow on data page
                |               |           |                   |- treeOp = PUT/REMOVE/NOOP
                |               |           |                  
                |               |           \-> read treeOp -> create operation closure (Put(newRow)/Remove(row))
                |               |
                |               \-> Invoke.found()/notFound() -> exec opearation closure
                |
                |
                \-> finishUpdate/finishRemove

Saving a cache entry on data page consists of the following operations:

  1. Remove a page from FreeList, which has enough storage space (if there is no such page - acquire a new one).
  2. Lock page.
  3. Write cache entry data.
  4. Update page counters.
  5. Unlock page.
  6. Based on the remaining free space insert page into FreeList.

Proposed changes

Implementation should consists of two major related improvements:

  1. Batch writing to data pages.
  2. Batch updates in BPlusTree.

Batch writing to data pages.

Improves the time to remove/insert pages in FreeList, pages locking and update item counters.

Batch updates in BPlusTree.

Improves the time of searching and inserting a range of keys.

Risks and Assumptions

// Describe project risks, such as API or binary compatibility issues, major protocol changes, etc.

Discussion Links

// Links to discussions on the devlist, if applicable.

Reference Links

// Links to various reference documents, if applicable.

Tickets

key summary type updated assignee reporter priority status resolution

JQL and issue key arguments for this macro require at least one Jira application link to be configured

  • No labels