...
ID | IEP-32
---|---
Author | 
Sponsor | 
Created | 06 Mar 2019
Status | 
...
Let's describe the B+ Tree in more detail to understand the need for the invoke operation.
The keys of the tree (hashes) are stored on the B+ Tree pages (index pages); the cache key-value pairs themselves are stored on data pages. Each item on an index page includes a link to a data page item. In general, a B+ Tree supports find, put and remove operations. For put and remove, you must first find the point of insertion/update/removal. So, a cache entry update without the invoke operation can look like this:
The invoke operation uses an in-place update and has the following execution scheme:
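The contrast between a two-step update and an in-place invoke can be sketched with a plain `TreeMap` standing in for the B+ Tree; `compute` here plays the role of the invoke closure. This is an illustration only, not Ignite's internal API:

```java
import java.util.TreeMap;

public class InvokeSketch {
    public static void main(String[] args) {
        TreeMap<Integer, Integer> tree = new TreeMap<>();
        tree.put(42, 1);

        // Two-step update: one traversal to find, a second traversal to put.
        Integer old = tree.get(42);  // 1st descent to the leaf
        tree.put(42, old + 1);       // 2nd descent to the same leaf

        // Invoke-style update: a single traversal; the closure runs at the
        // found position and updates the entry in place.
        tree.compute(42, (k, v) -> v == null ? 1 : v + 1);

        System.out.println(tree.get(42)); // prints 3
    }
}
```

In the real tree the saving is larger than one lookup: the second descent would have to re-acquire page locks and re-validate the search path.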
...
[draw.io diagram]
Divide the input data rows into 2 lists:
Sequentially write objects and fragments that occupy a whole page. The data page is taken from the "reuse" bucket; if there is no page in the reuse bucket, a new one is allocated.
For the remaining (regular) objects (including the remainders ("heads") of large objects), find the most free page with enough space for the data row in the FreeList (if there is no such page, allocate a new one) and fill it up to the end.
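The two-list split above can be sketched as follows. This is a toy model: `PAGE_PAYLOAD`, the page accounting and the method names are simplified placeholders, not the real PageMemory/FreeList code:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy sketch of the two-list batch write; not the real PageMemory code. */
public class BatchWriteSketch {
    static final int PAGE_PAYLOAD = 4000; // assumed usable bytes on a 4k data page

    /** Returns {whole-page fragment count, partially filled page count}. */
    static int[] batchWrite(int[] rowSizes) {
        // 1. Divide the input rows into two lists.
        List<Integer> wholePages = new ArrayList<>(); // fragments occupying a whole page
        List<Integer> regular = new ArrayList<>();    // small rows and large-row "heads"

        for (int size : rowSizes) {
            while (size >= PAGE_PAYLOAD) {
                // 2. Whole-page fragments are written sequentially, each to a page
                // from the reuse bucket (a newly allocated one if the bucket is empty).
                wholePages.add(PAGE_PAYLOAD);
                size -= PAGE_PAYLOAD;
            }
            if (size > 0)
                regular.add(size); // the remainder ("head") goes through the FreeList
        }

        // 3. For each regular row, take the most free page that still has enough
        // space (allocate a new page if none fits) and fill it up to the end.
        List<Integer> pageFree = new ArrayList<>(); // free bytes per data page
        for (int size : regular) {
            int page = -1;
            for (int i = 0; i < pageFree.size(); i++)
                if (pageFree.get(i) >= size && (page < 0 || pageFree.get(i) > pageFree.get(page)))
                    page = i;
            if (page < 0) {
                pageFree.add(PAGE_PAYLOAD); // no suitable page in the free list
                page = pageFree.size() - 1;
            }
            pageFree.set(page, pageFree.get(page) - size);
        }
        return new int[] {wholePages.size(), pageFree.size()};
    }

    public static void main(String[] args) {
        int[] counts = batchWrite(new int[] {100, 4000, 250, 9000, 700});
        System.out.println(counts[0] + " whole-page writes, "
            + counts[1] + " partially filled pages");
        // prints "3 whole-page writes, 1 partially filled pages"
    }
}
```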
TBD: describe the implementation.
...
Overall, the changes to support batch updates in PageMemory can be divided into the following phases.
...
...
...
...
Examine the performance difference between the following approaches and select the best:
A. single updates (current approach)
B. sort + BPlusTree.invokeAll() + FreeList.insertDataRow
C. sort + BPlusTree.findAll + FreeList.insertDataRows + BPlusTree.putAll
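At toy scale, the sort precondition shared by approaches B and C can be illustrated with a `TreeMap`: sorting the batch first makes consecutive inserts land on adjacent positions, which is what would let a real implementation traverse and lock index pages once per key range rather than once per key. The sketch only shows that the sorted batch yields the same tree contents as single updates; the performance claims above still need to be measured:

```java
import java.util.Arrays;
import java.util.TreeMap;

public class BatchApproachesSketch {
    public static void main(String[] args) {
        int[] keys = {7, 3, 9, 1, 5};

        // A. Single updates: one full tree traversal per key (current approach).
        TreeMap<Integer, String> single = new TreeMap<>();
        for (int k : keys)
            single.put(k, "v" + k);

        // B/C. Batch update: sort the keys first, so consecutive inserts
        // target adjacent leaf positions.
        int[] sorted = keys.clone();
        Arrays.sort(sorted);
        TreeMap<Integer, String> batch = new TreeMap<>();
        for (int k : sorted)
            batch.put(k, "v" + k);

        // Both orders produce the same tree contents.
        System.out.println(single.equals(batch)); // prints true
    }
}
```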
...
For testing purposes, a prototype was created with a simplified Phase 1 implementation, which includes the FreeList optimization (batch writing to data pages) but does not include the B+ Tree optimization (searching and inserting a range of keys). The rebalancing process was chosen as the easiest and most suitable case for testing batch inserts in PageMemory.
The microbenchmark prepares a supply message and measures the time the demander spends handling it.
Parameters: 1 node, 1 cache, 1 partition, 100 objects, unlimited message size, 4k pages.
Entry size (bytes) | 44-104 | 140-340 | 340-740 | 740-1240 | 1240-3040 | 2000-3000 | 1040-8040 | 4040-16040 (fragmented) | 100-32000 (mostly fragmented)
---|---|---|---|---|---|---|---|---|---
Time improvement (%) | 43.4 | 37.1 | 33.9 | 28.2 | 0 | 5.4 | 10.1 | 8.6 | 1.1
The total rebalancing time was checked on the following configuration:
...
The improvement in rebalancing time with batch insertion is most noticeable when writing small objects and decreases for larger objects.
Entry size (bytes) | 140-240 | 240-540 | 500-800 | 700-800 | 800-1200
---|---|---|---|---|---
Rebalancing time improvement (%) | 22 | 19 | 9.5 | 8 | 2
...
Jira