Motivation

For some datasets and applications (such as Cloudberry), it is desirable that all disk components of a dataset's primary index and of all its secondary indexes align on the same filter-value boundaries. The benefit is that when a tuple is found in some component d_i of a secondary index, we can search only the corresponding component d_i' of the primary index to fetch that tuple, without checking the other disk components.
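The following Python sketch illustrates this lookup benefit. It is not AsterixDB code. It assumes a simplified model in which each disk component records only the minimum and maximum values of the filter field, and in which every index lists its components in the same order. The names `DiskComponent`, `is_aligned`, and `components_to_search` are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DiskComponent:
    """Hypothetical model of one LSM disk component and its filter range."""
    name: str
    filter_min: int  # smallest filter value (e.g., a timestamp) in the component
    filter_max: int  # largest filter value in the component


def is_aligned(primary: List[DiskComponent],
               secondaries: List[List[DiskComponent]]) -> bool:
    """True if every secondary index has the same filter boundaries as the primary."""
    bounds = [(c.filter_min, c.filter_max) for c in primary]
    return all([(c.filter_min, c.filter_max) for c in s] == bounds
               for s in secondaries)


def components_to_search(primary: List[DiskComponent],
                         secondary: List[DiskComponent],
                         i: int) -> List[DiskComponent]:
    """Primary components to probe for a tuple found in secondary component i."""
    if is_aligned(primary, [secondary]):
        # Aligned: only the corresponding primary component d_i' needs a lookup.
        return [primary[i]]
    # Not aligned (in this simplified model): fall back to every primary component.
    return list(primary)


# Example: two aligned indexes whose components cover [200, 299] and [100, 199].
primary = [DiskComponent("p0", 200, 299), DiskComponent("p1", 100, 199)]
secondary = [DiskComponent("s0", 200, 299), DiskComponent("s1", 100, 199)]
print([c.name for c in components_to_search(primary, secondary, 1)])  # ['p1']
```

If the secondary index's boundaries drift from the primary's (for example, one index merged [100, 199] and [200, 299] while the other did not), the sketch can no longer map d_i to a single primary component and must probe all of them.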

Current Workflow of Merging

Currently, disk components are merged as follows. Whenever a new disk component is added to an index (due to a flush or a merge), the index's merge policy is notified. The merge policy checks the index's existing disk components, and if it decides
