Motivation

Currently, when we process a query using secondary indexes, we first search the secondary indexes to get matching primary keys (pks), sort them, and then perform primary index lookups to fetch the records. During primary index lookups, each fetched pk is checked against every component of the primary index, with no filtering. When the primary index contains a large number of disk components, these lookups can take a long time. However, all indexes of a dataset (and partition) are flushed together, so there is a relationship between the components of the secondary indexes and the components of the primary index. This relationship can be exploited to accelerate primary index lookups after secondary index scans, which is achieved through component Ids. The basic idea is that the components of all indexes are correlated through component Ids. In addition to each matching pk, the secondary index also returns the Id of the component where that pk was found. Thus, when performing primary index lookups, we only need to search the subset of primary index components identified by the input Id, which reduces the time spent on primary index lookups and thus the total query processing time.
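The idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the actual implementation: the names `Component`, `lookup_all`, and `lookup_by_id` are hypothetical, a component Id is modeled as an integer range `[min_id, max_id]` so that a merged component covers the Ids of the components it replaced, and updates and deletes are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical primary index component covering a range of component Ids."""
    min_id: int
    max_id: int
    records: dict = field(default_factory=dict)  # pk -> record

    def covers(self, component_id):
        return self.min_id <= component_id <= self.max_id


def lookup_all(primary, pk):
    """Baseline: probe the primary components one by one, newest first."""
    probes = 0
    for comp in primary:
        probes += 1
        if pk in comp.records:
            return comp.records[pk], probes
    return None, probes


def lookup_by_id(primary, pk, component_id):
    """Probe only the primary components whose Id range covers component_id."""
    probes = 0
    for comp in primary:
        if not comp.covers(component_id):
            continue  # skipped without touching the component's data
        probes += 1
        if pk in comp.records:
            return comp.records[pk], probes
    return None, probes


# Primary index components, newest first; the last one is the result of a merge.
primary = [
    Component(3, 3, {30: "c"}),
    Component(2, 2, {20: "b"}),
    Component(0, 1, {10: "a", 11: "a2"}),
]

# Assumed output of a secondary index search: (pk, component Id) pairs.
secondary_hits = [(10, 1), (30, 3)]

for pk, cid in sorted(secondary_hits):
    print(pk, lookup_all(primary, pk), lookup_by_id(primary, pk, cid))
```

In this sketch the baseline probes three components to find pk 10, while the Id-based lookup probes only the one component whose range covers Id 1. How the real system maps an input Id to a subset of primary components may differ from the range check assumed here.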

Component Id Management

Exploit Component Id for Query Processing
