...

The only "easy" way around these two challenges is to focus our efforts on queries that are restricted to either single partitions or small token ranges. These queries behave well locally even on LCS (because each level contains token-disjoint SSTables, and assuming a low number of unleveled SSTables), avoid fan-out and all of its secondary pitfalls, and allow us to make queries at varying consistency levels (CLs) with reasonable performance. Attempting to fix the local problems around compaction strategy could mean either restricting which strategies can be used or partially abandoning SSTable-attachment. Attempting to fix distributed read path problems by pushing the design towards information retrieval (IR) systems like Elasticsearch (ES) could compromise our ability to use higher read CLs.

Addendum

The following applies to the version 1 index format. A version 2 index format is under development.

Terminology

  • Row ID - A monotonically increasing integer assigned to every row in an SSTable. It is stored in the index structures instead of the key token or key offset because it compresses better.
  • Postings/posting-list - A sorted list of Row IDs that match a given indexed value.
  • Token file - An index of Row ID -> partition key token for every row in the SSTable.
  • Offset file - An index of Row ID -> partition key offset in the data/primary-index file for every row in the SSTable.
  • Segment - The smallest unit of on-disk index structure, flushed during compaction to reduce memory pressure. Multiple segments of an index are written to the same physical file.
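
To make the relationship between these terms concrete, here is a minimal in-memory sketch in Python. It is not the actual on-disk format or the project's implementation: the sample rows, the `search` helper, and the use of plain lists and dicts are all assumptions made for illustration. It also assumes rows are numbered in SSTable order, so tokens are non-decreasing in Row ID.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical rows of one SSTable, in on-disk (token) order:
# (partition key token, partition key offset, indexed column value)
rows = [
    (-9000, 0, "red"),
    (-120, 64, "blue"),
    (35, 128, "red"),
    (35, 128, "green"),  # second row of the same partition
    (7100, 192, "red"),
]

token_file = []               # Row ID -> partition key token
offset_file = []              # Row ID -> partition key offset
postings = defaultdict(list)  # indexed value -> sorted Row IDs

for row_id, (token, offset, value) in enumerate(rows):
    token_file.append(token)
    offset_file.append(offset)
    # Row IDs are assigned in increasing order, so each list stays sorted.
    postings[value].append(row_id)


def search(value, min_token, max_token):
    """Return (token, offset) pairs for rows matching `value` whose
    partition key token lies in [min_token, max_token]."""
    # Tokens are sorted by Row ID (assumption above), so the first
    # candidate Row ID can be found with a binary search.
    first_row_id = bisect_left(token_file, min_token)
    hits = []
    for row_id in postings.get(value, []):
        if row_id < first_row_id:
            continue  # before the requested token range
        if token_file[row_id] > max_token:
            break     # past the requested token range
        hits.append((token_file[row_id], offset_file[row_id]))
    return hits
```

In this toy model, a query for `"red"` restricted to a token range walks only the posting list for `"red"` and resolves each Row ID through the token and offset lookups, which mirrors why the index can store compact Row IDs rather than tokens or offsets directly.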

...