...

  • Row ID - A monotonically increasing integer associated with every row in an SSTable. It is stored in an index structure instead of a key token or key offset because it compresses better.
  • Postings/posting-list - Sorted row IDs that match a given indexed value.
  • Token file - An index of Row ID -> partition key token for every row in the SSTable.
  • Primary key - A partition key and clustering key that together represent a row in an SSTable.
  • Primary key store - A bi-directional block store allowing Row ID → Primary Key and Primary Key → Row ID lookups.
  • Offset file - An index of Row ID -> partition key offset on the data/primary-index file for every row in the SSTable.
  • Segment - The smallest unit of the on-disk indexing structure, flushed during compaction to reduce memory pressure. Multiple segments of an index are written to the same physical file.

SAI is optimised for storage. The primary key store is written once per SSTable, and column indexes access it using a row ID. The primary key store uses an on-disk trie containing primary keys for primary key to row ID lookups and a prefix-compressed block store for row ID to primary key lookups. This replaces the earlier design, in which tokens and offsets were stored once per SSTable and column indexes accessed the token and offset files using a row ID; there, offsets were compressed using Frame of Reference (FoR) encoding while tokens were not, because tokens consume the full 8 bytes and therefore cannot be compressed.
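The two lookup directions described above can be illustrated with a small in-memory sketch. This is not SAI's implementation: the class name `PrimaryKeyStore`, the `"partition:clustering"` string encoding of keys, the block size, and the choice to compress each key against the first key of its block are all assumptions made for illustration, and a binary search over a sorted list stands in for the on-disk trie.

```python
from bisect import bisect_left


def _shared_prefix_len(a, b):
    """Length of the common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class PrimaryKeyStore:
    """Toy stand-in for a per-SSTable primary key store (hypothetical API)."""

    def __init__(self, sorted_keys, block_size=4):
        keys = list(sorted_keys)
        if keys != sorted(set(keys)):
            raise ValueError("primary keys must be sorted and unique")
        # Kept uncompressed only to serve key -> row ID lookups; a real store
        # would use an on-disk trie here instead of a Python list.
        self._keys = keys
        self._block_size = block_size
        # Row ID -> key direction: each block holds its first key in full and
        # every other key as (shared prefix length, remaining suffix).
        self._blocks = []
        for start in range(0, len(keys), block_size):
            block = keys[start:start + block_size]
            first = block[0]
            entries = []
            for key in block[1:]:
                shared = _shared_prefix_len(first, key)
                entries.append((shared, key[shared:]))
            self._blocks.append((first, entries))

    def primary_key(self, row_id):
        """Row ID -> primary key, decoded from the prefix-compressed block."""
        if not 0 <= row_id < len(self._keys):
            raise IndexError(row_id)
        first, entries = self._blocks[row_id // self._block_size]
        offset = row_id % self._block_size
        if offset == 0:
            return first
        shared, suffix = entries[offset - 1]
        return first[:shared] + suffix

    def row_id(self, key):
        """Primary key -> row ID, or None if the key is absent."""
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return i
        return None


store = PrimaryKeyStore(["p1:a", "p1:b", "p2:a", "p2:c", "p3:a"], block_size=2)
print(store.primary_key(3), store.row_id("p2:a"))
```

Because the keys are sorted, a row ID is simply a key's position in that order, which is what lets both directions be served from one structure written once per SSTable.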

Index implementations need only store an integer row ID in their postings list. Row IDs are translated to a primary key via the primary key store (previously, to a decorated key via the token/offset files and SSTableReader#keyAt).

As multiple indexes share the primary key store files (previously the token/offset files), it becomes feasible to index many columns on the same table without significantly increasing the index size.
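A minimal sketch of why this sharing keeps indexes small is given here. The `ColumnIndex` class, the sample rows, and the plain list used as the shared row ID to primary key mapping are hypothetical; the point is only that each column index holds nothing but sorted row IDs, and keys are resolved once, after the postings have been combined.

```python
from collections import defaultdict


class ColumnIndex:
    """Toy column index: indexed value -> sorted row IDs (a postings list)."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, row_id, value):
        # Rows are added in row ID order, so each postings list stays sorted.
        self._postings[value].append(row_id)

    def postings(self, value):
        return self._postings.get(value, [])


def intersect(a, b):
    """Merge-intersect two sorted postings lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Shared per-SSTable mapping (assumed): a key's position is its row ID.
primary_keys = ["p1:a", "p1:b", "p2:a", "p2:c", "p3:a"]

rows = [
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "L"},
    {"color": "red", "size": "M"},
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "M"},
]

color_index, size_index = ColumnIndex(), ColumnIndex()
for row_id, row in enumerate(rows):
    color_index.add(row_id, row["color"])
    size_index.add(row_id, row["size"])

# Both indexes store integers only; keys are looked up once at the end.
matching_row_ids = intersect(color_index.postings("red"), size_index.postings("L"))
matching_keys = [primary_keys[r] for r in matching_row_ids]
print(matching_row_ids, matching_keys)
```

Adding a third indexed column in this sketch adds one more set of integer postings lists and leaves `primary_keys` untouched, which mirrors the size argument in the paragraph above.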

...