Comment: ORC indexes not used for queries (Prasanth, user@hive Feb. 24 2014)

...

Index data includes min and max values for each column and the row positions within each column. (A bit field or bloom filter could also be included.) Row index entries provide offsets that enable seeking to the right compression block and byte within a decompressed block. Note that ORC indexes are used only for the selection of stripes and row groups and not for answering queries.
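To make the selection step concrete, here is a minimal Python sketch of min/max pruning over row groups. It is an illustration only, not the ORC reader's code: the names `RowIndexEntry`, `build_row_index`, and `groups_to_read` are hypothetical, the column is a plain list of integers, and the toy stride of 25 is chosen just to keep the example small.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RowIndexEntry:
    """Hypothetical stand-in for one row index entry of a single column."""
    first_row: int   # row number of the first row in the group
    min_value: int   # smallest column value in the group
    max_value: int   # largest column value in the group


def build_row_index(values: Sequence[int], stride: int) -> List[RowIndexEntry]:
    """Record min and max for every group of `stride` consecutive rows."""
    entries = []
    for start in range(0, len(values), stride):
        group = values[start:start + stride]
        entries.append(RowIndexEntry(start, min(group), max(group)))
    return entries


def groups_to_read(index: List[RowIndexEntry], lo: int, hi: int) -> List[RowIndexEntry]:
    """Keep only groups whose [min, max] range can overlap the predicate [lo, hi]."""
    return [e for e in index if e.max_value >= lo and e.min_value <= hi]


values = list(range(100))                  # toy column, sorted for clarity
index = build_row_index(values, stride=25)
selected = groups_to_read(index, 30, 55)   # predicate: 30 <= value <= 55
print([e.first_row for e in selected])     # groups starting at rows 25 and 50
```

The index only narrows down which groups are read. The rows inside the selected groups still have to be read and filtered, which is why the index selects row groups but does not answer the query itself.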

Having relatively frequent row index entries enables row-skipping within a stripe for rapid reads, despite large stripe sizes. By default, an index entry is written every 10,000 rows, so rows can be skipped in groups of 10,000.
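As a rough illustration of the arithmetic behind row-skipping, the sketch below maps a row number within a stripe to the nearest preceding row index entry and the number of rows left to skip after seeking there. The function name `locate_row` is hypothetical, and the stride constant simply mirrors the default of 10,000 rows mentioned above.

```python
ROW_INDEX_STRIDE = 10_000  # default number of rows per index entry, per the text


def locate_row(row_in_stripe: int, stride: int = ROW_INDEX_STRIDE) -> tuple:
    """Return (index entry to seek to, rows to skip after that seek)."""
    if row_in_stripe < 0:
        raise ValueError("row number must be non-negative")
    # Integer division picks the entry; the remainder is what is left to skip.
    return divmod(row_in_stripe, stride)


entry, rows_to_skip = locate_row(25_003)
print(entry, rows_to_skip)  # 2 5003
```

With the default stride, reaching row 25,003 means seeking to the third index entry (row 20,000) and skipping 5,003 rows, rather than decoding all 25,003 preceding rows.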

...