...

The bloom filter indexes consist of a BLOOM_FILTER stream for each
column specified through the 'orc.bloom.filter.columns' table property.
A BLOOM_FILTER stream records a bloom filter entry for each row group
(10,000 rows by default) in a column. If bloom filters are present
for a column, ORC predicate pushdown evaluates the predicate against
the bloom filter after evaluating it with the min/max statistics from
the row indexes. Only the row groups that pass the min/max row index
evaluation are evaluated against the bloom filter index.
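
The two-stage evaluation described above can be sketched roughly as follows. This is a minimal Python illustration, not ORC's reader code: RowGroupIndex, select_row_groups and the equality predicate are hypothetical, and each bloom filter is stood in for by an exact set test (a real bloom filter may return false positives, but never false negatives).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RowGroupIndex:
    # Hypothetical per-row-group index entry: min/max statistics plus an
    # optional membership test standing in for the bloom filter
    # (None means no bloom filter was written for this column).
    min_value: int
    max_value: int
    might_contain: Optional[Callable[[int], bool]] = None


def select_row_groups(groups: List[RowGroupIndex], value: int) -> List[int]:
    """Return indices of row groups that may contain rows where column == value."""
    selected = []
    for i, g in enumerate(groups):
        # Stage 1: min/max statistics from the row index.
        if value < g.min_value or value > g.max_value:
            continue  # pruned here; the bloom filter is never consulted
        # Stage 2: only row groups that survived stage 1 reach the bloom filter.
        if g.might_contain is not None and not g.might_contain(value):
            continue  # bloom filter says the value is definitely absent
        selected.append(i)
    return selected


# Usage: sets stand in for bloom filters to keep the example deterministic.
groups = [
    RowGroupIndex(0, 99, lambda v: v in {5, 42}),
    RowGroupIndex(100, 199, lambda v: v in {150}),
    RowGroupIndex(0, 199),  # column has no bloom filter in this group
]
print(select_row_groups(groups, 42))  # [0, 2]
print(select_row_groups(groups, 50))  # [2]
```

For 50, the first row group passes the min/max check (0..99) but is then rejected by its bloom filter, which is exactly the case the bloom filter index is meant to catch.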

Each BloomFilterEntry stores the number of hash functions ('k') used and
the bitset backing the bloom filter. The bitset is serialized as repeated
longs, from which the number of bits ('m') in the bloom filter can be
derived:

m = bitset.length * 64
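
As a hedged illustration of how 'k' and the serialized longs fit together, the Python sketch below derives m from the bitset length and probes k bit positions. The bit layout (bit i stored in long i // 64 at position i % 64) and the double-hashing scheme (h1 + i * h2) mod m are assumptions for illustration; the helper names and the h1/h2 inputs are hypothetical, and the exact hash functions ORC uses are not shown here.

```python
from typing import List


def num_bits(bitset: List[int]) -> int:
    # m = bitset.length * 64: every serialized long contributes 64 bits.
    return len(bitset) * 64


def set_bit(bitset: List[int], index: int) -> None:
    # Assumed layout: bit `index` lives in long index // 64, at index % 64.
    bitset[index // 64] |= 1 << (index % 64)


def get_bit(bitset: List[int], index: int) -> bool:
    # Python's >> and & follow two's-complement semantics, so this also
    # works on negative values read back from signed 64-bit longs.
    return (bitset[index // 64] >> (index % 64)) & 1 == 1


def add(bitset: List[int], k: int, h1: int, h2: int) -> None:
    # Hypothetical double hashing: set k positions (h1 + i * h2) mod m.
    m = num_bits(bitset)
    for i in range(1, k + 1):
        set_bit(bitset, (h1 + i * h2) % m)


def might_contain(bitset: List[int], k: int, h1: int, h2: int) -> bool:
    # A value may be present only if all k probed bits are set.
    m = num_bits(bitset)
    return all(get_bit(bitset, (h1 + i * h2) % m) for i in range(1, k + 1))


bits = [0, 0, 0]                            # three serialized longs
print(num_bits(bits))                       # 192
add(bits, k=3, h1=10, h2=7)                 # sets bits 17, 24, 31
print(might_contain(bits, 3, 10, 7))        # True
print(might_contain(bits, 3, 11, 7))        # False (bits 18, 25, 32 unset)
```

With three serialized longs, m = 3 * 64 = 192, so every probed position is reduced modulo 192.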

...