...

The bloom filter indexes consist of a BLOOM_FILTER stream for each
column specified through the 'orc.bloom.filter.columns' table property.
A BLOOM_FILTER stream records one bloom filter entry for each row group
(10,000 rows by default) in a column. If bloom filters are present for
a column, ORC predicate pushdown evaluates the predicate against the
bloom filter after evaluating it against the min/max statistics from
the row indexes, so only row groups that pass the min/max check are
tested against the bloom filter.
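The two-stage evaluation order can be illustrated with a minimal sketch for an equality predicate (column == value). This is not ORC's reader code: the RowGroupIndex type, its fields, and select_row_groups are hypothetical names, and the bloom filter is abstracted as a might_contain callable.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RowGroupIndex:
    # Hypothetical per-row-group metadata: min/max statistics from the row index
    minimum: int
    maximum: int
    # Hypothetical stand-in for the row group's BLOOM_FILTER entry:
    # returns False only if the value is definitely absent
    might_contain: Callable[[int], bool]


def select_row_groups(groups: List[RowGroupIndex], value: int) -> List[int]:
    """Return the indexes of row groups that may contain rows where column == value."""
    selected = []
    for i, group in enumerate(groups):
        # Stage 1: min/max statistics from the row index
        if value < group.minimum or value > group.maximum:
            continue
        # Stage 2: only row groups that survived min/max reach the bloom filter
        if not group.might_contain(value):
            continue
        selected.append(i)
    return selected
```

For example, with groups = [RowGroupIndex(0, 99, lambda v: v in {1, 5, 42}), RowGroupIndex(100, 199, lambda v: True), RowGroupIndex(0, 199, lambda v: v in {7})], looking up 42 keeps only row group 0. Here Python sets stand in for the bloom filters and so never report false positives. A real bloom filter can report false positives but never false negatives, which is why skipping a row group it rejects is safe.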

Each BloomFilterEntry stores the number of hash functions ('k') used and
the bitset backing the bloom filter. The bitset is serialized as repeated
longs, from which the number of bits ('m') in the bloom filter can be
derived: m = bitset.length * 64.
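The following sketch derives m from the serialized longs and reads individual bits. It assumes that bit i is stored in long i // 64 at bit position i % 64, the usual java.util.BitSet-style layout. That layout is an assumption, not something stated above. The k bit positions for a value come from the filter's hash functions, which are not described here, so the hypothetical might_contain helper takes the positions directly rather than computing them.

```python
from typing import Iterable, List

WORD_BITS = 64  # each serialized long holds 64 bits


def num_bits(bitset: List[int]) -> int:
    """m = bitset.length * 64"""
    return len(bitset) * WORD_BITS


def set_bit(bitset: List[int], i: int) -> None:
    # Assumed layout: bit i lives in long i // 64 at position i % 64
    bitset[i // WORD_BITS] |= 1 << (i % WORD_BITS)


def get_bit(bitset: List[int], i: int) -> bool:
    # Same assumed layout as set_bit
    return (bitset[i // WORD_BITS] >> (i % WORD_BITS)) & 1 == 1


def might_contain(bitset: List[int], positions: Iterable[int]) -> bool:
    # Hypothetical helper: a value may be present only if all k of its
    # hashed bit positions are set; any clear bit means definitely absent
    return all(get_bit(bitset, p) for p in positions)
```

With bitset = [0, 0], num_bits(bitset) is 128. After set_bit(bitset, 3) and set_bit(bitset, 70), the two longs hold 8 and 64, so might_contain(bitset, [3, 70]) is True while might_contain(bitset, [3, 71]) is False.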

...