
...

Key                       Default      Notes

orc.compress              ZLIB         high-level compression (one of NONE, ZLIB, SNAPPY)
orc.compress.size         262,144      number of bytes in each compression chunk
orc.stripe.size           268,435,456  number of bytes in each stripe
orc.row.index.stride      10,000       number of rows between index entries (must be >= 1000)
orc.create.index          true         whether to create row indexes
orc.bloom.filter.columns  ""           comma-separated list of column names for which bloom filters should be created
orc.bloom.filter.fpp      0.05         false positive probability for bloom filter (must be > 0.0 and < 1.0)

For example, creating an ORC stored table without compression:

...

The goal of the column statistics is that for each column, the writer
records the count and, depending on the type, other useful fields. For
most of the primitive types, it records the minimum and maximum
values; for numeric types it additionally stores the sum.
From Hive 1.2.0 onwards, the column statistics also record whether
there are any null values within the row group by setting the hasNull
flag. The hasNull flag is used by ORC's predicate pushdown to better
answer 'IS NULL' queries.

message ColumnStatistics {
 // the number of values
 optional uint64 numberOfValues = 1;
 // At most one of these has a value for any column
 optional IntegerStatistics intStatistics = 2;
 optional DoubleStatistics doubleStatistics = 3;
 optional StringStatistics stringStatistics = 4;
 optional BucketStatistics bucketStatistics = 5;
 optional DecimalStatistics decimalStatistics = 6;
 optional DateStatistics dateStatistics = 7;
 optional BinaryStatistics binaryStatistics = 8;
 optional TimestampStatistics timestampStatistics = 9;
 optional bool hasNull = 10;
}
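To illustrate how a reader could use these statistics, here is a hedged Python sketch (not Hive's actual predicate-pushdown code) of row-group pruning with the hasNull flag and min/max values; the `can_skip_row_group` function and its argument shapes are hypothetical:

```python
def can_skip_row_group(predicate, stats):
    """Return True when the row-group statistics prove no row can match.

    stats: dict with 'min', 'max', and 'hasNull' as recorded per row group.
    predicate: ('IS NULL',) for an IS NULL test, or ('=', value) for equality.
    """
    op = predicate[0]
    if op == 'IS NULL':
        # if the row group recorded no nulls, IS NULL can never match
        return not stats['hasNull']
    if op == '=':
        value = predicate[1]
        # equality can only match inside the [min, max] range
        return value < stats['min'] or value > stats['max']
    return False  # unknown predicate: cannot safely prune
```

This mirrors the intent described above: hasNull lets the reader answer 'IS NULL' without scanning the row group at all.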

For integer types (tinyint, smallint, int, bigint), the column
statistics include the minimum, maximum, and sum. If the sum
overflows a long at any point during the calculation, no sum is
recorded.
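The overflow rule can be modeled as follows; this is an illustrative Python sketch under the stated semantics, not Hive's implementation (the class and field names are hypothetical):

```python
class IntegerStatistics:
    """Integer column statistics with overflow-aware sum tracking."""
    INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1

    def __init__(self):
        self.minimum = None
        self.maximum = None
        self.sum = 0
        self.has_sum = True  # cleared permanently once the sum overflows

    def update(self, value):
        if self.minimum is None or value < self.minimum:
            self.minimum = value
        if self.maximum is None or value > self.maximum:
            self.maximum = value
        if self.has_sum:
            s = self.sum + value
            if s < self.INT64_MIN or s > self.INT64_MAX:
                # sum overflowed the long range: drop it for good
                self.has_sum = False
            else:
                self.sum = s
```

Note the sum is abandoned rather than wrapped, so a recorded sum is always exact.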

...

Encoding  Stream Kind  Optional  Contents
DIRECT    PRESENT      Yes       Boolean RLE
          DATA         No        Byte RLE

Indexes

Row Group Index

The row group indexes consist of a ROW_INDEX stream for each primitive
column that has an entry for each row group. Row groups are controlled
by the writer and default to 10,000 rows. Each RowIndexEntry gives the
position of each stream for the column and the statistics for that row
group.

...

Because dictionaries are accessed randomly, there is not a position to
record for the dictionary and the entire dictionary must be read even
if only part of a stripe is being read.

Bloom Filter Index

Info: Version 1.2.0+: Bloom Filter

Bloom filters are added to ORC indexes from Hive 1.2.0 onwards. Predicate pushdown can make use of bloom filters to better prune the row groups that do not satisfy the filter condition.

The bloom filter indexes consist of a BLOOM_FILTER stream for each
column specified through the 'orc.bloom.filter.columns' table property.
A BLOOM_FILTER stream records a bloom filter entry for each row group
(default of 10,000 rows) in a column. In the presence of a bloom filter
stream, predicate pushdown in ORC will make use of the bloom filter
indexes instead of the min/max stats from the row group indexes.

Each BloomFilterEntry stores the number of hash functions ('k') used and
the bitset backing the bloom filter. The bitset is serialized as repeated
longs, from which the number of bits ('m') for the bloom filter can be
derived: m = bitset.length * 64.

message BloomFilter {
 optional uint32 numHashFunctions = 1;
 repeated fixed64 bitset = 2;
}

message BloomFilterIndex {
 repeated BloomFilter bloomFilter = 1;
}
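The section above does not spell out how m and k are chosen from the row group size and 'orc.bloom.filter.fpp'. The standard bloom filter sizing formulas are the natural fit, and they are consistent with deriving m back from the serialized bitset length; the sketch below assumes those formulas (the function names are hypothetical):

```python
import math

def bloom_filter_size_bits(n, fpp):
    """Standard bloom filter sizing: m = -n * ln(p) / (ln 2)^2.

    n: expected number of entries (e.g. the row group size, 10,000).
    fpp: desired false positive probability ('orc.bloom.filter.fpp').
    """
    return math.ceil(-n * math.log(fpp) / (math.log(2) ** 2))

def num_hash_functions(n, m):
    """Optimal number of hash functions: k = (m / n) * ln 2."""
    return max(1, round((m / n) * math.log(2)))
```

With the defaults (n = 10,000 rows per row group, fpp = 0.05), this yields roughly 62,000 bits and k = 4 hash functions per row-group entry.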

The bloom filter internally uses two different hash functions to map a
key to a position in the bit set. For tinyint, smallint, int, bigint,
float, and double types, Thomas Wang's 64-bit integer hash function is
used. Floats are converted to their IEEE-754 32-bit representation
(using Java's Float.floatToIntBits(float)). Similarly, doubles are
converted to their IEEE-754 64-bit representation (using Java's
Double.doubleToLongBits(double)). All of these primitive types
are cast to the long base type before being passed to the hash function.
For string and binary types, the Murmur3 64-bit hash algorithm is used.
The 64-bit variant of Murmur3 considers only the most significant
8 bytes of the Murmur3 128-bit algorithm. The 64-bit hashcode generated
by the above algorithms is used as a base to derive 'k' different
hash functions. We use the idea from the paper "Less Hashing,
Same Performance: Building a Better Bloom Filter" by Kirsch et al. to
quickly compute the k hashcodes.

The algorithm for computing the k hashcodes and setting the bit
positions in the bloom filter is as follows:
1) Get the 64-bit base hashcode from Murmur3 or Thomas Wang's hash algorithm.
2) Split the base hashcode into two 32-bit hashcodes (say hash1 and hash2).
3) The k'th hashcode is obtained by (where k > 0):
combinedHash = hash1 + (k * hash2)
4) If combinedHash is negative, flip all the bits:
combinedHash = ~combinedHash
5) The bit set position is obtained by taking the modulo with m:
position = combinedHash % m
6) Set the position in the bit set. The upper bits of the position select
the long index within the bitset, and the 6 LSBs select the bit within
that long (Java's long shift uses only the low 6 bits of the shift count):
bitset[(int) (position >>> 6)] |= (1L << position);
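The steps above can be sketched in Python; this is an illustrative model of the algorithm using Java's 32-bit overflow semantics, not Hive's Java implementation, and the Murmur3/Wang base hash is taken as an input (the function names are hypothetical):

```python
def to_int32(x):
    """Wrap x to Java's 32-bit signed int range (two's complement)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

def positions(hash64, k, m):
    """Yield the k bit positions derived from a 64-bit base hash (steps 2-5)."""
    hash1 = to_int32(hash64 & 0xFFFFFFFF)          # low 32 bits
    hash2 = to_int32((hash64 >> 32) & 0xFFFFFFFF)  # high 32 bits
    for i in range(1, k + 1):
        combined = to_int32(hash1 + i * hash2)     # step 3, int overflow
        if combined < 0:
            combined = ~combined                   # step 4: flip all bits
        yield combined % m                         # step 5

def add_hash(hash64, k, bitset):
    """Set the k derived positions in the bitset (step 6)."""
    m = len(bitset) * 64
    for pos in positions(hash64, k, m):
        # upper bits pick the long, low 6 bits pick the bit within it
        bitset[pos >> 6] |= 1 << (pos & 0x3F)

def might_contain(hash64, k, bitset):
    """True unless some derived position is unset (a definite miss)."""
    m = len(bitset) * 64
    return all(bitset[pos >> 6] & (1 << (pos & 0x3F))
               for pos in positions(hash64, k, m))
```

A reader evaluating a predicate would call `might_contain` per row group: a False result proves the value is absent and the row group can be skipped, while True means it must still be scanned.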

Bloom filter streams are interlaced with row group indexes. This placement
makes it convenient to read the bloom filter stream and row index stream
together in a single read operation.
