...

Version information

Column statistics were introduced in Hive 0.10.0 by HIVE-1362. This is the design document.

Automatic gathering of column statistics was introduced in Hive 2.3 by HIVE-11160. This is also the design document for that feature.

...

Please note that table and column aliases are not supported in the ANALYZE statement.

To view column statistics:

describe formatted [table_name] [column_name];
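
As a minimal sketch of the workflow described above, the statements below gather column statistics and then display them. The table name page_views and the columns user_id and url are hypothetical placeholders, and the exact output layout depends on the Hive version. Note that the table and columns are referenced by their real names, since aliases are not supported here.

```sql
-- Hypothetical table and column names, for illustration only.
-- Compute column statistics for two columns of an unpartitioned table.
ANALYZE TABLE page_views COMPUTE STATISTICS FOR COLUMNS user_id, url;

-- Show the statistics stored for one column
-- (for example min/max values, null count, distinct count).
DESCRIBE FORMATTED page_views user_id;
```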

Metastore Schema

To persist column-level statistics, we propose adding the following new tables:

...

LOW_VALUE RAW,
HIGH_VALUE RAW,
NUM_NULLS BIGINT,
NUM_DISTINCTS BIGINT,

BIT_VECTOR BLOB,  /* introduced in HIVE-16997 in Hive 3.0.0 */

AVG_COL_LEN DOUBLE,
MAX_COL_LEN BIGINT,
NUM_TRUES BIGINT,
NUM_FALSES BIGINT,
LAST_ANALYZED BIGINT NOT NULL)

...

LOW_VALUE RAW,
HIGH_VALUE RAW,
NUM_NULLS BIGINT,
NUM_DISTINCTS BIGINT,

BIT_VECTOR BLOB,  /* introduced in HIVE-16997 in Hive 3.0.0 */

AVG_COL_LEN DOUBLE,
MAX_COL_LEN BIGINT,
NUM_TRUES BIGINT,
NUM_FALSES BIGINT,
LAST_ANALYZED BIGINT NOT NULL)

...