...

The second milestone was to support column level statistics. See Column Statistics in Hive in the Design Documents.

Supported column stats are:

BooleanColumnStatsData
  1: required i64 numTrues,
  2: required i64 numFalses,
  3: required i64 numNulls,
  4: optional binary bitVectors

DoubleColumnStatsData
  1: optional double lowValue,
  2: optional double highValue,
  3: required i64 numNulls,
  4: required i64 numDVs,
  5: optional binary bitVectors,
  6: optional binary histogram

LongColumnStatsData
  1: optional i64 lowValue,
  2: optional i64 highValue,
  3: required i64 numNulls,
  4: required i64 numDVs,
  5: optional binary bitVectors,
  6: optional binary histogram

StringColumnStatsData
  1: required i64 maxColLen,
  2: required double avgColLen,
  3: required i64 numNulls,
  4: required i64 numDVs,
  5: optional binary bitVectors

BinaryColumnStatsData
  1: required i64 maxColLen,
  2: required double avgColLen,
  3: required i64 numNulls,
  4: optional binary bitVectors

DecimalColumnStatsData
  1: optional Decimal lowValue,
  2: optional Decimal highValue,
  3: required i64 numNulls,
  4: required i64 numDVs,
  5: optional binary bitVectors,
  6: optional binary histogram

Date
  1: required i64 daysSinceEpoch

DateColumnStatsData
  1: optional Date lowValue,
  2: optional Date highValue,
  3: required i64 numNulls,
  4: required i64 numDVs,
  5: optional binary bitVectors,
  6: optional binary histogram

Timestamp
  1: required i64 secondsSinceEpoch

TimestampColumnStatsData
  1: optional Timestamp lowValue,
  2: optional Timestamp highValue,
  3: required i64 numNulls,
  4: required i64 numDVs,
  5: optional binary bitVectors,
  6: optional binary histogram

union ColumnStatisticsData
  1: BooleanColumnStatsData booleanStats,
  2: LongColumnStatsData longStats,
  3: DoubleColumnStatsData doubleStats,
  4: StringColumnStatsData stringStats,
  5: BinaryColumnStatsData binaryStats,
  6: DecimalColumnStatsData decimalStats,
  7: DateColumnStatsData dateStats,
  8: TimestampColumnStatsData timestampStats
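To make the meaning of these fields concrete, here is a minimal Python sketch that derives the first four LongColumnStatsData fields (lowValue, highValue, numNulls, numDVs) from an in-memory column. The class and function names are hypothetical and are not part of Hive. The sketch counts distinct values exactly; it does not attempt to reproduce how Hive estimates numDVs, nor what it stores in bitVectors or histogram.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class LongColumnStats:
    """Hypothetical Python mirror of LongColumnStatsData fields 1-4."""
    low_value: Optional[int]   # lowValue: None when every value is NULL
    high_value: Optional[int]  # highValue: None when every value is NULL
    num_nulls: int             # numNulls
    num_dvs: int               # numDVs, counted exactly in this sketch


def compute_long_column_stats(values: Iterable[Optional[int]]) -> LongColumnStats:
    """Summarize a column of integers, treating None as SQL NULL."""
    num_nulls = 0
    distinct = set()
    for v in values:
        if v is None:
            num_nulls += 1
        else:
            distinct.add(v)
    # The optional low/high values stay unset when there are no non-NULL values.
    low = min(distinct) if distinct else None
    high = max(distinct) if distinct else None
    return LongColumnStats(low, high, num_nulls, len(distinct))


stats = compute_long_column_stats([7, 3, None, 7, 12, None])
print(stats)
```

Leaving the low and high values unset for an all-NULL column mirrors the optional markers on lowValue and highValue, while numNulls and numDVs are always present, matching their required markers.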



Version: Column statistics

Column level statistics were added in Hive 0.10.0 by HIVE-1362.

...

Column level top K statistics are still pending; see HIVE-3421.

Quick overview

| Description | Stored in | Collected by | Since |
|---|---|---|---|
| Number of partitions the dataset consists of | Fictional metastore property: numPartitions | Computed while displaying the properties of a partitioned table | Hive 2.3 |
| Number of files the dataset consists of | Metastore table property: numFiles | Automatically during Metastore operations | |
| Total size of the dataset as seen at the filesystem level | Metastore table property: totalSize | | |
| Uncompressed size of the dataset | Metastore table property: rawDataSize | Computed; these are the basic statistics. Calculated automatically when hive.stats.autogather is enabled. Can be collected manually by: ANALYZE TABLE ... COMPUTE STATISTICS | Hive 0.8 |
| Number of rows the dataset consists of | Metastore table property: numRows | | |
| Column level statistics | Metastore; TAB_COL_STATS table | Computed. Calculated automatically when hive.stats.column.autogather is enabled. Can be collected manually by: ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS | |

...



Implementation

The way the statistics are calculated is similar for both newly created and existing tables.

...