...

The second milestone was to support column level statistics. See Column Statistics in Hive in the Design Documents.

Supported column stats are:

BooleanColumnStatsData
  1: required i64 numTrues,
  2: required i64 numFalses,
  3: required i64 numNulls,
  4: optional binary bitVectors

DoubleColumnStatsData
  1: optional double lowValue,
  2: optional double highValue,
  3: required i64 numNulls,
  4: required i64 numDVs,
  5: optional binary bitVectors,
  6: optional binary histogram

LongColumnStatsData
  1: optional i64 lowValue,
  2: optional i64 highValue,
  3: required i64 numNulls,
  4: required i64 numDVs,
  5: optional binary bitVectors,
  6: optional binary histogram

StringColumnStatsData
  1: required i64 maxColLen,
  2: required double avgColLen,
  3: required i64 numNulls,
  4: required i64 numDVs,
  5: optional binary bitVectors

BinaryColumnStatsData
  1: required i64 maxColLen,
  2: required double avgColLen,
  3: required i64 numNulls,
  4: optional binary bitVectors

DecimalColumnStatsData
  1: optional Decimal lowValue,
  2: optional Decimal highValue,
  3: required i64 numNulls,
  4: required i64 numDVs,
  5: optional binary bitVectors,
  6: optional binary histogram

Date
  1: required i64 daysSinceEpoch

DateColumnStatsData
  1: optional Date lowValue,
  2: optional Date highValue,
  3: required i64 numNulls,
  4: required i64 numDVs,
  5: optional binary bitVectors,
  6: optional binary histogram

Timestamp
  1: required i64 secondsSinceEpoch

TimestampColumnStatsData
  1: optional Timestamp lowValue,
  2: optional Timestamp highValue,
  3: required i64 numNulls,
  4: required i64 numDVs,
  5: optional binary bitVectors,
  6: optional binary histogram

union ColumnStatisticsData
  1: BooleanColumnStatsData booleanStats,
  2: LongColumnStatsData longStats,
  3: DoubleColumnStatsData doubleStats,
  4: StringColumnStatsData stringStats,
  5: BinaryColumnStatsData binaryStats,
  6: DecimalColumnStatsData decimalStats,
  7: DateColumnStatsData dateStats,
  8: TimestampColumnStatsData timestampStats
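As a rough illustration of how these structures get populated, the following HiveQL sketch gathers column statistics for two columns and then displays them. The table `page_views` and its columns are hypothetical names chosen for this example. The mapping in the comments (BIGINT to LongColumnStatsData, STRING to StringColumnStatsData) is an assumption based on the type names above, and the exact output layout of DESCRIBE FORMATTED may differ between Hive versions.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE IF NOT EXISTS page_views (
  user_id BIGINT,
  url     STRING
);

-- Gather column level statistics for the listed columns.
-- Assumption: user_id is recorded as LongColumnStatsData
-- (lowValue, highValue, numNulls, numDVs) and url as
-- StringColumnStatsData (maxColLen, avgColLen, numNulls, numDVs).
ANALYZE TABLE page_views COMPUTE STATISTICS FOR COLUMNS user_id, url;

-- Display the statistics stored for a single column.
DESCRIBE FORMATTED page_views user_id;
```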



Version: Column statistics

Column level statistics were added in Hive 0.10.0 by HIVE-1362.

...

Column level top K statistics are still pending; see HIVE-3421.

Quick overview

| Description | Stored in | Collected by | Since |
| --- | --- | --- | --- |
| Number of partitions the dataset consists of | Fictional metastore property: numPartitions | Computed when the properties of a partitioned table are displayed | Hive 2.3 |
| Number of files the dataset consists of | Metastore table property: numFiles | Automatically during Metastore operations | |
| Total size of the dataset as seen at the filesystem level | Metastore table property: totalSize | Automatically during Metastore operations | |
| Uncompressed size of the dataset | Metastore table property: rawDataSize | Computed; these are the basic statistics. Calculated automatically when hive.stats.autogather is enabled. Can be collected manually by: ANALYZE TABLE ... COMPUTE STATISTICS | Hive 0.8 |
| Number of rows the dataset consists of | Metastore table property: numRows | Computed; these are the basic statistics. Calculated automatically when hive.stats.autogather is enabled. Can be collected manually by: ANALYZE TABLE ... COMPUTE STATISTICS | |
| Column level statistics | Metastore; TAB_COL_STATS table | Computed. Calculated automatically when hive.stats.column.autogather is enabled. Can be collected manually by: ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS | |
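The settings and commands named in the overview can be combined roughly as follows. This is a minimal sketch, not a recommended configuration: `page_views` is a hypothetical unpartitioned table, and whether the two properties need to be set explicitly depends on the defaults of the Hive version in use.

```sql
-- Enable automatic gathering of basic and column level statistics
-- for the current session (property names as listed in the overview).
SET hive.stats.autogather=true;
SET hive.stats.column.autogather=true;

-- Collect the basic statistics manually for a hypothetical table.
ANALYZE TABLE page_views COMPUTE STATISTICS;

-- Inspect the table parameters, where properties such as
-- numFiles, totalSize, rawDataSize and numRows are shown.
DESCRIBE FORMATTED page_views;
```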
 

 



Implementation

The way the statistics are calculated is similar for both newly created and existing tables.

...

ANALYZE TABLE <table1> CACHE METADATA

Warning: Feature not implemented

Hive Metastore on HBase was discontinued and removed in Hive 3.0.0. See HBaseMetastoreDevelopmentGuide.


When the Hive metastore is configured to use HBase, this command explicitly caches file metadata in the HBase metastore.

...