Page History

...

In addition to the partition statistics, column level top K values can also be estimated for Hive tables.
The name and top K values of the most skewed column is stored in the partition or non-partitioned table’s skewed information, if user did not specify skew. This works for both newly created and existing tables.
The algorithm for computing top K is based on this paper: top-k.

...

Code Block
... /** * This method aggregates top K statistics. * * */ public List<String> aggregateStatsTopK(String keyPrefix, String statType); ...

Usage

Top K statistic is statistics are not computed enabled by default. The user can set the boolean variable hive.stats.topk.collect to be true to enable computing top K and putting top K into skewed information.

...

Space shortcuts

Child pages

Versions Compared

Old Version 1

New Version 2

Key

Usage