...

Top K statistics are gathered along with partition-level statistics. The IStatsAggregator interface needs a new method, aggregateStatsTopK(), that reads multiple entries from the temporary storage:

...

public interface IStatsAggregator {

...

  /**
   * This method aggregates top K statistics.
   *
   */
  public List<String> aggregateStatsTopK(String keyPrefix, String statType);

...

}
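
As a rough illustration of "reads multiple entries from the temporary storage", the sketch below implements a method with the same signature against an in-memory map. Only the signature of aggregateStatsTopK() comes from the interface above. The class name InMemoryTopKAggregator, the publish() helper, the key layout, the "topk" stat type name, and the prefix-matching behavior are all assumptions made for this example, not the actual Hive implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the temporary statistics storage.
// Only the aggregateStatsTopK signature is taken from the design above.
class InMemoryTopKAggregator {

  // key -> (stat type -> published value); TreeMap keeps keys sorted
  // so the returned entries come back in a stable order.
  private final Map<String, Map<String, String>> store = new TreeMap<>();

  // Hypothetical helper: records one entry, as a task publishing stats might.
  void publish(String key, String statType, String value) {
    store.computeIfAbsent(key, k -> new TreeMap<>()).put(statType, value);
  }

  // Assumed behavior: return every stored value of the given stat type
  // whose key starts with keyPrefix (one entry per matching key).
  public List<String> aggregateStatsTopK(String keyPrefix, String statType) {
    List<String> entries = new ArrayList<>();
    for (Map.Entry<String, Map<String, String>> e : store.entrySet()) {
      if (e.getKey().startsWith(keyPrefix)) {
        String value = e.getValue().get(statType);
        if (value != null) {
          entries.add(value);
        }
      }
    }
    return entries;
  }
}
```

Unlike an aggregator that folds matching entries into a single number, this method returns the matching entries as a list, which leaves merging the per-task top K candidates to the caller.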

Usage

Top K statistics are not enabled by default. To enable computing top K values and storing them in the skewed information, set the boolean variable hive.stats.topk.collect to true.

...