Page History

...

Number of Rows
Number of files
Size in Bytes.
For tables, the same statistics are supported with the addition of the number of partitions of the table.

Column level top K values can also be gathered a long with partition level statistics. See Top K Statistics.

Implementation

The way the statistics are calculated is similar for both newly created and existing tables.
For newly created tables, the job that creates a new table is a MapReduce job. During the creation, every mapper while copying the rows from the source table in the FileSink operator, gathers statistics for the rows it encounters and publishes them into a Database (possibly MySQL). At the end of the MapReduce job, published statistics are aggregated and stored in the MetaStore.
A similar process happens in the case of already existing tables, where a Map-only job is created and every mapper while processing the table in the TableScan operator, gathers statistics for the rows it encounters and the same process continues.

...

Space shortcuts

Child pages

Versions Compared

Old Version 8

New Version 9

Key

Implementation