Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: add the FOR COLUMNS option to ANALYZE (Hive-1362)

...

Code Block
ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)]
  COMPUTE STATISTICSSTATISTICS 
  [FOR COLUMNS]          -- (Note: Hive 0.10.0 and later.)
  [NOSCAN];

When the user issues that command, he may or may not specify the partition specs. If the user doesn't specify any partition specs, statistics are gathered for the table as well as all the partitions (if any). If certain partition specs are specified, then statistics are gathered for only those partitions. When computing statistics across all partitions, the partition columns still need to be listed.

When the optional parameter NOSCAN is specified, the command won't scan files so that it's supposed to be fast. Instead of all statistics, it just gathers the following statistics:

  • Number of files
  • Physical size in bytes
Info
titleVersion 0.10.0: FOR COLUMNS

As of Hive 0.10.0, the optional parameter FOR COLUMNS computes column statistics for all columns in the specified table (and for all partitions if the table is partitioned). To display these statistics, use DESCRIBE FORMATTED [db_name.]table_name.column_name [PARTITION (partition_spec)].

Examples

Suppose table Table1 has 4 partitions with the following specs:

...