Versions Compared

Comment: add line break in 1.3.0 orcfiledump syntax, minor edits to syntax explanation

...

Code Block
// Hive version 0.11 through 0.14:
hive --orcfiledump <location-of-orc-file>
 
// Hive version 1.1.0 and later:
hive --orcfiledump [-d] [--rowindex <col_ids>] <location-of-orc-file>
 
// Hive version 1.2.0 and later:
hive --orcfiledump [-d] [-t] [--rowindex <col_ids>] <location-of-orc-file>
 
// Hive version 1.3.0 and later:
hive --orcfiledump [-j] [-p] [-d] [-t] [--rowindex <col_ids>] [--recover] [--skip-dump] 
    [--backup-path <new-path>] <location-of-orc-file-or-directory>

Specifying -d in the command will cause it to dump the ORC file data rather than the metadata (Hive 1.1.0 and later).

Specifying --rowindex with a comma separated list of column ids will cause it to print row indexes for the specified columns, where 0 is the top level struct containing all of the columns and 1 is the first column id (Hive 1.1.0 and later).
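
As a rough sketch of the -d and --rowindex options described above, the snippet below assembles one command that dumps the file data and one that prints row indexes for the top-level struct (id 0) and the first column (id 1). The file path is a hypothetical placeholder, and hive is invoked only if it is found on the PATH.

```shell
# Hypothetical ORC file location; substitute a real URI.
ORC_FILE="/user/hive/warehouse/example_table/000000_0"

# -d dumps the ORC file data instead of the metadata (Hive 1.1.0 and later).
DUMP_DATA_CMD="hive --orcfiledump -d $ORC_FILE"

# --rowindex 0,1 requests row indexes for column id 0 (the top-level struct)
# and column id 1 (the first column).
ROW_INDEX_CMD="hive --orcfiledump --rowindex 0,1 $ORC_FILE"

# Show the commands that would be run.
echo "$DUMP_DATA_CMD"
echo "$ROW_INDEX_CMD"

# Run them only when a Hive client is installed.
if command -v hive >/dev/null 2>&1; then
    $DUMP_DATA_CMD
    $ROW_INDEX_CMD
fi
```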

Specifying -t in the command will print the timezone id of the writer.

Specifying -j in the command will print the ORC file metadata in JSON format. To pretty print the JSON metadata, add -p to the command.
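
A minimal sketch of the JSON options, assuming a hypothetical file path: -j requests JSON metadata and -p pretty prints it. As in the syntax listing, both flags apply to Hive 1.3.0 and later. The command is echoed first and executed only if hive is available.

```shell
# Hypothetical ORC file location; substitute a real URI.
ORC_FILE="/user/hive/warehouse/example_table/000000_0"

# -j prints the ORC file metadata as JSON; -p pretty prints that JSON.
JSON_CMD="hive --orcfiledump -j -p $ORC_FILE"

# Show the command that would be run.
echo "$JSON_CMD"

# Run it only when a Hive client is installed.
if command -v hive >/dev/null 2>&1; then
    $JSON_CMD
fi
```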

Specifying --recover in the command will recover a corrupted ORC file generated by Hive streaming.

Specifying --skip-dump along with --recover will perform recovery without dumping metadata.

Specifying --backup-path with a new-path will let the recovery tool move corrupted files to the specified backup path (default: /tmp).
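
The three recovery options can be combined as in the sketch below. Both the location of the streaming output and the backup directory are hypothetical placeholders chosen for illustration. The command is only executed if hive is found on the PATH.

```shell
# Hypothetical directory holding ORC files written by Hive streaming.
ORC_LOCATION="/user/hive/warehouse/example_streaming_table"

# Hypothetical backup directory; when --backup-path is omitted the default is /tmp.
BACKUP_DIR="/tmp/orc-backup"

# --recover repairs corrupted files, --skip-dump suppresses the metadata dump,
# and --backup-path tells the recovery tool where to move the corrupted files.
RECOVER_CMD="hive --orcfiledump --recover --skip-dump --backup-path $BACKUP_DIR $ORC_LOCATION"

# Show the command that would be run.
echo "$RECOVER_CMD"

# Run it only when a Hive client is installed.
if command -v hive >/dev/null 2>&1; then
    $RECOVER_CMD
fi
```

Pointing --backup-path at a dedicated directory rather than relying on the /tmp default may make the moved files easier to find after recovery.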

<location-of-orc-file> is the URI of the ORC file.

...