// Hive version 0.11 through 0.14:
hive --orcfiledump <location-of-orc-file>
 
// Hive version 1.1.0 and later:
hive --orcfiledump [-d] [--rowindex <col_ids>] <location-of-orc-file>
 
// Hive version 1.2.0 and later:
hive --orcfiledump [-d] [-t] [--rowindex <col_ids>] <location-of-orc-file>
 
// Hive version 1.3.0 and later:
hive --orcfiledump [-j] [-p] [-d] [-t] [--rowindex <col_ids>] [--recover] [--skip-dump] 
    [--backup-path <new-path>] <location-of-orc-file-or-directory>

Specifying -d in the command will cause it to dump the ORC file data rather than the metadata (Hive 1.1.0 and later).

Specifying --rowindex with a comma-separated list of column ids will cause it to print row indexes for the specified columns, where 0 is the top-level struct containing all of the columns and 1 is the first column id (Hive 1.1.0 and later).
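
As an illustration of the column id list, the following sketch assembles a --rowindex invocation from individual ids. The helper name orc_rowindex_cmd and the file path are hypothetical; the function only prints the command line and does not run hive.

```shell
# Hypothetical helper (not part of Hive): joins column ids with commas
# and prints the resulting command instead of executing it.
orc_rowindex_cmd() {
  file="$1"          # location of the ORC file (illustrative path)
  shift
  ids=""
  for id in "$@"; do
    # Append a comma only when ids already holds an earlier entry.
    ids="${ids:+$ids,}$id"
  done
  printf 'hive --orcfiledump --rowindex %s %s\n' "$ids" "$file"
}

# Id 0 is the top-level struct; ids 1 and 3 are the first and third columns.
orc_rowindex_cmd /path/to/file.orc 0 1 3
```

Printing the command rather than running it makes it easy to check the id list before pointing the tool at a real file.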

Specifying -t in the command will print the timezone id of the writer (Hive 1.2.0 and later).

Specifying -j in the command will print the ORC file metadata in JSON format. To pretty print the JSON metadata, add -p to the command.

Specifying --recover in the command will recover a corrupted ORC file generated by Hive streaming.

Specifying --skip-dump along with --recover will perform recovery without dumping metadata.

Specifying --backup-path with a new-path will let the recovery tool move the corrupted files to the specified backup path (default: /tmp).
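
The recovery options can be combined as in the sketch below, which assumes the Hive 1.3.0 and later syntax. The helper name orc_recover_cmd and both paths are hypothetical, and the function only prints the command line so it can be reviewed before it is run; paths containing spaces are not handled.

```shell
# Hypothetical helper (not part of Hive): prints a recovery command that
# skips the metadata dump and optionally overrides the backup path.
orc_recover_cmd() {
  target="$1"        # ORC file or directory to recover (illustrative path)
  backup="${2:-}"    # optional backup path; when empty, the default applies
  cmd="hive --orcfiledump --recover --skip-dump"
  if [ -n "$backup" ]; then
    cmd="$cmd --backup-path $backup"
  fi
  printf '%s %s\n' "$cmd" "$target"
}

# Recovery with the default backup path:
orc_recover_cmd /path/to/streaming/delta
# Recovery with corrupted files moved to an explicit backup path:
orc_recover_cmd /path/to/streaming/delta /path/to/backup
```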

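<location-of-orc-file> is the URI of the ORC file. In the Hive 1.3.0 and later syntax, <location-of-orc-file-or-directory> can also be the URI of a directory containing ORC files.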
