Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: edits and links to CREATE TABLE and CLI (thanks go to Sanjay Subramanian for this wiki)

...

No Format
hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS hive_table_name (column_1  datatype_1......column_N datatype_N) 
         PARTITIONED BY (partition_col_1 datatype_1 ....col_P  datatype_P) 
         ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
         STORED AS INPUTFORMAT  \"com.hadoop.mapred.DeprecatedLzoTextInputFormat\"   
                   OUTPUTFORMAT \"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat\ ";"

Note: The double quotes have to be escaped so that the 'hive -e' command works correctly.

See CREATE TABLE and Hive CLI for information about command syntax.

Hive Queries

Option 1: Directly Create LZO Files

  1. Directly create LZO files as the output of the Hive query.
  2. Use lzop command utility or your custom Java to generate .lzo.index for the .lzo files.

...

No Format
hive -e "SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;SET hive.exec.compress.output=true;SET mapreduce.output.fileoutputformat.compress=true;<query-string>"

Option 2: Write Custom Java to Create LZO Files

  1. Create text files as the output of the Hive query.
  2. Write custom Java code to
    1. convert Hive query generated text files to .lzo files
    2. generate .lzo.index files for the .lzo files generated above

...