LZO Compression

General LZO Concepts

LZO is a lossless data compression library that favors speed over compression ratio. See http://www.oberhumer.com/opensource/lzo and http://www.lzop.org for general information about LZO, and see Compressed Data Storage for information about compression in Hive.

Imagine a simple data file that has three columns

...

Let's populate a data file containing 4 records:

No Format
19630001 john lennon
19630002 paul mccartney
19630003 george harrison
19630004 ringo starr
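
As a minimal sketch, the four records above can be written to a local file from the shell. The file name sample_records.txt is a hypothetical placeholder, and tab separation is an assumption chosen to match the FIELDS TERMINATED BY '\t' clause in the table definition later on this page.

```shell
#!/bin/sh
# Write the four sample records to a local file, one record per line.
# Assumptions: sample_records.txt is a hypothetical file name, and fields
# are tab-separated to match FIELDS TERMINATED BY '\t'.
data_file="sample_records.txt"

# printf reuses the format string for each group of three arguments.
printf '%s\t%s\t%s\n' \
  19630001 john lennon \
  19630002 paul mccartney \
  19630003 george harrison \
  19630004 ringo starr > "$data_file"

# Sanity check: expect four lines with three tab-separated fields each.
awk -F '\t' 'NF != 3 { bad = 1 } END { exit (NR == 4 && !bad) ? 0 : 1 }' "$data_file" \
  && echo "wrote 4 records to $data_file"
```

Keeping one record per line matters because the text input format treats each line as a row.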

...

Next we run the command to create an LZO index file:

No Format
hadoop jar /path/to/jar/hadoop-lzo-cdh4-0.4.15-gplextras.jar com.hadoop.compression.lzo.LzoIndexer /path/to/HDFS/dir/containing/lzo/files

...

The following hive -e command creates an LZO-compressed external table:

No Format


hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS hive_table_name (column_1 datatype_1......column_N datatype_N)
         PARTITIONED BY (partition_col_1 datatype_1 ....col_P datatype_P)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
         STORED AS INPUTFORMAT  \"com.hadoop.mapred.DeprecatedLzoTextInputFormat\"
                   OUTPUTFORMAT \"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat\";"

Note: The double quotes have to be escaped so that the 'hive -e' command works correctly.

See CREATE TABLE and Hive CLI for information about command syntax.
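
One way to sidestep the quote escaping is to put the statement in a script file and run it with hive -f. The sketch below is illustrative only: the file name create_lzo_table.hql, the table name lzo_demo, and the three column names and types are hypothetical stand-ins, not values taken from this page.

```shell
#!/bin/sh
# Write the DDL to a script file so the double quotes need no escaping.
# Assumptions: create_lzo_table.hql, lzo_demo, and the column definitions
# are hypothetical; substitute your own table definition.
ddl_file="create_lzo_table.hql"

# The quoted 'EOF' delimiter stops the shell from interpreting the body.
cat > "$ddl_file" <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS lzo_demo (id BIGINT, first_name STRING, last_name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
          OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
EOF

# Run the script only if the hive CLI is on the PATH.
if command -v hive >/dev/null 2>&1; then
  hive -f "$ddl_file"
else
  echo "hive not found; DDL written to $ddl_file"
fi
```

Because the statement never passes through a double-quoted shell string, it can be copied between the script file and an interactive Hive session unchanged.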

Hive Queries

Option 1: Directly Create LZO Files

Directly create LZO files as the output of the Hive query.
Use the lzop command utility or custom Java code to generate .lzo.index files for the .lzo files.

Hive Query Parameters

No Format
SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec
SET hive.exec.compress.output=true
SET mapreduce.output.fileoutputformat.compress=true

For example:

No Format


hive -e "SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec; SET hive.exec.compress.output=true; SET mapreduce.output.fileoutputformat.compress=true; <query-string>"

Note: If the data sets are large or the number of output files is large, this option does not work.
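
When several queries need the same settings, the prefix can be composed by a small shell helper. This is a sketch under assumptions: lzo_query is a hypothetical function name, the LZO_CODEC variable is introduced here for illustration, and its default is assumed to be the codec class used in this section.

```shell
#!/bin/sh
# Hypothetical helper: prefix a query with the Option 1 settings.
# The codec default is an assumption; override LZO_CODEC if your cluster
# uses a different class.
LZO_CODEC="${LZO_CODEC:-com.hadoop.compression.lzo.LzoCodec}"

lzo_query() {
  # $1 is the HiveQL query string to run after the SET statements.
  printf 'SET mapreduce.output.fileoutputformat.compress.codec=%s; SET hive.exec.compress.output=true; SET mapreduce.output.fileoutputformat.compress=true; %s\n' \
    "$LZO_CODEC" "$1"
}

# Print the composed string for inspection.
lzo_query "SELECT * FROM hive_table_name"

# To execute it instead of printing it:
#   hive -e "$(lzo_query "SELECT * FROM hive_table_name")"
```

Printing the string first makes it easy to confirm that each statement is terminated with a semicolon before handing it to hive -e.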

Option 2: Write Custom Java to Create LZO Files

Create text files as the output of the Hive query.
Write custom Java code to
1. convert Hive query generated text files to .lzo files
2. generate .lzo.index files for the .lzo files generated above
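
The two steps can be outlined from the shell before any Java is written. The sketch below is a dry run that only prints commands. It swaps in the lzop utility and the LzoIndexer class shown earlier in place of custom Java, and it assumes the text files have already been copied to a local directory (for example with hadoop fs -get). The function name plan_lzo_conversion and all paths are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch of the two Option 2 steps, using lzop and LzoIndexer
# instead of custom Java. It prints commands and executes none of them.
# Assumption: LZO_JAR and every directory passed in are placeholders.
LZO_JAR="/path/to/jar/hadoop-lzo-cdh4-0.4.15-gplextras.jar"

plan_lzo_conversion() {
  # $1: local directory holding the text files
  # $2: target HDFS directory for the .lzo files
  for f in "$1"/*; do
    [ -f "$f" ] || continue               # skip directories and empty globs
    echo "lzop $f"                        # step 1: writes $f.lzo next to $f
    echo "hadoop fs -put $f.lzo $2/"      # copy the compressed file to HDFS
  done
  # step 2: writes a .lzo.index file beside each .lzo file in the directory
  echo "hadoop jar $LZO_JAR com.hadoop.compression.lzo.LzoIndexer $2"
}

# Example call (hypothetical directories):
#   plan_lzo_conversion ./hive_output /path/to/HDFS/dir/containing/lzo/files
```

Reviewing the printed plan before running it helps catch a wrong target directory, since the indexer is pointed at the whole directory rather than at individual files.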

...

Prefix the query string with these parameters:

No Format
SET hive.exec.compress.output=false
SET mapreduce.output.fileoutputformat.compress=false

For example:

No Format
hive -e "SET hive.exec.compress.output=false;SET mapreduce.output.fileoutputformat.compress=false;<query-string>"
