...

MetadataTypedColumnsetSerDe: This SerDe is used to read/write delimited records such as CSV, tab-separated, or control-A separated records (quoting is not supported yet).
LazySimpleSerDe: This SerDe can be used to read the same data format as MetadataTypedColumnsetSerDe and TCTLSeparatedProtocol. However, it creates objects lazily, which provides better performance. Starting in Hive 0.14.0, it also supports reading and writing data with a specified character encoding, for example:
ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK');
LazySimpleSerDe can treat 'T', 't', 'F', 'f', '1', and '0' as extended, legal boolean literals if the configuration property hive.lazysimple.extended_boolean_literal is set to true (Hive 0.14.0 and later). The default is false, which means only 'TRUE' and 'FALSE' are treated as legal boolean literals.
ThriftSerDe: This SerDe is used to read/write Thrift serialized objects. The class file for the Thrift object must be loaded first.
DynamicSerDe: This SerDe also reads/writes Thrift serialized objects, but it understands Thrift DDL, so the schema of the object can be provided at runtime. It also supports many different protocols, including TBinaryProtocol, TJSONProtocol, and TCTLSeparatedProtocol (which writes data in delimited records).
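
As a rough illustration of the extended boolean literals described for LazySimpleSerDe, the following sketch assumes a hypothetical delimited text table named flags_demo. Only the property name comes from the text above; the table and its columns are made up for the example.

```sql
-- Hypothetical table; delimited text tables use LazySimpleSerDe by default.
CREATE TABLE flags_demo (id INT, active BOOLEAN)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Hive 0.14.0 and later: also accept 'T', 't', 'F', 'f', '1', and '0'
-- as boolean literals when reading the data.
SET hive.lazysimple.extended_boolean_literal=true;

SELECT id, active FROM flags_demo;
```

With the property left at its default of false, values such as 't' or '1' in the active column would most likely be read as NULL rather than as booleans.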

Also:

For JSON files, a JsonSerDe was added in Hive 0.12.0. An Amazon SerDe is available at s3://elasticmapreduce/samples/hive-ads/libs/jsonserde.jar for releases prior to 0.12.0.
An Avro SerDe was added in Hive 0.9.1. Starting in Hive 0.14.0 its specification is implicit with the STORED AS AVRO clause.
A SerDe for the ORC file format was added in Hive 0.11.0.
A SerDe for Parquet was added via plug-in in Hive 0.10 and natively in Hive 0.13.0.
A SerDe for CSV was added in Hive 0.14.
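
As a minimal sketch of how one of these SerDes can be named explicitly, the statement below declares a table that uses the CSV SerDe. The table name csv_demo and its columns are hypothetical, and the SerDe properties shown are optional settings whose defaults may already fit typical CSV files.

```sql
-- Hypothetical table backed by the CSV SerDe (Hive 0.14 and later).
-- This SerDe treats every column as STRING, so the columns are declared that way.
CREATE TABLE csv_demo (id STRING, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;
```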

...

Syntax	Equivalent
STORED AS AVRO / STORED AS AVROFILE	`ROW FORMAT SERDE` `'org.apache.hadoop.hive.serde2.avro.AvroSerDe'` `STORED AS INPUTFORMAT` `'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'` `OUTPUTFORMAT` `'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'`
STORED AS ORC / STORED AS ORCFILE	`ROW FORMAT SERDE` `'org.apache.hadoop.hive.ql.io.orc.OrcSerde'` `STORED AS INPUTFORMAT` `'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'` `OUTPUTFORMAT` `'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'`
STORED AS PARQUET / STORED AS PARQUETFILE	`ROW FORMAT SERDE` `'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'` `STORED AS INPUTFORMAT` `'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'` `OUTPUTFORMAT` `'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'`
STORED AS RCFILE	`STORED AS INPUTFORMAT` `'org.apache.hadoop.hive.ql.io.RCFileInputFormat'` `OUTPUTFORMAT` `'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'`
STORED AS SEQUENCEFILE	`STORED AS INPUTFORMAT` `'org.apache.hadoop.mapred.SequenceFileInputFormat'` `OUTPUTFORMAT` `'org.apache.hadoop.mapred.SequenceFileOutputFormat'`
STORED AS TEXTFILE	`STORED AS INPUTFORMAT` `'org.apache.hadoop.mapred.TextInputFormat'` `OUTPUTFORMAT` `'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'`
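
To make the equivalence in the table concrete, the sketch below creates two tables that should end up with the same storage settings, one using the short form and one using the expanded form from the ORC row. The table names and columns are hypothetical.

```sql
-- Short form.
CREATE TABLE orc_short (id INT, name STRING)
STORED AS ORC;

-- Expanded form, copied from the ORC row of the table.
CREATE TABLE orc_long (id INT, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

-- Compare the SerDe, InputFormat, and OutputFormat reported for each table.
DESCRIBE FORMATTED orc_short;
DESCRIBE FORMATTED orc_long;
```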

To add a new native SerDe with the STORED AS keyword, follow these steps:

...

Test Queries:
- queries/clientnegative - This directory contains the query files (.q files) for the negative test cases. These are run through the CLI classes and therefore test the entire query processor stack.
- queries/clientpositive - This directory contains the query files (.q files) for the positive test cases. These are run through the CLI classes and therefore test the entire query processor stack.
- queries/positive (Will be deprecated) - This directory contains the query files (.q files) for the positive test cases for the compiler. These only test the compiler and do not run the execution code.
- queries/negative (Will be deprecated) - This directory contains the query files (.q files) for the negative test cases for the compiler. These only test the compiler and do not run the execution code.
Test Results:
- results/clientnegative - The expected results from the queries in queries/clientnegative.
- results/clientpositive - The expected results from the queries in queries/clientpositive.
- results/compiler/errors - The expected results from the queries in queries/negative.
- results/compiler/parse - The expected Abstract Syntax Tree output for the queries in queries/positive.
- results/compiler/plan - The expected query plans for the queries in queries/positive.
Velocity Templates to Generate the Tests:
- templates/TestCliDriver.vm - Generates the tests from queries/clientpositive.
- templates/TestNegativeCliDriver.vm - Generates the tests from queries/clientnegative.
- templates/TestParse.vm - Generates the tests from queries/positive.
- templates/TestParseNegative.vm - Generates the tests from queries/negative.

Tables in the unit tests

Running unit tests

Note: Ant to Maven

As of version 0.13 Hive uses Maven instead of Ant for its build. The following instructions are not up to date.

See the Hive Developer FAQ for updated instructions.

...

The Hive tests reportedly do not run successfully after a clean unless you run ant package first. It is not clear why build.xml does not encode this dependency.

Adding new unit tests

Note: Ant to Maven

As of version 0.13 Hive uses Maven instead of Ant for its build. The following instructions are not up to date.

See the Hive Developer FAQ for updated instructions. See also Tips for Adding New Tests in Hive and How to Contribute: Add a Unit Test.

First, write a new myname.q in ql/src/test/queries/clientpositive.
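
For orientation, a positive client test file might look roughly like the sketch below. The queries are only an illustration and assume that the standard test table src, with columns key and value, is available in the test environment.

```sql
-- myname.q: hypothetical positive client test.
-- Assumes the test table src (key STRING, value STRING) is preloaded.

-- Capture the query plan in the expected output.
EXPLAIN
SELECT key, count(value) FROM src GROUP BY key;

-- Capture the query results in the expected output.
SELECT key, count(value) FROM src GROUP BY key;
```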

...

Similarly, to add negative client tests, write a new query input file in ql/src/test/queries/clientnegative and run the same command, this time specifying the testcase name as TestNegativeCliDriver instead of TestCliDriver. Note that for negative client tests, the output file, if created using the overwrite flag, can be found in the directory ql/src/test/results/clientnegative. See also Tips for Adding New Tests in Hive.
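
A negative client test file can be as small as a single statement that is expected to fail. The sketch below is a hypothetical example; the file name and the table name are assumptions chosen so that the query refers to a table that does not exist.

```sql
-- mynegative.q: hypothetical negative client test.
-- The table below is assumed not to exist, so the query should be rejected
-- and the resulting error message recorded as the expected output.
SELECT * FROM table_that_does_not_exist;
```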

Debugging Hive Code

Hive code includes both client-side code (e.g., compiler, semantic analyzer, and optimizer of HiveQL) and server-side code (e.g., operator/task/SerDe implementations). Debugging is different for client-side and server-side code, as described below.

...

Please refer to Hive User Group Meeting August 2009, pages 74-87.
