Comment: Added section on Avro Data Stored in HBase Columns

...

"hbase.composite.key.factory" should be the fully qualified class name of a class implementing HBaseKeyFactory. See SampleHBaseKeyFactory2 in the same package for a fixed-length example. The class must be on your classpath for the above example to work. TODO: place these in an accessible place; they're currently only in test code.

Avro Data Stored in HBase Columns

Info

As of Hive 0.14.0 with HIVE-6147

Hive 0.14.0 onward supports storing and querying Avro objects in HBase columns by making them visible as structs to Hive. This allows Hive to perform ad hoc analysis of HBase data which can be deeply structured. Prior to 0.14.0, the HBase Hive integration only supported querying primitive data types in columns.

An example HiveQL statement where test_col_fam is the column family and test_col is the column name:

Code Block
CREATE EXTERNAL TABLE test_hbase_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
    "hbase.columns.mapping" = ":key,test_col_fam:test_col",
    "test_col_fam.test_col.serialization.type" = "avro",
    "test_col_fam.test_col.avro.schema.url" = "hdfs://testcluster/tmp/schema.avsc")
TBLPROPERTIES (
    "hbase.table.name" = "hbase_avro_table",
    "hbase.struct.autogenerate" = "true");

The important properties to note are the following three:

Code Block
"test_col_fam.test_col.serialization.type" = "avro"

This property tells Hive that the given column under the given column family is an Avro column, so Hive needs to deserialize it accordingly.

Code Block
"test_col_fam.test_col.avro.schema.url" = "hdfs://testcluster/tmp/schema.avsc"

This property specifies the location of the reader schema used to deserialize the column. The schema can reside on HDFS, as shown here, or be provided inline through a property such as "test_col_fam.test_col.avro.schema.literal". If you keep the schema in a custom store, you can write a custom implementation of AvroSchemaRetriever and plug it in with the "avro.schema.retriever" setting, using a property such as "test_col_fam.test_col.avro.schema.retriever". The jar containing the custom class must be on the Hive classpath. For a usage discussion and links to other resources, see HIVE-6147.
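As a rough illustration of the inline alternative, the statement below is a sketch of the same table with the reader schema passed through "test_col_fam.test_col.avro.schema.literal" instead of a URL. The table name test_hbase_avro_literal, the record name TestRecord, the namespace com.example, and the two fields are made up for this example and do not come from the original page.

```sql
-- Sketch only: the table name, record name, namespace, and fields are hypothetical.
-- The reader schema is supplied inline instead of through avro.schema.url.
CREATE EXTERNAL TABLE test_hbase_avro_literal
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
    "hbase.columns.mapping" = ":key,test_col_fam:test_col",
    "test_col_fam.test_col.serialization.type" = "avro",
    "test_col_fam.test_col.avro.schema.literal" = '{
      "type": "record",
      "name": "TestRecord",
      "namespace": "com.example",
      "fields": [
        {"name": "name", "type": "string"},
        {"name": "age",  "type": "int"}
      ]
    }')
TBLPROPERTIES (
    "hbase.table.name" = "hbase_avro_table",
    "hbase.struct.autogenerate" = "true");
```

An inline literal keeps the table definition self-contained, while a schema URL is likely easier to maintain when several tables share one schema.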

Code Block
"hbase.struct.autogenerate" = "true"

Specifying this property lets Hive deduce the columns and their types automatically from the provided schema. This spares you from manually declaring the columns and types for Avro schemas, which can be complicated and deeply nested.
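One way to check what Hive deduced is to describe the table and then address struct fields with dot notation. The sketch below assumes that the deduced columns are named key and test_col and that the Avro schema contains a field called name; none of these names are confirmed by the original page, so run the DESCRIBE statement first and adjust the query to what it reports.

```sql
-- List the columns and struct types that Hive derived from the Avro schema.
DESCRIBE test_hbase_avro;

-- Assumption: the deduced columns are "key" and "test_col", and the
-- Avro record has a field called "name". Adjust to the DESCRIBE output.
SELECT key, test_col.name
FROM test_hbase_avro
LIMIT 10;
```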

Put Timestamps

Info
Version information

As of Hive 0.9.0

...

Jira Issues
https://issues.apache.org/jira/sr/jira.issueviews:searchrequest-xml/temp/SearchRequest.xml?jqlQuery=project+%3D+HIVE+AND+component+in+%28%22HBase+Handler%22%29+and+Resolution+%3D+unresolved&tempMax=1000