...

  • for each Hive column, the table creator must specify a corresponding entry in the comma-delimited hbase.columns.mapping string (so for a Hive table with n columns, the string should have n entries); whitespace should not be used between entries, since it will be interpreted as part of the column name, which is almost certainly not what you want
  • a mapping entry must be either :key or of the form column-family-name:[column-name][#(binary|string)] (the type specification delimited by # was added in Hive 0.9.0; earlier versions interpreted everything as strings)
    • If no type specification is given the value from hbase.table.default.storage.type will be used
    • Any prefix of a valid value is valid too (e.g. #b instead of #binary)
    • If you specify a column as binary the bytes in the corresponding HBase cells are expected to be of the form that HBase's Bytes class yields.
  • there must be exactly one :key mapping (this can be mapped either to a string or struct column; see Simple Composite Keys and Complex Composite Keys)
  • (note that before HIVE-1228 in Hive 0.6, :key was not supported, and the first Hive column implicitly mapped to the key; as of Hive 0.6, it is strongly recommended that you always specify the key explicitly; we will drop support for implicit key mapping in the future)
  • if no column-name is given, then the Hive column will map to all columns in the corresponding HBase column family, and the Hive MAP datatype must be used to allow access to these (possibly sparse) columns
  • there is currently no way to access the HBase timestamp attribute, and queries always access data with the latest timestamp.
  • since HBase does not associate datatype information with columns, the serde converts everything to string representation before storing it in HBase, unless a binary storage type is specified as described above; there is currently no way to plug in a custom serde per column
  • it is not necessary to reference every HBase column family, but those that are not mapped will be inaccessible via the Hive table; it's possible to map multiple Hive tables to the same HBase table
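
As a sketch of the rules above, the following statements map (a) several Hive columns to individual HBase columns in two column families and (b) an entire column family to a Hive MAP. The table names, column names, and column families (a, d, cf) are illustrative assumptions, not taken from an existing schema.

```sql
-- (a) One mapping entry per Hive column, with no whitespace between entries.
-- value1 and value2 come from column family a; value3 comes from column family d.
CREATE TABLE hbase_multi_col_example (key int, value1 string, value2 int, value3 int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,a:b,a:c,d:e"
);

-- (b) No column-name follows "cf:", so the Hive column must be a MAP;
-- each HBase column qualifier in family cf becomes a map key.
-- The :key entry does not have to come first; it lines up with row_key here.
CREATE TABLE hbase_map_example (value map<string,int>, row_key int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = "cf:,:key"
);
```

In both cases the number of mapping entries equals the number of Hive columns, and exactly one entry is :key.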

...

No Format
CREATE TABLE hbase_table_1 (key int, value string, foobar double)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,cf:val#s,cf:foo",
"hbase.table.default.storage.type" = "binary"
);
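
For comparison, the same mapping can be written with the type specifications spelled out in full instead of relying on prefixes and the default. This is only a sketch: the table name hbase_table_1_explicit is hypothetical, and the row key still takes its storage type from hbase.table.default.storage.type.

```sql
-- value is stored as a string, foobar as binary; the key follows the default (binary here).
CREATE TABLE hbase_table_1_explicit (key int, value string, foobar double)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,cf:val#string,cf:foo#binary",
"hbase.table.default.storage.type" = "binary"
);
```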

Simple Composite Row Keys

As of Hive 0.13.0

Hive can read and write delimited composite keys to HBase by mapping the HBase row key to a Hive struct and using the ROW FORMAT DELIMITED...COLLECTION ITEMS TERMINATED BY clause. Example:

Code Block
-- Create a table with a composite row key consisting of two string fields, delimited by '~'
CREATE EXTERNAL TABLE delimited_example(key struct<f1:string, f2:string>, value string) 
ROW FORMAT DELIMITED 
COLLECTION ITEMS TERMINATED BY '~' 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES ( 
  'hbase.columns.mapping'=':key,f:c1');
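
Once such a table exists, the components of the row key can be read as ordinary struct fields. The query below is a minimal sketch; it assumes the underlying HBase table already exists (the Hive table is EXTERNAL) and holds row keys such as a~b, and the literal 'a' is purely illustrative.

```sql
-- Read the two components of the composite row key as struct fields.
SELECT key.f1, key.f2, value
FROM delimited_example
WHERE key.f1 = 'a';
```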

Complex Composite Row Keys and HBaseKeyFactory

As of Hive 0.14.0 (0.13.0 also supports complex composite keys, but using a different interface; see HIVE-2599 for that interface)

...