CREATE TABLE accumulo_table(row STRING, name STRING, age INT, weight DOUBLE, height INT)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES('accumulo.columns.mapping' = ':rowid,person:name,person:age,person:weight,person:height');

In the above statement, Hive column names and types are declared just as in any other CREATE TABLE statement. The fully qualified AccumuloStorageHandler class name tells Hive that Accumulo will back this table. The AccumuloStorageHandler can be configured through a number of properties set via SERDEPROPERTIES or TABLEPROPERTIES. The most important property is "accumulo.columns.mapping", which controls how the Hive columns map to Accumulo columns.

Column Mapping

The column mapping string is a comma-separated list of encoded values whose positions correspond to the columns of the table's Hive schema: the first element maps the first Hive column, the second element maps the second, and so on. For those familiar with Accumulo, each element in the column mapping string resembles a column_family:column_qualifier pair; however, a few variants allow for finer control.

  1. A single column
    1. This places the value for the Hive column into the Accumulo value with the given column family and column qualifier.
  2. A column qualifier map
    1. A column family is provided along with a column qualifier prefix of any length, followed by an asterisk.
    2. The Hive column type is expected to be a Map. The key of each Hive map entry is appended to the column qualifier prefix.
    3. The value of the Hive map is placed in the Accumulo value.
  3. The rowid
    1. Controls which Hive column is used as the Accumulo rowid.
    2. Exactly one ":rowid" element must exist in each column mapping.
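As an illustration of the column qualifier map variant, the following is a minimal sketch rather than a tested statement. The table name accumulo_map_table, the column family attr, and the qualifier prefix "p_" are hypothetical names chosen for this example.

```sql
-- Hypothetical table: the 'attributes' Hive column is a MAP, so it is paired
-- with a column qualifier map element (family 'attr', prefix 'p_', then '*').
CREATE TABLE accumulo_map_table(row STRING, attributes MAP<STRING,STRING>)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES('accumulo.columns.mapping' = ':rowid,attr:p_*');
```

Under the rules above, a map entry with key "color" would be written to column family "attr" with column qualifier "p_color", and the entry's value would become the Accumulo value.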

Additionally, a serialization option can be provided to each element in the column mapping which will control how the value is serialized. Currently, the options are:

  • 'binary' or 'b'
  • 'string' or 's'

These are set by appending a pound sign ('#') to the column mapping element, followed by either the long or short serialization value. The default serialization is 'string'. For example, for the value 10, "person:age#s" is synonymous with "person:age" and would serialize the value as the literal string "10". If "person:age#b" were used instead, the value would be serialized as four bytes: \x00\x00\x00\x0A.
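A serialization option can be set independently on each mapping element. The sketch below, which assumes a hypothetical table named accumulo_binary_table, stores name with the explicit string option and age with the binary option.

```sql
-- Hypothetical table: 'name' uses string serialization (same as the default),
-- while 'age' is stored in binary form rather than as a printable string.
CREATE TABLE accumulo_binary_table(row STRING, name STRING, age INT)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES('accumulo.columns.mapping' = ':rowid,person:name#s,person:age#b');
```

Any client that reads the Accumulo table directly would have to decode person:age as binary rather than parse it as text, so the choice of option matters when the table is shared with non-Hive consumers.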