
Streaming Requirements

A few things are currently required to use streaming:

  1. Only the ORC storage format is currently supported, so “stored as orc” must be specified when the table is created.
  2. The Hive table must be bucketed, but not sorted, so something like “clustered by (colName) into 10 buckets” must be specified when the table is created. Ideally, the number of buckets equals the number of streaming writers.
  3. The user running the client streaming process must have permission to write to the table or partition and to create partitions in the table.
  4. When issuing MapReduce queries on streaming tables, the user must set hive.input.format to org.apache.hadoop.hive.ql.io.HiveInputFormat.
  5. The following settings are required in hive-site.xml:
    1. hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
    2. hive.compactor.initiator.on = true
    3. hive.compactor.worker.threads > 0

Note: Streaming to unpartitioned tables is also supported.
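
The table and configuration requirements above can be checked before any streaming writer connects. The following Java code is a minimal sketch of a hypothetical helper, not part of Hive: the class StreamingRequirements, its methods createTableDdl and checkSettings, and the example table alerts with its columns are all illustrative assumptions. Settings are read from a plain Map rather than from Hive's own configuration classes, and nothing here talks to a Hive server.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Hive API) that mirrors the streaming requirements. */
final class StreamingRequirements {

    /** Per-session statement for MapReduce queries on streaming tables. */
    static final String MR_QUERY_SETTING =
            "SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat";

    /** Builds an unpartitioned CREATE TABLE that is bucketed (not sorted) and stored as ORC. */
    static String createTableDdl(String table, String columns, String bucketColumn, int buckets) {
        if (buckets <= 0) {
            throw new IllegalArgumentException("buckets must be > 0");
        }
        return "CREATE TABLE " + table + " (" + columns + ")"
                + " CLUSTERED BY (" + bucketColumn + ") INTO " + buckets + " BUCKETS"
                + " STORED AS ORC";
    }

    /** Returns one message per unmet hive-site.xml requirement; empty if all are met. */
    static List<String> checkSettings(Map<String, String> conf) {
        List<String> problems = new ArrayList<>();
        if (!"org.apache.hadoop.hive.ql.lockmgr.DbTxnManager".equals(conf.get("hive.txn.manager"))) {
            problems.add("hive.txn.manager must be org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
        }
        // equalsIgnoreCase(null) is false, so a missing key counts as a problem.
        if (!"true".equalsIgnoreCase(conf.get("hive.compactor.initiator.on"))) {
            problems.add("hive.compactor.initiator.on must be true");
        }
        int workers;
        try {
            workers = Integer.parseInt(conf.getOrDefault("hive.compactor.worker.threads", "0").trim());
        } catch (NumberFormatException e) {
            workers = 0; // treat an unparseable value as "no worker threads"
        }
        if (workers <= 0) {
            problems.add("hive.compactor.worker.threads must be > 0");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(createTableDdl("alerts", "id INT, msg STRING", "id", 10));
        System.out.println(MR_QUERY_SETTING);

        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
        conf.put("hive.compactor.initiator.on", "true");
        conf.put("hive.compactor.worker.threads", "1");
        List<String> problems = checkSettings(conf);
        System.out.println(problems.isEmpty() ? "settings OK" : String.join("; ", problems));
    }
}
```

Running main prints the generated DDL, the session statement for MapReduce queries, and "settings OK". The hive.input.format statement is kept separate from the hive-site.xml check because the requirement above describes it as something the querying user sets, not a server-wide setting.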

Usage

Transaction and Connection Management
