Versions Compared

Comment: add transaction & compactor parameters (HIVE-6541)

...

The number of attempts waiting for localizing a resource in Hive-Tez.

Transactions and Compactor

Hive transactions with row-level ACID functionality were added in Hive 0.13.0 (HIVE-5317 and its subtasks). For details see ACID and Transactions in Hive.

To turn on Hive transactions, change the values of these parameters from their defaults, as described below:

hive.txn.manager
  • Default Value: org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
  • Hive Transactions Value: org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
  • Added In: Hive 0.13.0 with HIVE-5843

To turn on Hive transactions, set this to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager. The default DummyTxnManager replicates the behavior of Hive releases before 0.13 and provides no transactions.

hive.txn.timeout
  • Default Value: 300
  • Added In: Hive 0.13.0 with HIVE-5843

Time after which transactions are declared aborted if the client has not sent a heartbeat, in seconds.

hive.txn.max.open.batch
  • Default Value: 1000
  • Added In: Hive 0.13.0 with HIVE-5843

Maximum number of transactions that can be fetched in one call to open_txns().

This controls how many transactions streaming agents such as Flume or Storm open simultaneously. The streaming agent then writes that number of entries into a single file (per Flume agent or Storm bolt). Increasing this value therefore decreases the number of delta files created by streaming agents, but it also increases the number of open transactions that Hive has to track at any given time, which may negatively affect read performance.
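
As a rough illustration of this trade-off, the following sketch estimates how many delta files one streaming agent would write for a given batch size. It assumes, following the description above, that each batch of opened transactions ends up in exactly one file; the function name delta_files_per_agent is hypothetical and not part of Hive.

```python
import math


def delta_files_per_agent(total_transactions: int, max_open_batch: int) -> int:
    """Estimate the delta files written by a single streaming agent.

    Assumption: every batch of up to max_open_batch transactions is
    written into one delta file, as described for hive.txn.max.open.batch.
    """
    if max_open_batch <= 0:
        raise ValueError("max_open_batch must be positive")
    return math.ceil(total_transactions / max_open_batch)


# Same workload, two batch sizes: the larger batch yields fewer delta files.
print(delta_files_per_agent(10_000, 1000))
print(delta_files_per_agent(10_000, 100))
```

The estimate covers only the file count; the cost of tracking more open transactions is not modeled.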

hive.compactor.initiator.on
  • Default Value: false
  • Hive Transactions Value: true (for exactly one instance of the Thrift metastore service)
  • Added In: Hive 0.13.0 with HIVE-5843

Whether to run the initiator and cleaner threads on this metastore instance. Set this to true on one instance of the Thrift metastore service to turn on Hive transactions.

hive.compactor.worker.threads
  • Default Value: 0
  • Hive Transactions Value: greater than 0 on at least one instance of the Thrift metastore service
  • Added In: Hive 0.13.0 with HIVE-5843

How many compactor worker threads to run on this metastore instance. Set this to a positive number on one or more instances of the Thrift metastore service to turn on Hive transactions.

Worker threads spawn MapReduce jobs to do compactions. They do not do the compactions themselves. Increasing the number of worker threads will decrease the time it takes tables or partitions to be compacted once they are determined to need compaction. It will also increase the background load on the Hadoop cluster as more MapReduce jobs will be running in the background.
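
The three parameters above that list a "Hive Transactions Value" can be collected into a configuration fragment. The sketch below renders them in the property/name/value layout used by hive-site.xml. The worker thread count of 1 is an illustrative choice (any positive number satisfies the requirement), and the helper render_hive_site is hypothetical.

```python
import xml.etree.ElementTree as ET

# Values taken from the "Hive Transactions Value" entries above.
# The worker thread count of 1 is an assumption for illustration only.
TXN_PROPERTIES = {
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
    "hive.compactor.initiator.on": "true",
    "hive.compactor.worker.threads": "1",
}


def render_hive_site(properties: dict) -> str:
    """Render properties as a <configuration> XML fragment."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(render_hive_site(TXN_PROPERTIES))
```

Note that hive.compactor.initiator.on should be true on exactly one instance of the Thrift metastore service, so a fragment like this is not meant to be copied unchanged to every instance.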

hive.compactor.worker.timeout
  • Default Value: 86400
  • Added In: Hive 0.13.0 with HIVE-5843

Time in seconds after which a compaction job will be declared failed and the compaction re-queued.

hive.compactor.check.interval
  • Default Value: 300
  • Added In: Hive 0.13.0 with HIVE-5843

Time in seconds between checks to see if any tables or partitions need to be compacted. This should be kept high because each check for compaction requires many calls against the NameNode.

Decreasing this value will reduce the time it takes for compaction to be started for a table or partition that requires compaction. However, checking whether compaction is needed requires several calls to the NameNode for each table or partition that has had a transaction performed on it since the last major compaction, so decreasing this value will increase the load on the NameNode.

hive.compactor.delta.num.threshold
  • Default Value: 10
  • Added In: Hive 0.13.0 with HIVE-5843

Number of delta directories in a table or partition that will trigger a minor compaction.

hive.compactor.delta.pct.threshold
  • Default Value: 0.1
  • Added In: Hive 0.13.0 with HIVE-5843

Size of the delta files relative to the base, expressed as a fraction, that will trigger a major compaction. (1.0 = 100%, so the default 0.1 = 10%.)

hive.compactor.abortedtxn.threshold
  • Default Value: 1000
  • Added In: Hive 0.13.0 with HIVE-5843

Number of aborted transactions involving a given table or partition that will trigger a major compaction.
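
The three thresholds above can be read together as a decision rule. The following is a minimal sketch of that rule, not the compactor's actual implementation: the function name, the order in which the checks run, the use of strict "greater than" comparisons, and the handling of a missing base are all assumptions made for illustration.

```python
from typing import Optional


def compaction_needed(
    num_deltas: int,
    delta_size: int,
    base_size: int,
    aborted_txns: int,
    delta_num_threshold: int = 10,      # hive.compactor.delta.num.threshold
    delta_pct_threshold: float = 0.1,   # hive.compactor.delta.pct.threshold
    abortedtxn_threshold: int = 1000,   # hive.compactor.abortedtxn.threshold
) -> Optional[str]:
    """Return "major", "minor", or None for one table or partition."""
    # Too many aborted transactions triggers a major compaction.
    if aborted_txns > abortedtxn_threshold:
        return "major"
    # Deltas large relative to the base trigger a major compaction.
    # Assumption: the ratio check is skipped when there is no base yet.
    if base_size > 0 and delta_size / base_size > delta_pct_threshold:
        return "major"
    # Many delta directories trigger a minor compaction.
    if num_deltas > delta_num_threshold:
        return "minor"
    return None


# 11 small deltas against a large base: only the delta count is exceeded.
print(compaction_needed(num_deltas=11, delta_size=5, base_size=100, aborted_txns=0))
```

With the defaults, a partition whose deltas total 20% of the base size would be reported as needing a major compaction regardless of how many delta directories it has.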

Indexing

Indexing was added in Hive 0.7.0 with HIVE-417, and bitmap indexing was added in Hive 0.8.0 with HIVE-1803. For more information see Indexing.

...