...

A new lock manager has also been added to Hive, the DbLockManager.  This lock manager stores all lock information in the metastore.  In addition, all transactions are stored in the metastore.  This means that transactions and locks are durable in the face of server failure.  To avoid clients dying and leaving transactions or locks dangling, a heartbeat is sent from lock holders and transaction initiators to the metastore on a regular basis.  If a heartbeat is not received in the configured amount of time, the lock or transaction will be aborted.
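For illustration only, the heartbeat window is governed by hive.txn.timeout (described in the tables below); a minimal hive-site.xml sketch, shown here with the documented default of 300 seconds, might look like this:

<!-- Sketch only: placed inside the <configuration> element of hive-site.xml.
     Transactions or locks whose holders miss heartbeats for this many seconds
     are aborted. 300 is the documented default. -->
<property>
  <name>hive.txn.timeout</name>
  <value>300</value>
</property>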

Configuration

A number of configuration parameters must be set appropriately to turn on transaction support in Hive.  The following sections list all of the configuration parameters that affect Hive transactions and compaction.

New Configuration Parameters for Transactions

A number of new configuration parameters have been added to the system to support transactions.

Configuration key

Values

Notes

hive.txn.manager

Default: org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager

Value required for transactions: org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

DummyTxnManager replicates pre-Hive-0.13 behavior and provides no transactions.

hive.txn.timeout

Default: 300

Time after which transactions are declared aborted if the client has not sent a heartbeat, in seconds.

hive.txn.max.open.batch

Default: 1000

Maximum number of transactions that can be fetched in one call to open_txns().*

hive.compactor.initiator.on

Default: false

Value required for transactions: true (for exactly one instance of the Thrift metastore service)

Whether to run the initiator and cleaner threads on this metastore instance.

 

hive.compactor.worker.threads

Default: 0

Value required for transactions: > 0 on at least one instance of the Thrift metastore service

How many compactor worker threads to run on this metastore instance.**

hive.compactor.worker.timeout

Default: 86400

Time in seconds after which a compaction job will be declared failed and the compaction re-queued.

hive.compactor.check.interval

Default: 300

Time in seconds between checks to see if any tables or partitions need to be compacted.***

hive.compactor.delta.num.threshold

Default: 10

Number of delta directories in a table or partition that will trigger a minor compaction.

hive.compactor.delta.pct.threshold

Default: 0.1

Percentage (fractional) size of the delta files relative to the base that will trigger a major compaction. 1 = 100%, so the default 0.1 = 10%.

hive.compactor.abortedtxn.threshold

Default: 1000

Number of aborted transactions involving a given table or partition that will trigger a major compaction.
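As an illustrative sketch rather than a complete configuration, the transaction-related values from the table above could be placed in the hive-site.xml of the Thrift metastore host; the worker thread count of 1 is an arbitrary example of a value greater than 0:

<!-- Sketch only: placed inside the <configuration> element of hive-site.xml
     on the metastore host. Enables the transaction manager and runs the
     compaction initiator, cleaner, and one worker thread on this instance. -->
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>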

...

In addition to the new parameters listed above, some existing parameters need to be set to support INSERT ... VALUES, UPDATE, and DELETE.

Configuration key
Must be set to

hive.support.concurrency
true (default is false)

hive.enforce.bucketing
true (default is false)

hive.exec.dynamic.partition.mode
nonstrict (default is strict)
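A hedged hive-site.xml sketch of these existing parameters, using the values from the table above (per-session SET commands may also work for some of them, depending on deployment restrictions):

<!-- Sketch only: placed inside the <configuration> element of hive-site.xml.
     Values required to support INSERT ... VALUES, UPDATE, and DELETE. -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.enforce.bucketing</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>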

Configuration Values to Set for Compaction

...