
...

A number of new configuration values have been added to the system to support transactions.

hive.txn.manager
  Default: org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager
  Value to turn on transactions: org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
  Notes: DummyTxnManager replicates pre-Hive-0.13 behavior and provides no transactions.

hive.txn.timeout
  Default: 300
  Notes: Time in seconds after which a transaction is declared aborted if the client has not sent a heartbeat.

hive.txn.max.open.batch
  Default: 1000
  Notes: Maximum number of transactions that can be fetched in one call to open_txns(). This controls how many transactions streaming agents such as Flume or Storm open simultaneously. The streaming agent then writes that number of entries into a single file (per Flume agent or Storm bolt). Increasing this value therefore decreases the number of files created by streaming agents, but it also increases the number of open transactions that Hive has to track, which may negatively affect read performance.

hive.compactor.initiator.on
  Default: false
  Value to turn on transactions: true (for exactly one instance of the Thrift metastore service)
  Notes: Whether to run the initiator and cleaner threads on this metastore instance.

hive.compactor.worker.threads
  Default: 0
  Value to turn on transactions: greater than 0 on at least one instance of the Thrift metastore service
  Notes: How many worker threads to run on this metastore instance. Worker threads spawn MapReduce jobs to do compactions; they do not do the compactions themselves. Increasing the number of worker threads will decrease the time it takes for tables to be compacted once they are determined to need compaction. It will also increase the background load on the Hadoop cluster, as more MapReduce jobs will be running in the background.

hive.compactor.worker.timeout
  Default: 86400
  Notes: Time in seconds after which a compaction job will be declared failed and the compaction re-queued.

hive.compactor.check.interval
  Default: 300
  Notes: Time in seconds between checks to see if any partitions need to be compacted. Decreasing this value reduces the time it takes for compaction to start on a table or partition that requires it. However, checking whether compaction is needed requires several calls to the NameNode for each table or partition that has had a transaction done on it since the last major compaction, so decreasing this value increases the load on the NameNode.

hive.compactor.delta.num.threshold
  Default: 10
  Notes: Number of delta directories in a partition that will trigger a minor compaction.

hive.compactor.delta.pct.threshold
  Default: 0.1
  Notes: Fractional size of the deltas relative to the base that will trigger a major compaction. 1 = 100%, so the default 0.1 = 10%.

hive.compactor.abortedtxn.threshold
  Default: 1000
  Notes: Number of aborted transactions on a given partition that will trigger a major compaction.
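Putting the values above together, a minimal hive-site.xml fragment for turning on transactions might look like the following sketch. The worker thread count of 1 is only an illustrative choice; tune it for your cluster, and remember that hive.compactor.initiator.on should be true on exactly one metastore instance.

```xml
<!-- Sketch: minimal hive-site.xml settings to enable transactions,
     based on the "Value to turn on transactions" entries above. -->
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<!-- Set to true on exactly one Thrift metastore instance. -->
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<!-- Set greater than 0 on at least one metastore instance; 1 is an example. -->
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
```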


...

Table Properties

If a table owner does not want the system to determine automatically when to compact, the table property NO_AUTO_COMPACTION can be set. This prevents all automatic compactions; manual compactions can still be done with Alter Table/Partition Compact statements.
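As a sketch, assuming a hypothetical table named my_acid_table with a partition column ds, disabling automatic compaction and later requesting a compaction manually might look like:

```sql
-- Disable automatic compaction for this table (table name is illustrative).
ALTER TABLE my_acid_table SET TBLPROPERTIES ('NO_AUTO_COMPACTION'='true');

-- A manual compaction can still be requested explicitly for a partition:
ALTER TABLE my_acid_table PARTITION (ds='2014-01-01') COMPACT 'major';
```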

...