...
Configuration key | Values | Location | Notes |
Default: org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager Value required for transactions: org.apache.hadoop.hive.ql.lockmgr.DbTxnManager | Client/ | DummyTxnManager replicates pre Hive-0.13 behavior and provides no transactions. | |
Default: 300 | Client/ HiveServer2 | Time after which transactions are declared aborted if the client has not sent a heartbeat, in seconds. | |
hive.timedout.txn.reaper.start | Default: 100s | Metastore | Time delay of first reaper (the process which aborts timed-out transactions) run after the metastore starts (as of Hive 1.3.0). |
Default: 180s | Metastore | Time interval describing how often the reaper (the process which aborts timed-out transactions) runs (as of Hive 1.3.0). | |
Default: 1000 | Client | Maximum number of transactions that can be fetched in one call to open_txns().*1 | |
Default: false Value required for transactions: true (for exactly one instance of the Thrift metastore service) | Metastore | Whether to run the initiator and cleaner threads on this metastore instance. It's critical that this is enabled on exactly one metastore service instance (not enforced yet).
| |
Default: 0 Value required for transactions: > 0 on at least one instance of the Thrift metastore service | Metastore | How many compactor worker threads to run on this metastore instance.**2 | |
Default: 86400 | Metastore | Time in seconds after which a compaction job will be declared failed and the compaction re-queued. | |
hive.compactor.cleaner.run.interval | Default: 5000 | Metastore | Time in milliseconds between runs of the cleaner thread. (Hive 0.14.0 and later.) |
Default: 300 | Metastore | Time in seconds between checks to see if any tables or partitions need to be compacted.***3 | |
Default: 10 | Metastore | Number of delta directories in a table or partition that will trigger a minor compaction. | |
Default: 0.1 | Metastore | Percentage (fractional) size of the delta files relative to the base that will trigger a major compaction. 1 = 100%, so the default 0.1 = 10%. | |
Default: 1000 | Metastore | Number of aborted transactions involving a given table or partition that will trigger a major compaction. | |
Default: 500 | Metastore | Maximum number of delta files that the compactor will attempt to handle in a single job (as of Hive 1.3.0).4 | |
Default: "" (empty string) | Metastore | Used to specify name of Hadoop queue to which Compaction jobs will be submitted. Set to empty string to let Hadoop choose the queue (as of Hive 1.3.0). |
*1hive.txn.max.open.batch controls how many transactions streaming agents such as Flume or Storm open simultaneously. The streaming agent then writes that number of entries into a single file (per Flume agent or Storm bolt). Thus increasing this value decreases the number of delta files created by streaming agents. But it also increases the number of open transactions that Hive has to track at any given time, which may negatively affect read performance.
**2Worker threads spawn MapReduce jobs to do compactions. They do not do the compactions themselves. Increasing the number of worker threads will decrease the time it takes tables or partitions to be compacted once they are determined to need compaction. It will also increase the background load on the Hadoop cluster as more MapReduce jobs will be running in the background.
***3Decreasing this value will reduce the time it takes for compaction to be started for a table or partition that requires compaction. However, checking if compaction is needed requires several calls to the NameNode for each table or partition that has had a transaction done on it since the last major compaction. So decreasing this value will increase the load on the NameNode.
4If the compactor detects a very high number of delta files, it will first run several partial minor compactions (currently sequentially) and then perform the compaction actually requested.
Configuration Values to Set for INSERT, UPDATE, DELETE
...