...
A new lock manager, DbLockManager, has also been added to Hive. This lock manager stores all lock information in the metastore. In addition, all transactions are stored in the metastore, so transactions and locks are durable in the face of server failure. To prevent clients that die from leaving transactions or locks dangling, lock holders and transaction initiators send a heartbeat to the metastore on a regular basis. If a heartbeat is not received within the configured amount of time, the lock or transaction is aborted.
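The heartbeat rule can be illustrated with a small toy model. This is only a sketch of the idea (record the last heartbeat per holder, abort anything older than the timeout), not Hive's implementation; the class name `HeartbeatTable`, its methods, and the holder ids are hypothetical.

```python
import time


class HeartbeatTable:
    """Toy model of heartbeat-based expiry; not Hive's actual implementation."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock      # injectable clock, so the sketch is easy to test
        self.last_seen = {}     # holder id -> time of the last heartbeat

    def open(self, holder_id):
        """Register a new transaction or lock and record its first heartbeat."""
        self.last_seen[holder_id] = self.clock()

    def heartbeat(self, holder_id):
        """Refresh a holder; return False if it is unknown or already aborted."""
        if holder_id not in self.last_seen:
            return False
        self.last_seen[holder_id] = self.clock()
        return True

    def abort_expired(self):
        """Abort every holder whose last heartbeat is older than the timeout."""
        now = self.clock()
        expired = [holder for holder, seen in self.last_seen.items()
                   if now - seen > self.timeout_seconds]
        for holder in expired:
            del self.last_seen[holder]
        return expired
```

A client that keeps calling `heartbeat` stays alive in this model, while a client that has died is removed the next time `abort_expired` runs after the timeout has passed.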
Configuration
These configuration parameters must be set appropriately to turn on transaction support in Hive:
- hive.support.concurrency – true
- hive.enforce.bucketing – true
- hive.exec.dynamic.partition.mode – nonstrict
- hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
- hive.compactor.initiator.on – true (for exactly one instance of the Thrift metastore service)
- hive.compactor.worker.threads – a positive number on at least one instance of the Thrift metastore service
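As a rough illustration, the settings listed above could be written out as a `hive-site.xml` fragment. The sketch below assumes the standard Hadoop-style `<configuration>/<property>/<name>/<value>` layout; the helper `render_hive_site` is hypothetical, and the worker thread count of 1 is only an example of "a positive number".

```python
import xml.etree.ElementTree as ET

# Values taken from the list above. The worker thread count is an example;
# any positive number on at least one metastore instance satisfies the rule.
# hive.compactor.initiator.on should be true on exactly one metastore instance.
REQUIRED_SETTINGS = {
    "hive.support.concurrency": "true",
    "hive.enforce.bucketing": "true",
    "hive.exec.dynamic.partition.mode": "nonstrict",
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
    "hive.compactor.initiator.on": "true",
    "hive.compactor.worker.threads": "1",
}


def render_hive_site(settings):
    """Return the settings as a hive-site.xml style XML string."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):   # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")
```

Calling `print(render_hive_site(REQUIRED_SETTINGS))` emits one `<property>` element per setting, which can be merged into an existing configuration file.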
The following sections list all of the configuration parameters that affect Hive transactions and compaction.
New Configuration Parameters for Transactions
A number of new configuration parameters have been added to the system to support transactions.
Configuration key | Values | Notes |
---|---|---|
hive.support.concurrency | Default: false. Value required for transactions: true | |
hive.txn.manager | Default: org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager. Value required for transactions: org.apache.hadoop.hive.ql.lockmgr.DbTxnManager | DummyTxnManager replicates pre Hive-0.13 behavior and provides no transactions. |
hive.txn.timeout | Default: 300 | Time after which transactions are declared aborted if the client has not sent a heartbeat, in seconds. |
hive.txn.max.open.batch | Default: 1000 | Maximum number of transactions that can be fetched in one call to open_txns().* |
hive.compactor.initiator.on | Default: false. Value required for transactions: true (for exactly one instance of the Thrift metastore service) | Whether to run the initiator and cleaner threads on this metastore instance. |
hive.compactor.worker.threads | Default: 0. Value required for transactions: > 0 on at least one instance of the Thrift metastore service | How many compactor worker threads to run on this metastore instance.** |
hive.compactor.worker.timeout | Default: 86400 | Time in seconds after which a compaction job will be declared failed and the compaction re-queued. |
hive.compactor.check.interval | Default: 300 | Time in seconds between checks to see if any tables or partitions need to be compacted.*** |
hive.compactor.delta.num.threshold | Default: 10 | Number of delta directories in a table or partition that will trigger a minor compaction. |
hive.compactor.delta.pct.threshold | Default: 0.1 | Size of the delta files relative to the base, expressed as a fraction, that will trigger a major compaction. 1 = 100%, so the default 0.1 = 10%. |
hive.compactor.abortedtxn.threshold | Default: 1000 | Number of aborted transactions involving a given table or partition that will trigger a major compaction. |
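To make the three compaction thresholds in the table concrete, here is a minimal sketch of how they might combine into a decision. It is based only on the descriptions above, not on Hive's source: the function name `choose_compaction`, the order of the checks, the strict greater-than comparisons, and the handling of a table with no base are all assumptions.

```python
def choose_compaction(num_deltas, delta_bytes, base_bytes, aborted_txns,
                      delta_num_threshold=10,
                      delta_pct_threshold=0.1,
                      aborted_txn_threshold=1000):
    """Return "major", "minor", or None for one table or partition.

    Illustrative only: defaults mirror the table above, but the check order
    and the use of ">" rather than ">=" are assumptions.
    """
    # Many aborted transactions: request a major compaction.
    if aborted_txns > aborted_txn_threshold:
        return "major"
    # Delta files large relative to the base (0.1 means 10%): major compaction.
    # Skipped when there is no base, since the ratio is undefined (assumption).
    if base_bytes > 0 and delta_bytes / base_bytes > delta_pct_threshold:
        return "major"
    # Many delta directories: minor compaction.
    if num_deltas > delta_num_threshold:
        return "minor"
    return None
```

For example, a partition with 11 delta directories whose deltas total 5% of the base would get a minor compaction in this model, while one with 3 delta directories totalling 20% of the base would get a major compaction.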
...
In addition to the new parameters listed above, some existing parameters need to be set to support INSERT ... VALUES, UPDATE, and DELETE.
Configuration key | Must be set to |
---|---|
hive.support.concurrency | true (default is false) |
hive.enforce.bucketing | true (default is false) |
hive.exec.dynamic.partition.mode | nonstrict (default is strict) |
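A quick way to check these three existing parameters is to compare an effective configuration against the required values. The sketch below assumes the configuration has already been loaded into a plain dictionary of strings; the helper `find_misconfigured` is hypothetical, and missing keys fall back to the defaults shown in the table.

```python
# Required values and defaults, copied from the table above.
REQUIRED_EXISTING = {
    "hive.support.concurrency": "true",
    "hive.enforce.bucketing": "true",
    "hive.exec.dynamic.partition.mode": "nonstrict",
}
DEFAULTS_EXISTING = {
    "hive.support.concurrency": "false",
    "hive.enforce.bucketing": "false",
    "hive.exec.dynamic.partition.mode": "strict",
}


def find_misconfigured(current):
    """Map each wrongly set key to a tuple (actual value, required value)."""
    problems = {}
    for key, required in REQUIRED_EXISTING.items():
        # An unset key behaves as if it had its default value.
        actual = current.get(key, DEFAULTS_EXISTING[key])
        if actual.strip().lower() != required:
            problems[key] = (actual, required)
    return problems
```

An empty result means all three parameters are set as needed for INSERT ... VALUES, UPDATE, and DELETE; with an empty configuration, all three keys are reported because their defaults do not match.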
Configuration Values to Set for Compaction
...