...
Configuration key | Values | Location | Notes
---|---|---|---
hive.txn.manager | Default: org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager Value required for transactions: org.apache.hadoop.hive.ql.lockmgr.DbTxnManager | Client/HiveServer2 | DummyTxnManager replicates pre-Hive-0.13 behavior and provides no transactions.
hive.txn.timeout | Default: 300 | Client/Metastore | Time after which transactions are declared aborted if the client has not sent a heartbeat, in seconds. It's critical that this property has the same value for all components/services.5
hive.txn.heartbeat.threadpool.size | Default: 5 | Client/HiveServer2 | The number of threads to use for heartbeating (as of Hive 1.3.0 and 2.0.0).
hive.timedout.txn.reaper.start | Default: 100s | Metastore | Time delay of the first reaper run (the process which aborts timed-out transactions) after the metastore starts (as of Hive 1.3.0).
hive.timedout.txn.reaper.interval | Default: 180s | Metastore | Time interval describing how often the reaper (the process which aborts timed-out transactions) runs (as of Hive 1.3.0).
hive.txn.max.open.batch | Default: 1000 | Client | Maximum number of transactions that can be fetched in one call to open_txns().1
hive.max.open.txns | Default: 100000 | HiveServer2/Metastore | Maximum number of open transactions. If the current number of open transactions reaches this limit, future open-transaction requests will be rejected until the number goes below the limit. (As of Hive 1.3.0 and 2.1.0.)
hive.count.open.txns.interval | Default: 1s | HiveServer2/Metastore | Time in seconds between checks to count open transactions (as of Hive 1.3.0 and 2.1.0).
hive.txn.retryable.sqlex.regex | Default: "" (empty string) | HiveServer2/Metastore | Comma-separated list of regular-expression patterns for the SQL state, error code, and error message of retryable SQLExceptions, suitable for the Hive metastore database (as of Hive 1.3.0 and 2.1.0). For an example, see Configuration Properties.
hive.compactor.initiator.on | Default: false Value required for transactions: true (for exactly one instance of the Thrift metastore service) | Metastore | Whether to run the initiator and cleaner threads on this metastore instance. Prior to Hive 1.3.0 it's critical that this is enabled on exactly one standalone metastore service instance (not enforced yet). As of Hive 1.3.0 this property may be enabled on any number of standalone metastore instances.
hive.compactor.worker.threads | Default: 0 Value required for transactions: > 0 on at least one instance of the Thrift metastore service | Metastore | How many compactor worker threads to run on this metastore instance.2
hive.compactor.worker.timeout | Default: 86400 | Metastore | Time in seconds after which a compaction job will be declared failed and the compaction re-queued.
hive.compactor.cleaner.run.interval | Default: 5000 | Metastore | Time in milliseconds between runs of the cleaner thread. (Hive 0.14.0 and later.)
hive.compactor.check.interval | Default: 300 | Metastore | Time in seconds between checks to see if any tables or partitions need to be compacted.3
hive.compactor.delta.num.threshold | Default: 10 | Metastore | Number of delta directories in a table or partition that will trigger a minor compaction.
hive.compactor.delta.pct.threshold | Default: 0.1 | Metastore | Percentage (fractional) size of the delta files relative to the base that will trigger a major compaction. 1 = 100%, so the default 0.1 = 10%.
hive.compactor.abortedtxn.threshold | Default: 1000 | Metastore | Number of aborted transactions involving a given table or partition that will trigger a major compaction.
hive.compactor.max.num.delta | Default: 500 | Metastore | Maximum number of delta files that the compactor will attempt to handle in a single job (as of Hive 1.3.0).4
hive.compactor.job.queue | Default: "" (empty string) | Metastore | Name of the Hadoop queue to which compaction jobs will be submitted. Set to empty string to let Hadoop choose the queue (as of Hive 1.3.0).
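Taken together, the values marked "required for transactions" above amount to a small hive-site.xml fragment. The following is a sketch, not a recommendation: the worker-thread count of 1 is an illustrative choice, and footnote 2 explains how to size it.

```xml
<!-- hive-site.xml fragment for the standalone metastore instance
     (exactly one such instance prior to Hive 1.3.0) -->
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <!-- must be > 0 on at least one metastore instance -->
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
```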
...
2Worker threads spawn MapReduce jobs to do compactions. They do not do the compactions themselves. Increasing the number of worker threads will decrease the time it takes tables or partitions to be compacted once they are determined to need compaction. It will also increase the background load on the Hadoop cluster as more MapReduce jobs will be running in the background. Each compaction can handle one partition at a time (or whole table if it's unpartitioned).
3Decreasing this value will reduce the time it takes for compaction to be started for a table or partition that requires compaction. However, checking if compaction is needed requires several calls to the NameNode for each table or partition that has had a transaction done on it since the last major compaction. So decreasing this value will increase the load on the NameNode.
...
If a table is to be used in ACID writes (insert, update, delete), then the table property "transactional"="true" must be set on that table, starting with Hive 0.14.0. Note that once a table has been defined as an ACID table via TBLPROPERTIES ("transactional"="true"), it cannot be converted back to a non-ACID table; that is, changing to TBLPROPERTIES ("transactional"="false") is not allowed. Also, hive.txn.manager must be set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager, either in hive-site.xml or at the beginning of the session before any query is run. Without those settings, inserts will be done in the old style, and updates and deletes will be prohibited prior to HIVE-11716. Since HIVE-11716, operations on ACID tables without DbTxnManager are not allowed at all. However, this does not apply to Hive 0.13.0.
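As a sketch of the session-level alternative (the table name acid_demo is hypothetical, chosen for illustration):

```sql
-- Run at the beginning of the session, before any query,
-- unless hive.txn.manager is already set in hive-site.xml:
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- acid_demo is an illustrative table name; the bucketed ORC layout
-- mirrors the CREATE TABLE example in the Examples section.
CREATE TABLE acid_demo (id int, value string)
CLUSTERED BY (id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ("transactional"="true");
```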
...
Table properties are set with the TBLPROPERTIES clause when a table is created or altered, as described in the Create Table and Alter Table Properties sections of Hive Data Definition Language. The "transactional" and "NO_AUTO_COMPACTION" table properties are case-sensitive in Hive releases 0.x and 1.0, but they are case-insensitive starting with release 1.1.0 (HIVE-8308).
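For instance, a sketch of disabling automatic compaction on a single table via that property (my_table is a hypothetical table name):

```sql
-- In Hive 0.x and 1.0 the property name must be written in exactly this case.
ALTER TABLE my_table SET TBLPROPERTIES ("NO_AUTO_COMPACTION"="true");
```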
More compaction-related options can be set via TBLPROPERTIES as of Hive 1.3.0 and 2.1.0. They can be set both at the table level via CREATE TABLE and at the request level via ALTER TABLE/PARTITION COMPACT. These are used to override the warehouse- or table-wide settings. For example, to override an MR property to affect a compaction job, one can add "compactor.<mr property name>=<value>" in either the CREATE TABLE statement or when launching a compaction explicitly via ALTER TABLE. The "<mr property name>=<value>" will be set on the JobConf of the compaction MR job. Similarly, "tblprops.<prop name>=<value>" can be used to set/override any table property which is interpreted by the code running on the cluster. Finally, "compactorthreshold.<prop name>=<value>" can be used to override properties from the "New Configuration Parameters for Transactions" table above that end with ".threshold" and control when compactions are triggered by the system. Examples:
```sql
CREATE TABLE table_name (
  id   int,
  name string
)
CLUSTERED BY (id) INTO 2 BUCKETS STORED AS ORC
TBLPROPERTIES ("transactional"="true",
  "compactor.mapreduce.map.memory.mb"="2048",                   -- specify compaction map job properties
  "compactorthreshold.hive.compactor.delta.num.threshold"="4",  -- trigger minor compaction if there are more than 4 delta directories
  "compactorthreshold.hive.compactor.delta.pct.threshold"="0.5" -- trigger major compaction if the ratio of size of delta files to
                                                                --   size of base files is greater than 50%
);
```
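Request-level overrides follow the same pattern when a compaction is launched explicitly. A sketch, assuming the same table_name and illustrative override values:

```sql
-- Launch a minor compaction explicitly and override properties for this
-- one compaction request (as of Hive 1.3.0 and 2.1.0):
ALTER TABLE table_name COMPACT 'minor'
   WITH OVERWRITE TBLPROPERTIES (
      "compactor.mapreduce.map.memory.mb"="3072",  -- specify compaction map job properties
      "tblprops.orc.compress.size"="8192");        -- set/override a table property seen by code on the cluster
```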
...