...
This module is responsible for discovering which tables or partitions are due for compaction. This should be enabled in a Metastore using hive.compactor.initiator.on. There are several properties of the form *.threshold in "New Configuration Parameters for Transactions" table below that control when a compaction task is created and which type of compaction is performed. Each compaction task handles 1 partition (or whole table if the table is unpartitioned). If the number of consecutive compaction failures for a given partition exceeds hive.compactor.initiator.failed.compacts.threshold, automatic compaction scheduing will stop for this partition. See Configuration Parameters table for more info.
Worker
Each Worker handles a single compaction task. A compaction is a MapReduce job with name in the following form: <hostname>-compactor-<db>.<table>.<partition>. Each worker submits the job to the cluster (via hive.compactor.job.queue if defined) and waits for the job to finish. hive.compactor.worker.threads determines the number of Workers in each Metastore. The total number of Workrs in the Hive Warehouse determines the maximum number of concurrent compactions.
...
This process looks for transactions that have not heartbeated in hive.txn.timeout time and aborts them. The system assumes that a client that initiated a transaction stopped heartbeating crashed and the resources it locked should be released.
SHOW COMPACTIONS
This commands displays information about currently running compaction and recent history (configurable retention period) of compactions. This history display is available since HIV-12353.
Lock Manager
A new lock manager has also been added to Hive, the DbLockManager. This lock manager stores all lock information in the metastore. In addition all transactions are stored in the metastore. This means that transactions and locks are durable in the face of server failure. To avoid clients dying and leaving transaction or locks dangling, a heartbeat is sent from lock holders and transaction initiators to the metastore on a regular basis. If a heartbeat is not received in the configured amount of time, the lock or transaction will be aborted.
...
Configuration key | Values | Location | Notes |
Default: org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager Value required for transactions: org.apache.hadoop.hive.ql.lockmgr.DbTxnManager | Client/ | DummyTxnManager replicates pre Hive-0.13 behavior and provides no transactions. | |
Default: 300 | Client/ Metastore | Time after which transactions are declared aborted if the client has not sent a heartbeat, in seconds. It's critical that this property has the same value for all components/services.5 | |
hive.txn.heartbeat.threadpool.size | Default: 5 | Client/ HiveServer2 | The number of threads to use for heartbeating (as of Hive 1.3.0 and 2.0.0). |
hive.timedout.txn.reaper.start | Default: 100s | Metastore | Time delay of first reaper (the process which aborts timed-out transactions) run after the metastore starts (as of Hive 1.3.0). Controls AcidHouseKeeperServcie above. |
Default: 180s | Metastore | Time interval describing how often the reaper (the process which aborts timed-out transactions) runs (as of Hive 1.3.0). Controls AcidHouseKeeperServcie above. | |
Default: 1000 | Client | Maximum number of transactions that can be fetched in one call to open_txns().1 | |
hive.max.open.txns | Default: 100000 | HiveServer2/ Metastore | Maximum number of open transactions. If current open transactions reach this limit, future open transaction requests will be rejected, until the number goes below the limit. (As of Hive 1.3.0 and 2.1.0.) |
hive.count.open.txns.interval | Default: 1s | HiveServer2/ Metastore | Time in seconds between checks to count open transactions (as of Hive 1.3.0 and 2.1.0). |
hive.txn.retryable.sqlex.regex | Default: "" (empty string) | HiveServer2/ Metastore | Comma separated list of regular expression patterns for SQL state, error code, and error message of retryable SQLExceptions, that's suitable for the Hive metastore database (as of Hive 1.3.0 and 2.1.0). For an example, see Configuration Properties. |
Default: false Value required for transactions: true (for exactly one instance of the Thrift metastore service) | Metastore | Whether to run the initiator and cleaner threads on this metastore instance. Prior to Hive 1.3.0 it's critical that this is enabled on exactly one standalone metastore service instance (not enforced yet). As of Hive 1.3.0 this property may be enabled on any number of standalone metastore instances.
| |
Default: 0 Value required for transactions: > 0 on at least one instance of the Thrift metastore service | Metastore | How many compactor worker threads to run on this metastore instance.2 | |
Default: 86400 | Metastore | Time in seconds after which a compaction job will be declared failed and the compaction re-queued. | |
hive.compactor.cleaner.run.interval | Default: 5000 | Metastore | Time in milliseconds between runs of the cleaner thread. (Hive 0.14.0 and later.) |
Default: 300 | Metastore | Time in seconds between checks to see if any tables or partitions need to be compacted.3 | |
Default: 10 | Metastore | Number of delta directories in a table or partition that will trigger a minor compaction. | |
Default: 0.1 | Metastore | Percentage (fractional) size of the delta files relative to the base that will trigger a major compaction. 1 = 100%, so the default 0.1 = 10%. | |
Default: 1000 | Metastore | Number of aborted transactions involving a given table or partition that will trigger a major compaction. | |
Default: 500 | Metastore | Maximum number of delta files that the compactor will attempt to handle in a single job (as of Hive 1.3.0).4 | |
Default: "" (empty string) | Metastore | Used to specify name of Hadoop queue to which Compaction jobs will be submitted. Set to empty string to let Hadoop choose the queue (as of Hive 1.3.0). | |
Compaction History | |||
hive.compactor.history.retention.succeeded | Default: 3 | Metastore | Number of successful compaction entries to retain in history (per partition). |
hive.compactor.history.retention.failed | Default: 3 | Metastore | Number of failed compaction entries to retain in history (per partition). |
hive.compactor.history.retention.attempted | Default: 2 | Metastore | Number of attempted compaction entries to retain in history (per partition). |
hive.compactor.initiator.failed.compacts.threshold | Default: 2 | Metastore | Number of of consecutive failed compactions for a given partition after which the Initiator will stop attempting to schedule compactions automatically. It is still possible to use ALTER TABLE to initiate compaction. Once a manually initiated compaction succeeds auto initiated compactions will resume. Note that this must be less than hive.compactor.history.retention.failed. |
hive.compactor.history.reaper.interval | Default: 2m | Metastore | Controls how often the process to purge historical record of compactions runs. |
1hive.txn.max.open.batch controls how many transactions streaming agents such as Flume or Storm open simultaneously. The streaming agent then writes that number of entries into a single file (per Flume agent or Storm bolt). Thus increasing this value decreases the number of delta files created by streaming agents. But it also increases the number of open transactions that Hive has to track at any given time, which may negatively affect read performance.
...