...
Configuration key | Values | Location | Notes | ||||||
hive.txn.manager | Default: org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager Value required for transactions: org.apache.hadoop.hive.ql.lockmgr.DbTxnManager | Client/ HiveServer2 | DummyTxnManager replicates pre-Hive-0.13 behavior and provides no transactions.
hive.txn.strict.locking.mode | Default: true | Client/ HiveServer2 | In strict mode non-ACID resources use standard R/W lock semantics, e.g. INSERT will acquire exclusive lock. In non-strict mode, for non-ACID resources, INSERT will only acquire shared lock, which allows two concurrent writes to the same partition but still lets lock manager prevent DROP TABLE etc. when the table is being written to (as of Hive 2.2.0). | ||||||
hive.txn.timeout deprecated. Use metastore.txn.timeout instead | Default: 300 | Client/ Metastore | Time after which transactions are declared aborted if the client has not sent a heartbeat, in seconds. It's critical that this property has the same value for all components/services.5 | ||||||
hive.txn.heartbeat.threadpool.size deprecated - but still in use | Default: 5 | Client/ HiveServer2 | The number of threads to use for heartbeating (as of Hive 1.3.0 and 2.0.0). | ||||||
hive.timedout.txn.reaper.start deprecated | Default: 100s | Metastore | Time delay of the first reaper run (the process which aborts timed-out transactions) after the metastore starts (as of Hive 1.3.0). Controls the AcidHouseKeeperService above.
hive.timedout.txn.reaper.interval deprecated | Default: 180s | Metastore | Time interval describing how often the reaper (the process which aborts timed-out transactions) runs (as of Hive 1.3.0). Controls the AcidHouseKeeperService above.
hive.txn.max.open.batch deprecated. Use metastore.txn.max.open.batch instead | Default: 1000 | Client | Maximum number of transactions that can be fetched in one call to open_txns().1 | ||||||
hive.max.open.txns deprecated. Use metastore.max.open.txns instead. | Default: 100000 | HiveServer2/ Metastore | Maximum number of open transactions. If current open transactions reach this limit, future open transaction requests will be rejected, until the number goes below the limit. (As of Hive 1.3.0 and 2.1.0.) | ||||||
hive.count.open.txns.interval deprecated. Use metastore.count.open.txns.interval instead. | Default: 1s | HiveServer2/ Metastore | Time in seconds between checks to count open transactions (as of Hive 1.3.0 and 2.1.0). | ||||||
hive.txn.retryable.sqlex.regex deprecated. Use metastore.txn.retryable.sqlex.regex instead. | Default: "" (empty string) | HiveServer2/ Metastore | Comma separated list of regular expression patterns for SQL state, error code, and error message of retryable SQLExceptions, that's suitable for the Hive metastore database (as of Hive 1.3.0 and 2.1.0). For an example, see Configuration Properties. | ||||||
hive.compactor.initiator.on deprecated. Use metastore.compactor.initiator.on instead. | Default: false Value required for transactions: true (for exactly one instance of the Thrift metastore service) | Metastore | Whether to run the initiator thread on this metastore instance. Prior to Hive 1.3.0 it's critical that this is enabled on exactly one standalone metastore service instance (not enforced yet). As of Hive 1.3.0 this property may be enabled on any number of standalone metastore instances.
hive.compaction.merge.enabled | Default: false | HiveServer2 | Enables merge-based compaction, a compaction optimization used when only a few ORC delta files are present.
hive.compactor.initiator.duration.update.interval | Default: 60s | HiveServer2 | Time in seconds that drives the update interval of the compaction_initiator_duration metric.
hive.compactor.cleaner.on deprecated. Use metastore.compactor.cleaner.on instead. | Default: false Value required for transactions: true (for exactly one instance of the Thrift metastore service) | Metastore | Whether to run the cleaner thread on this metastore instance. Before Hive 4.0.0 the Cleaner thread could only be started/stopped together with the Initiator via hive.compactor.initiator.on; this config enables/disables the initiator and cleaner threads independently.
hive.compactor.cleaner.duration.update.interval | Default: 60s | HiveServer2 | Time in seconds that drives the update interval of the compaction_cleaner_duration metric.
hive.compactor.worker.threads deprecated. Use metastore.compactor.worker.threads instead. | Default: 0 Value required for transactions: > 0 on at least one instance of the Thrift metastore service | Metastore | How many compactor worker threads to run on this metastore instance.2
hive.compactor.cleaner.threads.num | Default: 1 | HiveServer2 | Enables parallelization of the cleaning of directories after compaction, which may include many files.
hive.compactor.compact.insert.only | Default: true | HiveServer2 | Whether the compactor should compact insert-only tables. A safety switch.
hive.compactor.crud.query.based | Default: false | HiveServer2 | Means compaction on full CRUD tables is done via queries. Compactions on insert-only tables will always run via queries regardless of the value of this configuration.
hive.compactor.aborted.txn.time.threshold | Default: 12h | Metastore | Age of a table/partition's oldest aborted transaction at which compaction will be triggered. Default time unit is hours. Set to a negative number to disable.
hive.compactor.max.num.delta | Default: 500 | Metastore | Maximum number of delta files that the compactor will attempt to handle in a single job (as of Hive 1.3.0).4
hive.compactor.job.queue | Default: "" (empty string) | Metastore | Used to specify the name of the Hadoop queue to which compaction jobs will be submitted. Set to empty string to let Hadoop choose the queue (as of Hive 1.3.0).
hive.compactor.gather.stats | Default: true | HiveServer2 | If set to true, MAJOR compaction will gather stats if there are stats already associated with the table/partition. Turn this off to save some resources when the stats are not used anyway. This is a replacement for the HIVE_MR_COMPACTOR_GATHER_STATS config, and works for both MR and query-based compaction.
metastore.compactor.initiator.failed.retry.time | Default: 7d | Metastore | Time after which the Initiator will ignore metastore.compactor.initiator.failed.compacts.threshold and retry compaction again. This tries to auto-heal tables with previously failed compactions without manual intervention. Setting it to 0 or a negative value disables this feature.
metastore.compactor.long.running.initiator.threshold.warning | Default: 6h | Metastore | Initiator cycle duration after which a warning will be logged. Default time unit is hours.
metastore.compactor.long.running.initiator.threshold.error | Default: 12h | Metastore | Initiator cycle duration after which an error will be logged. Default time unit is hours.
hive.compactor.worker.sleep.time | Default: 10800ms | HiveServer2 | Time in milliseconds for which a worker thread sleeps before starting another iteration when no job was launched or an error occurred.
hive.compactor.worker.max.sleep.time | Default: 320000ms | HiveServer2 | Maximum time in milliseconds for which a worker thread sleeps before starting another iteration; used for backoff when no job was launched or an error occurred.
hive.compactor.worker.timeout | Default: 86400s | Metastore | Time in seconds after which a compaction job will be declared failed and the compaction re-queued.
hive.compactor.cleaner.run.interval | Default: 5000ms | Metastore | Time in milliseconds between runs of the cleaner thread. (Hive 0.14.0 and later.) | ||||||
hive.compactor.check.interval | Default: 300s | Metastore | Time in seconds between checks to see if any tables or partitions need to be compacted.3
hive.compactor.delta.num.threshold | Default: 10 | Metastore | Number of delta directories in a table or partition that will trigger a minor compaction.
hive.compactor.delta.pct.threshold | Default: 0.1 | Metastore | Percentage (fractional) size of the delta files relative to the base that will trigger a major compaction. 1 = 100%, so the default 0.1 = 10%.
hive.compactor.abortedtxn.threshold | Default: 1000 | Metastore | Number of aborted transactions involving a given table or partition that will trigger a major compaction.
hive.compactor.request.queue | Default: 1 | HiveServer2 | Enables parallelization of the checkForCompaction operation, which includes many file metadata checks and may be expensive.
hive.split.grouping.mode | Default: query (Allowed values: query, compactor) | HiveServer2 | This is set to compactor from within the query based compactor. This enables the Tez SplitGrouper to group splits based on their bucket number, so that all rows from different bucket files for the same bucket number can end up in the same bucket file after the compaction. | ||||||
hive.txn.xlock.iow | Default: true | HiveServer2 | Ensures commands with OVERWRITE (such as INSERT OVERWRITE) acquire Exclusive locks for transactional tables. This ensures that inserts (w/o overwrite) running concurrently are not hidden by the INSERT OVERWRITE. | ||||||
hive.txn.xlock.write | Default: true | HiveServer2 | Manages concurrency levels for ACID resources. Provides a better level of query parallelism by enabling shared writes and write-write conflict resolution at the commit step. If true, exclusive writes are used: INSERT OVERWRITE acquires EXCLUSIVE locks, UPDATE/DELETE acquire EXCL_WRITE locks, and INSERT acquires SHARED_READ locks. If false, shared writes are used and the transaction is aborted in case of conflicting changes: INSERT OVERWRITE acquires EXCL_WRITE locks, and INSERT/UPDATE/DELETE acquire SHARED_READ locks.
metastore.acidmetrics.ext.on | Default: true | HiveServer2 | Whether to collect additional acid related metrics outside of the acid metrics service. | ||||||
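Taken together, a minimal setup for enabling transactions with working compaction might look like the sketch below. This is an illustrative fragment, not a complete configuration: the session-level key is set on the client, while the compactor keys belong in the configuration of the metastore service and are shown as comments because they cannot be set per-session.

```sql
-- Client/HiveServer2 session: switch from DummyTxnManager to the transactional lock manager
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Metastore service configuration (hive-site.xml), not per-session:
--   hive.compactor.initiator.on=true
--   hive.compactor.cleaner.on=true
--   hive.compactor.worker.threads=2   -- any value > 0 on at least one instance
```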
Compaction History | |||||||||
hive.compactor.history.retention.succeeded deprecated. Use metastore.compactor.history.retention.succeeded instead | Default: 3 | Metastore | Number of successful compaction entries to retain in history (per partition). | ||||||
hive.compactor.history.retention.failed deprecated. Use metastore.compactor.history.retention.failed instead. | Default: 3 | Metastore | Number of failed compaction entries to retain in history (per partition). | ||||||
hive.compactor.history.retention.attempted deprecated. Use metastore.compactor.history.retention.did.not.initiate instead. | Default: 2 | Metastore | Number of attempted compaction entries to retain in history (per partition). | ||||||
hive.compactor.initiator.failed.compacts.threshold deprecated. Use metastore.compactor.initiator.failed.compacts.threshold instead. | Default: 2 | Metastore | Number of consecutive failed compactions for a given partition after which the Initiator will stop attempting to schedule compactions automatically. It is still possible to use ALTER TABLE to initiate compaction. Once a manually initiated compaction succeeds, auto-initiated compactions will resume. Note that this must be less than hive.compactor.history.retention.failed.
metastore.compactor.initiator.failed.compacts.threshold | Default: 2 (Allowed between 1 and 20) | Metastore | Number of consecutive compaction failures (per table/partition) after which automatic compactions will not be scheduled any more. Note that this must be less than hive.compactor.history.retention.failed. | ||||||
hive.compactor.history.reaper.interval deprecated. metastore.acid.housekeeper.interval handles it. | Default: 2m | Metastore | Controls how often the process to purge historical record of compactions runs. | ||||||
ACID metrics | |||||||||
metastore.acidmetrics.check.interval | Default: 300s | Metastore | Time in seconds between acid related metric collection runs. | ||||||
metastore.acidmetrics.thread.on | Default: true | Metastore | Whether to run acid related metrics collection on this metastore instance. | ||||||
metastore.deltametrics.delta.num.threshold | Default: 100 | Metastore | The minimum number of active delta files a table/partition must have in order to be included in the ACID metrics report.
metastore.deltametrics.delta.pct.threshold | Default: 0.01 | Metastore | Percentage (fractional) size of the delta files relative to the base directory. Deltas smaller than this threshold count as small deltas. Default 0.01 = 1%.
metastore.deltametrics.max.cache.size | Default: 100 (Allowed between 0 and 500) | Metastore | Size of the ACID metrics cache, i.e. the max number of partitions and unpartitioned tables with the most deltas that will be included in the lists of active, obsolete and small deltas.
metastore.deltametrics.obsolete.delta.num.threshold | Default: 100 | Metastore | The minimum number of obsolete delta files a table/partition must have in order to be included in the ACID metrics report.
1 metastore.txn.max.open.batch controls how many transactions streaming agents such as Flume or Storm open simultaneously. The streaming agent then writes that number of entries into a single file (per Flume agent or Storm bolt). Thus increasing this value decreases the number of delta files created by streaming agents. But it also increases the number of open transactions that Hive has to track at any given time, which may negatively affect read performance.
...
Code Block

ALTER TABLE table_name COMPACT 'minor' WITH OVERWRITE TBLPROPERTIES ("compactor.mapreduce.map.memory.mb"="3072");  -- specify compaction map job properties
ALTER TABLE table_name COMPACT 'major' WITH OVERWRITE TBLPROPERTIES ("tblprops.orc.compress.size"="8192");  -- change any other Hive table properties
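In the same vein, the Hive documentation describes a "compactorthreshold." TBLPROPERTIES prefix that overrides the corresponding hive.compactor.* trigger thresholds for a single table. A sketch (table_name and the chosen values are placeholders):

```sql
-- Override compaction trigger thresholds for one table only
ALTER TABLE table_name SET TBLPROPERTIES (
  "compactorthreshold.hive.compactor.delta.num.threshold"="4",   -- trigger minor compaction after 4 deltas
  "compactorthreshold.hive.compactor.delta.pct.threshold"="0.5"  -- trigger major compaction at 50% delta size
);
```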
Talks and Presentations
The Art of Compaction by Kokila N at a Cloudera meetup.
Transactional Operations In Hive by Eugene Koifman at Dataworks Summit 2017, San Jose, CA, USA
...