...

Version information

As of Hive 0.14.0 (HIVE-7211), a configuration name that starts with "hive." is regarded as a Hive system property. With the hive.conf.validation option set to true (the default), any attempt to set a configuration property that starts with "hive." but is not registered with the Hive system throws an exception.
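The validation rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not Hive's implementation: the REGISTERED set, the set_property function, and the exception type are all hypothetical names chosen for this example.

```python
# Hypothetical registry of known Hive system properties (illustrative only;
# the real list is maintained inside Hive itself).
REGISTERED = {"hive.exec.scratchdir", "hive.conf.validation"}


def set_property(conf, name, value, validate=True):
    """Set a property, rejecting unregistered names that start with 'hive.'.

    `validate` plays the role of hive.conf.validation in this sketch.
    """
    if validate and name.startswith("hive.") and name not in REGISTERED:
        raise ValueError(f"unknown Hive system property: {name}")
    conf[name] = value
    return conf
```

Names that do not start with "hive." pass through unchecked in this sketch, which matches the text: only the "hive." prefix marks a name as a Hive system property.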

...

See Hive on Tez and Hive on Spark for more information, and see the Tez section and the Spark section below for their configuration properties.

...

Maximum number of reducers that will be used. If the value specified in the configuration property mapred.reduce.tasks is negative, Hive will use this as the maximum number of reducers when automatically determining the number of reducers.
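A rough sketch of how such a cap could interact with an automatically determined reducer count follows. The function name and the bytes_per_reducer parameter are assumptions for illustration (this excerpt does not describe how the automatic estimate is made), and the numbers in any call are made up.

```python
import math


def estimate_reducers(input_bytes, bytes_per_reducer, reducers_max,
                      mapred_reduce_tasks=-1):
    """Return the reducer count under the rule described above.

    A non-negative mapred_reduce_tasks is taken as-is; a negative value
    means "decide automatically", capped at reducers_max.
    """
    if mapred_reduce_tasks >= 0:
        return mapred_reduce_tasks
    # Assumed estimate: one reducer per bytes_per_reducer of input, at least 1.
    wanted = max(1, math.ceil(input_bytes / bytes_per_reducer))
    return min(reducers_max, wanted)
```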

...

The locations of the plugin jars, which can be comma-separated folders or jars. They can be renewed (added, removed, or updated) by executing the Beeline reload command without having to restart HiveServer2. These jars can be used just like the auxiliary classes in hive.aux.jars.path for creating UDFs or SerDes.

...

Hive 0.14.0 and later:  HDFS root scratch directory for Hive jobs, which gets created with write all (733) permission. For each connecting user, an HDFS scratch directory ${hive.exec.scratchdir}/<username> is created with ${hive.scratch.dir.permission}.

Also see hive.start.cleanup.scratchdir and hive.scratchdir.lock.  When running Hive in local mode, see hive.exec.local.scratchdir.

hive.scratch.dir.permission

...

The permission for the user-specific scratch directories that get created in the root scratch directory. (See hive.exec.scratchdir.)

hive.exec.local.scratchdir

...

Scratch space for Hive jobs when Hive runs in local mode.  Also see hive.exec.scratchdir.

hive.hadoop.supports.splittable.combineinputformat

...

Whether to optimize multi group by query to generate a single M/R job plan. If the multi group by query has common group by keys, it will be optimized to generate a single M/R job. (This configuration property was removed in release 0.9.0.)

...

Whether to enable automatic use of indexes.

Note:  See Indexing for more configuration properties related to Hive indexes.

...

Whether to enable predicate pushdown (PPD). 

Note: Turn on hive.optimize.index.filter as well to use file format specific indexes with PPD.

...

How many values in each key in the map-joined table should be cached in memory.

...

How many rows with the same key value should be cached in memory per sort-merge-bucket joined table.

...

Whether Hive should use a memory-optimized hash table for MapJoin. Only works on Tez and Spark, because the memory-optimized hash table cannot be serialized. (Spark is supported starting from Hive 1.3.0, with HIVE-11180.)

...

  • Default Value: 10485760 (10 * 1024 * 1024)
  • Added In: Hive 0.14.0 with HIVE-6430 

Optimized hashtable (see hive.mapjoin.optimized.hashtable) uses a chain of buffers to store data. This is the size of one buffer. Hashtable may be slightly faster if this is larger, but for small joins unnecessary memory will be allocated and then trimmed.

...

Initial capacity of mapjoin hashtable if statistics are absent, or if hive.hashtable.key.count.adjustment is set to 0.

hive.hashtable.key.count.adjustment

...

Adjustment to mapjoin hashtable size derived from table and column statistics; the estimate of the number of keys is divided by this value. If the value is 0, statistics are not used and hive.hashtable.initialCapacity is used instead.
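The relationship between the key-count estimate, the adjustment, and the initial capacity can be sketched as follows. This is a simplified reading of the two descriptions above, not Hive's code; the function name is hypothetical, and any rounding or minimum-size handling Hive applies is not modeled.

```python
def mapjoin_initial_capacity(estimated_keys, adjustment, initial_capacity):
    """Pick a hashtable size from statistics, or fall back to the default.

    estimated_keys: key-count estimate from statistics, or None if absent.
    adjustment: stands in for hive.hashtable.key.count.adjustment.
    initial_capacity: stands in for hive.hashtable.initialCapacity.
    """
    if adjustment == 0 or estimated_keys is None:
        # Statistics are ignored (adjustment 0) or unavailable.
        return initial_capacity
    return int(estimated_keys / adjustment)
```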

hive.hashtable.loadfactor

...

Whether to enable skew join optimization.  (Also see hive.optimize.skewjoin.compiletime.)

hive.skewjoin.key
  • Default Value: 100000
  • Added In: Hive 0.6.0

...

Determines the number of map tasks used in the follow-up map join job for a skew join. It should be used together with hive.skewjoin.mapjoin.min.split for fine-grained control.

...

Determines the maximum number of map tasks used in the follow-up map join job for a skew join by specifying the minimum split size. It should be used together with hive.skewjoin.mapjoin.map.tasks for fine-grained control.

...

The main difference between this parameter and hive.optimize.skewjoin is that this parameter uses the skew information stored in the metastore to optimize the plan at compile time itself. If there is no skew information in the metadata, this parameter will not have any effect.
Both hive.optimize.skewjoin.compiletime and hive.optimize.skewjoin should be set to true. (Ideally, hive.optimize.skewjoin should be renamed as hive.optimize.skewjoin.runtime, but for backward compatibility that has not been done.)

If the skew information is correctly stored in the metadata, hive.optimize.skewjoin.compiletime will change the query plan to take care of it, and hive.optimize.skewjoin will be a no-op.

hive.optimize.union.remove

...

Whether to remove the union and push the operators between union and the filesink above union. This avoids an extra scan of the output by union. This is independently useful for union queries, and especially useful when hive.optimize.skewjoin.compiletime is set to true, since an extra union is inserted.

The merge is triggered if either hive.merge.mapfiles or hive.merge.mapredfiles is set to true. If the user has set hive.merge.mapfiles to true and hive.merge.mapredfiles to false, the idea was that the number of reducers is small, so the number of files is small anyway. However, with this optimization, we are possibly increasing the number of files by a big margin. So, we merge aggressively.

...

By default all values in the HiveConf object are converted to environment variables of the same name as the key (with '.' (dot) converted to '_' (underscore)) and set as part of the script operator's environment.  However, some values can grow large or are not amenable to translation to environment variables.  This value gives a comma separated list of configuration values that will not be set in the environment when calling a script operator.  By default the valid transaction list is excluded, as it can grow large and is sometimes compressed, which does not translate well to an environment variable.
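The key-to-variable translation described above can be sketched as follows. The function name and the sample keys are illustrative assumptions; only the dot-to-underscore rule and the comma-separated exclusion list come from the text.

```python
def script_environment(conf, blacklist):
    """Build the environment passed to a script operator.

    conf: mapping of configuration keys to values.
    blacklist: comma-separated configuration keys to leave out.
    """
    skipped = {name.strip() for name in blacklist.split(",") if name.strip()}
    return {
        key.replace(".", "_"): str(value)  # '.' (dot) becomes '_' (underscore)
        for key, value in conf.items()
        if key not in skipped
    }
```

For example, with a configuration containing `hive.exec.parallel` and a large transaction-list entry that is named in the blacklist, only `hive_exec_parallel` would reach the script's environment.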

Also see:

...

Whether Hive should periodically update task progress counters during execution. Enabling this allows task progress to be monitored more closely in the job tracker, but may impose a performance penalty. This flag is automatically set to true for jobs with hive.exec.dynamic.partition set to true. (This configuration property was removed in release 0.13.0.)

...

Set to true to support INSERT ... VALUES, UPDATE, and DELETE transactions in Hive 0.14.0 and 1.x.x. For a complete list of parameters required for turning on Hive transactions, see hive.txn.manager.

hive.enforce.sorting
  • Default Value: 
    • Hive 0.x: false
    • Hive 1.x: false
    • Hive 2.x: removed, which effectively makes it always true (HIVE-12331)
  • Added In: Hive 0.6.0

...

  • Default Value: true
  • Added In: Hive 0.11.0 with HIVE-4240

If hive.enforce.bucketing or hive.enforce.sorting is true, don't create a reducer for enforcing bucketing/sorting for queries of the form:

...

where T1 and T2 are bucketed/sorted by the same keys into the same number of buckets. (In Hive 2.0.0 and later, this parameter does not depend on hive.enforce.bucketing or hive.enforce.sorting.)

hive.optimize.reducededuplication

...

Whether to push a limit through left/right outer join or union. If the value is true and the size of the outer input is reduced enough (as specified in hive.optimize.limittranspose.reductionpercentage and hive.optimize.limittranspose.reductiontuples), the limit is pushed to the outer input or union; to remain semantically correct, the limit is kept on top of the join or the union too.

...

When hive.optimize.limittranspose is true, this variable specifies the minimal percentage (fractional) reduction of the size of the outer input of the join or input of the union that the optimizer should get in order to apply the rule.

...

When hive.optimize.limittranspose is true, this variable specifies the minimal reduction in the number of tuples of the outer input of the join or input of the union that the optimizer should get in order to apply the rule.

...

Set to nonstrict to support INSERT ... VALUES, UPDATE, and DELETE transactions (Hive 0.14.0 and later). For a complete list of parameters required for turning on Hive transactions, see hive.txn.manager.

hive.exec.max.dynamic.partitions

...

  • Default Value: 134217728
  • Added In: Hive 0.7.0 with HIVE-1408

When hive.exec.mode.local.auto is true, input bytes should be less than this for local mode.

...

  • Default Value: 4
  • Added In: Hive 0.7.0 with HIVE-1408
  • Removed In: Hive 0.9.0 with HIVE-2651

When hive.exec.mode.local.auto is true, the number of tasks should be less than this for local mode. Replaced in Hive 0.9.0 by hive.exec.mode.local.auto.input.files.max.

hive.exec.mode.local.auto.input.files.max
  • Default Value: 4
  • Added In: Hive 0.9.0 with HIVE-2651

When hive.exec.mode.local.auto is true, the number of tasks should be less than this for local mode.
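Taken together, the local-mode thresholds in this section suggest a check along these lines. It is a sketch under the assumption that both limits must hold at once; the function and parameter names are hypothetical, and Hive may apply further conditions not described here. The defaults are the documented default values.

```python
def can_run_locally(auto_enabled, input_bytes, num_tasks,
                    inputbytes_max=134217728, input_files_max=4):
    """Decide whether a job is small enough for automatic local mode.

    auto_enabled stands in for hive.exec.mode.local.auto; the two limits
    stand in for the input-bytes and input-files thresholds.
    """
    if not auto_enabled:
        return False
    # Both values must be strictly less than their thresholds.
    return input_bytes < inputbytes_max and num_tasks < input_files_max
```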

...

To clean up the Hive scratch directory while starting the Hive server (or HiveServer2). This is not an option for a multi-user environment since it can accidentally remove a scratch directory that is in use.

...

Whether to enable using Column Position Alias in GROUP BY and ORDER BY clauses of queries (deprecated as of Hive 2.2.0; use hive.groupby.position.alias and hive.orderby.position.alias instead).

hive.groupby.position.alias

...

Input threshold (in bytes) for applying hive.fetch.task.conversion. If target table is native, input length is calculated by summation of file lengths. If it's not native, the storage handler for the table can optionally implement the org.apache.hadoop.hive.ql.metadata.InputEstimator interface. A negative threshold means hive.fetch.task.conversion is applied without any input length threshold.

...

...

Obsolete:  The dfs.umask value for the Hive-created folders.

...

From Hive 3.1.0 onwards, this configuration property only logs to the log4j INFO. To log the EXPLAIN EXTENDED output in WebUI / Drilldown / Query Plan from Hive 3.1.0 onwards, use hive.server2.webui.explain.output.

hive.explain.user
  • Default Value: false
  • Added In: Hive 1.2.0 with HIVE-9780

Whether to show explain result at user level. When enabled, will log EXPLAIN output for the query at user level. (Tez only.  For Spark, see hive.spark.explain.user.)

hive.typecheck.on.insert

...

  • Default Value: (empty)
  • Added In: Hive 0.8.1

A list of I/O exception handler class names. This is used to construct a list of exception handlers to handle exceptions thrown by record readers.

hive.input.format

The default input format. Set this to HiveInputFormat if you encounter problems with CombineHiveInputFormat.

Also see:

File Formats

hive.default.fileformat

...

Default file format for CREATE TABLE statement applied to managed tables only. External tables will be created with format specified by hive.default.fileformat. Options are none, TextFile, SequenceFile, RCfile, ORC, and Parquet (as of Hive 2.3.0). Leaving this null will result in using hive.default.fileformat for all native tables. For non-native tables the file format is determined by the storage handler, as shown below (see the StorageHandlers section for more information on managed/external and native/non-native terminology).

...

Besides the configuration properties listed in this section, some properties in other sections are also related to ORC:

hive.exec.orc.memory.pool

...

  • Default Value: true
  • Added In: Hive 0.14.0 with HIVE-7509

When hive.merge.mapfiles, hive.merge.mapredfiles or hive.merge.tezfiles is enabled while writing a table with ORC file format, enabling this configuration property will do stripe-level fast merge for small ORC files. Note that enabling this configuration property will not honor the padding tolerance configuration (hive.exec.orc.block.padding.tolerance).

hive.orc.row.index.stride.dictionary.check

...

This flag should be used to provide a comma separated list of fully qualified classnames to exclude certain FileInputFormats from vectorized execution using the vectorized file inputformat. Note that vectorized execution could still occur for that input format based on whether hive.vectorized.use.vector.serde.deserialize or hive.vectorized.use.row.serde.deserialize is enabled or not.

MetaStore

In addition to the Hive metastore properties listed in this section, some properties are listed in other sections:

hive.metastore.local
  • Default Value: true
  • Added In: Hive 0.8.1
  • Removed In: Hive 0.10 with HIVE-2585

...

Validates existing schema against code. Turn this on if you want to verify existing schema.

...

Validates existing schema against code. Turn this on if you want to verify existing schema.

...

Validates existing schema against code. Turn this on if you want to verify existing schema.

...

Validates existing schema against code. Turn this on if you want to verify existing schema.

...

Validates existing schema against code. Turn this on if you want to verify existing schema.

...

Validates existing schema against code. Turn this on if you want to verify existing schema.

...

Creates the necessary schema on startup if one does not exist. Set this to false after creating it once.

In Hive 0.12.0 and later releases, datanucleus.autoCreateSchema is disabled if hive.metastore.schema.verification is true.

datanucleus.schema.autoCreateAll

...

datanucleus.schema.autoCreateAll is disabled if hive.metastore.schema.verification is true.

datanucleus.autoStartMechanismMode

...

This parameter does nothing.
Warning note: For most installations, Hive should not enable the DataNucleus L2 cache, since this can cause correctness issues. Thus, some people set this parameter to false assuming that this disables the cache – unfortunately, it does not. To actually disable the cache, set datanucleus.cache.level2.type to "none".

datanucleus.cache.level2.type

...

Set this to true if table directories should inherit the permissions of the warehouse or database directory instead of being created with permissions derived from dfs umask. (This configuration property replaced hive.files.umask.value before Hive 0.9.0 was released.) (This configuration property was removed in release 3.0.0; for more details see Permission Inheritance in Hive.)

...

The client-facing Kerberos service principal for the Hive metastore. If unset, it defaults to the value set for hive.metastore.kerberos.principal, for backward compatibility.

Also see hive.server2.authentication.client.kerberos.principal.

hive.metastore.cache.pinobjtypes

...

Enforce metastore schema version consistency.
True: Verify that version information stored in metastore matches with one from Hive jars. Also disable automatic schema migration attempt (see datanucleus.autoCreateSchema and datanucleus.schema.autoCreateAll). Users are required to manually migrate schema after Hive upgrade which ensures proper metastore schema migration.
False: Warn if the version information stored in metastore doesn't match with one from Hive jars.

...

Allow JDO query pushdown for integral partition columns in metastore. Off by default. This improves metastore performance for integral columns, especially if there's a large number of partitions. However, it doesn't work correctly with integral values that are not normalized (for example, if they have leading zeroes like 0012). If metastore direct SQL is enabled and works (hive.metastore.try.direct.sql), this optimization is also irrelevant.
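The leading-zero caveat can be illustrated with a small comparison. This is a conceptual sketch only, under the assumption that a pushed-down filter matches against the stored text form of the partition value; the helper names are hypothetical and this is not how the metastore is implemented.

```python
def matches_as_string(stored, wanted):
    """Compare the stored partition value text with the wanted integer's text."""
    return stored == str(wanted)


def matches_as_integer(stored, wanted):
    """Compare after parsing the stored partition value as an integer."""
    return int(stored) == wanted
```

Under that assumption, a partition stored as "0012" equals 12 when parsed as an integer but not when compared as text, which is the kind of mismatch the description warns about for values that are not normalized.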

...

  • Default Value: true
  • Added In: Hive 0.13.0 with HIVE-5626

Same as hive.metastore.try.direct.sql, for read statements within a transaction that modifies metastore data. Due to non-standard behavior in Postgres, if a direct SQL select query has incorrect syntax or something similar inside a transaction, the entire transaction will fail and fall-back to DataNucleus will not be possible. You should disable the usage of direct SQL inside transactions if that happens in your case.

...

This limits the number of partitions that can be requested from the Metastore for a given table. A query will not be executed if it attempts to fetch more partitions per table than the limit configured. A value of "-1" means unlimited. This parameter is preferred over hive.limit.query.max.table.partition (deprecated; removed in 3.0.0).

...

Besides the configuration properties listed in this section, some HiveServer2 properties are listed in other sections:

hive.server2.thrift.port
  • Default Value: 10000
  • Added In: Hive 0.11.0 with HIVE-2935

...

NONE: no authentication check – plain SASL transport
LDAP: LDAP/AD based authentication
KERBEROS: Kerberos/GSSAPI authentication
CUSTOM: Custom authentication provider (use with property hive.server2.custom.authentication.class)
PAM: Pluggable authentication module (added in Hive 0.13.0 with HIVE-6466)
NOSASL:  Raw transport (added in Hive 0.13.0) 

...

Kerberos server principal used by the HA HiveServer2. Also see hive.metastore.client.kerberos.principal.

hive.server2.custom.authentication.class

...

Custom authentication class. Used when property hive.server2.authentication is set to 'CUSTOM'. Provided class must be a proper implementation of the interface org.apache.hive.service.auth.PasswdAuthenticationProvider. HiveServer2 will call its Authenticate(user, password) method to authenticate requests. The implementation may optionally extend Hadoop's org.apache.hadoop.conf.Configured class to grab Hive's Configuration object.

...

List of the underlying PAM services that should be used when hive.server2.authentication type is PAM. A file with the same name must exist in /etc/pam.d.

...

A positive integer that determines the number of Tez sessions that should be launched on each of the queues specified by hive.server2.tez.default.queues. Determines the parallelism on each queue.

...

  • Default Value:
    • Hive 0.x, 1.0.x, 1.1.x, 1.2.0: 0ms
    • Hive 1.2.1+, 1.3+, 2.x+: 7d (HIVE-9842)
  • Added In: Hive 0.14.0 with HIVE-5799

With hive.server2.session.check.interval set to a positive time value, session will be closed when it's not accessed for this duration of time, which can be disabled by setting to zero or negative value.
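The interaction between the check interval and the idle timeout can be sketched as below. This is a minimal model of the sentence above, with all durations in milliseconds; the function name is hypothetical and the real server logic is not shown in this document.

```python
def session_expired(idle_ms, idle_timeout_ms, check_interval_ms):
    """Return True if an idle session should be closed.

    Checking only happens when the check interval is positive, and a zero
    or negative timeout disables expiry altogether.
    """
    if check_interval_ms <= 0 or idle_timeout_ms <= 0:
        return False
    return idle_ms >= idle_timeout_ms
```

With the 7d default (604,800,000 ms), a session idle for one hour would be kept, while one idle for eight days would be closed at the next check.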

...

  • Default Value: 0ms
  • Added In: Hive 0.14.0 with HIVE-5799

With hive.server2.session.check.interval set to a positive time value, operation will be closed when it's not accessed for this duration of time, which can be disabled by setting to zero value.

...

When true, HiveServer2 operation logs available for clients will be verbose. Replaced in Hive 1.2.0 by hive.server2.logging.operation.level.

hive.server2.logging.operation.level

...

HiveServer2 operation logging mode available to clients to be set at session level.

For this to work, hive.server2.logging.operation.enabled should be set to true. The allowed values are:

...

Allows HiveServer2 to send progress bar update information. This is currently available only if the execution engine is tez.

hive.hadoop.classpath

...

The HiveServer2 WebUI SPNEGO service principal. The special string _HOST will be replaced automatically with the value of hive.server2.webui.host or the correct host name.

...

Prior to Hive 3.1.0, you can use hive.log.explain.output instead of this configuration property.

...

Set this to true to display the query plan as a graph instead of text in the WebUI. Only works with hive.server2.webui.explain.output set to true.

hive.server2.webui.max.graph.size

...

Max number of stages the graph can display. If the number of stages exceeds this, no query plan will be shown. Only works when hive.server2.webui.show.graph and hive.server2.webui.explain.output are set to true.

hive.server2.webui.show.stats

...

Set this to true to display statistics and log file for MapReduce tasks in the WebUI. Only works when hive.server2.webui.show.graph and hive.server2.webui.explain.output are set to true.


Spark

Apache Spark was added in Hive 1.1.0 (HIVE-7292 and the merge-to-trunk JIRAs HIVE-9257, 9352, 9448). For information see the design document Hive on Spark and Hive on Spark: Getting Started.

To configure Hive execution to Spark, set the following property to "spark":

Besides the configuration properties listed in this section, some properties in other sections are also related to Spark:

...

If this is set to true, mapjoin optimization in Hive/Spark will use source file sizes associated with the TableScan operator on the root of the operator tree, instead of using operator statistics.

...

If this is set to true, mapjoin optimization in Hive/Spark will use statistics from TableScan operators at the root of the operator tree, instead of parent ReduceSink operators of the Join operator.

...

Time to wait to finish prewarming Spark executors when hive.prewarm.enabled is true.

Note:  These configuration properties for Hive on Spark are documented in the Tez section because they can also affect Tez:

hive.spark.optimize.shuffle.serde
  • Default Value: false
  • Added In: Hive 3.0.0 with HIVE-15104

If this is set to true, Hive on Spark will register custom serializers for data types in shuffle. This should result in less shuffled data.

hive.merge.sparkfiles
  • Default Value: false
  • Added In: Hive 1.1.0 with HIVE-7810

Merge small files at the end of a Spark DAG Transformation.

hive.spark.session.timeout
  • Default Value: 30 minutes
  • Added In: Hive 4.0.0 with HIVE-14162

Amount of time the Spark Remote Driver should wait for a Spark job to be submitted before shutting down. If a Spark job is not launched after this amount of time, the Spark Remote Driver will shut down, thus releasing any resources it has been holding onto. The tradeoff is that any new Hive-on-Spark queries that run in the same session will have to wait for a new Spark Remote Driver to start up. The benefit is that for long running Hive sessions, the Spark Remote Driver doesn't unnecessarily hold onto resources. Minimum value is 30 minutes.

hive.spark.session.timeout.period
  • Default Value: 60 seconds
  • Added In: Hive 4.0.0 with HIVE-14162

How frequently to check for idle Spark sessions. Minimum value is 60 seconds.

hive.spark.use.op.stats
  • Default Value: true
  • Added in: Hive 2.3.0 with HIVE-15796

Whether to use operator stats to determine reducer parallelism for Hive on Spark. If this is false, Hive will use source table stats to determine reducer parallelism for all first level reduce tasks, and the maximum reducer parallelism from all parents for all the rest (second level and onward) reducer tasks.

Setting this to false triggers an alternative algorithm for calculating the number of partitions per Spark shuffle. This new algorithm typically results in an increased number of partitions per shuffle.

hive.spark.use.ts.stats.for.mapjoin
  • Default Value: false
  • Added in: Hive 2.3.0 with HIVE-15489

If this is set to true, mapjoin optimization in Hive/Spark will use statistics from TableScan operators at the root of operator tree, instead of parent ReduceSink operators of the Join operator. Setting this to true is useful when the operator statistics used for a common join → map join conversion are inaccurate.

hive.spark.use.groupby.shuffle
  • Default Value: true
  • Added in: Hive 2.3.0 with HIVE-15580

When set to true, use Spark's RDD#groupByKey to perform group bys. When set to false, use Spark's RDD#repartitionAndSortWithinPartitions to perform group bys. While #groupByKey has better performance when running group bys, it can use an excessive amount of memory. Setting this to false may reduce memory usage, but will hurt performance.

mapreduce.job.reduces
  • Default Value: -1 (disabled)
  • Added in: Hive 1.1.0 with HIVE-7567

Sets the number of reduce tasks for each Spark shuffle stage (e.g. the number of partitions when performing a Spark shuffle). This is set to -1 by default (disabled); instead the number of reduce tasks is dynamically calculated based on Hive data statistics. Setting this to a constant value sets the same number of partitions for all Spark shuffle stages.

...

Besides the configuration properties listed in this section, some properties in other sections are also related to Tez:

hive.jar.directory

This is the location that Hive in Tez mode will look for to find a site-wide installed Hive instance.  See hive.user.install.directory for the default behavior.

...

If Hive (in Tez mode only) cannot find a usable Hive jar in hive.jar.directory, it will upload the Hive jar to <hive.user.install.directory>/<user_name> and use it to run queries.

...

Whether joins can be automatically converted to bucket map joins in Hive when Tez is used as the execution engine (hive.execution.engine is set to "tez").

hive.tez.log.level

...

The log level to use for tasks executing as part of the DAG. Used only if hive.tez.java.opts is used to configure Java options.

...

  • Default Value: 2
  • Added In: Hive 0.14.0 with HIVE-7158

When auto reducer parallelism is enabled this factor will be used to over-partition data in shuffle edges.

...

  • Default Value: 0.25
  • Added In: Hive 0.14.0 with HIVE-7158

When auto reducer parallelism is enabled this factor will be used to put a lower limit to the number of reducers that Tez specifies.
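As a rough illustration of how the two factors in this section could bound the reducer count, consider the sketch below. The function name, the rounding, and the floor of one reducer are assumptions for this example; the text above only says that one factor over-partitions the data and the other sets a lower limit. The defaults are the documented default values (2 and 0.25).

```python
import math


def reducer_bounds(estimated_reducers, max_partition_factor=2.0,
                   min_partition_factor=0.25):
    """Return (lower, upper) reducer counts derived from an estimate.

    upper: over-partitioned count used for shuffle edges.
    lower: smallest count the estimate may be reduced to (at least 1).
    """
    upper = math.ceil(estimated_reducers * max_partition_factor)
    lower = max(1, int(estimated_reducers * min_partition_factor))
    return lower, upper
```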

...

To turn on Hive transactions, change the values of these parameters from their defaults, as described below:

These parameters must also have non-default values to turn on Hive transactions:

Transactions

hive.txn.manager

...

Turning on Hive transactions also requires appropriate settings for hive.compactor.initiator.on, hive.compactor.worker.threads, hive.support.concurrency, hive.enforce.bucketing (Hive 0.x and 1.x only), and hive.exec.dynamic.partition.mode.

hive.txn.strict.locking.mode

...

Whether to run the initiator and cleaner threads on this metastore instance. Set this to true on one instance of the Thrift metastore service as part of turning on Hive transactions. For a complete list of parameters required for turning on transactions, see hive.txn.manager.

Before Hive 1.3.0 it's critical that this is enabled on exactly one metastore service instance. As of Hive 1.3.0 this property may be enabled on any number of standalone metastore instances.

...

How many compactor worker threads to run on this metastore instance. Set this to a positive number on one or more instances of the Thrift metastore service as part of turning on Hive transactions. For a complete list of parameters required for turning on transactions, see hive.txn.manager.

Worker threads spawn MapReduce jobs to do compactions. They do not do the compactions themselves. Increasing the number of worker threads will decrease the time it takes tables or partitions to be compacted once they are determined to need compaction. It will also increase the background load on the Hadoop cluster as more MapReduce jobs will be running in the background.

...

Number of consecutive failed compactions for a given partition after which the Initiator will stop attempting to schedule compactions automatically. It is still possible to use ALTER TABLE to initiate compaction. Once a manually-initiated compaction succeeds, auto-initiated compactions will resume. Note that this must be less than hive.compactor.history.retention.failed.

Indexing

Indexing was added in Hive 0.7.0 with HIVE-417, and bitmap indexing was added in Hive 0.8.0 with HIVE-1803. For more information see Indexing.

...

The Java class (implementing the StatsPublisher interface) that is used by default if hive.stats.dbclass is not JDBC or HBase (Hive 0.12.0 and earlier), or if hive.stats.dbclass is a custom type (Hive 0.13.0 and later:  HIVE-4632).

...

The Java class (implementing the StatsAggregator interface) that is used by default if hive.stats.dbclass is not JDBC or HBase (Hive 0.12.0 and earlier), or if hive.stats.dbclass is a custom type (Hive 0.13.0 and later:  HIVE-4632).

...

Subset of counters that should be of interest for hive.client.stats.publishers (when one wants to limit their publishing). Non-display names should be used.

...

  • Default Value: 24
  • Added In: Hive 0.13 with HIVE-6229

Reserved length for postfix of statistics key. Currently only meaningful for counter type statistics, which should keep the length of the full statistics key smaller than the maximum length configured by hive.stats.key.prefix.max.length. For counter type statistics, it should be bigger than the length of the LB spec, if one exists.

hive.stats.max.variable.length
  • Default Value: 100
  • Added In: Hive 0.13 with HIVE-5369

To estimate the size of data flowing through operators in Hive/Tez (for reducer estimation etc.), average row size is multiplied with the total number of rows coming out of each operator. Average row size is computed from average column size of all columns in the row. In the absence of column statistics, for variable length columns (like string, bytes, etc.) this value will be used. For fixed length columns their corresponding Java equivalent sizes are used (float – 4 bytes, double – 8 bytes, etc.).
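The estimate described above (average row size multiplied by row count) can be sketched as follows. The type names, the two lookup tables, and the function names are assumptions for illustration; only the float and double sizes and the default of 100 for variable-length columns come from the text.

```python
# Fixed-size types mentioned in the text; other fixed types are omitted here.
FIXED_SIZES = {"float": 4, "double": 8}
# Variable-length types that fall back to the configured maximum length.
VARIABLE_TYPES = {"string", "bytes"}


def average_row_size(column_types, max_variable_length=100):
    """Sum per-column sizes when no column statistics are available."""
    total = 0
    for col_type in column_types:
        if col_type in VARIABLE_TYPES:
            total += max_variable_length
        else:
            # Raises KeyError for types this sketch does not list.
            total += FIXED_SIZES[col_type]
    return total


def estimated_data_size(column_types, num_rows, max_variable_length=100):
    """Average row size multiplied by the number of rows."""
    return average_row_size(column_types, max_variable_length) * num_rows
```

For a row of (string, double, float) this gives 100 + 8 + 4 = 112 bytes, so 1,000 rows are estimated at 112,000 bytes.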

hive.analyze.stmt.collect.partlevel.stats
  • Default Value: true
  • Added In: Hive 0.14.0 with HIVE-7609

Prior to Hive 0.14, the analyze statement on a partitioned table collected table level statistics when no partition was specified. Beginning with 0.14, it instead collects partition level statistics for all partitions. If the old behavior of collecting aggregated table level statistics is desired, change the value of this config to false. This impacts only column statistics. Basic statistics are not impacted by this config.

hive.stats.list.num.entries
  • Default Value: 10
  • Added In: Hive 0.13 with HIVE-5369

To estimate the size of data flowing through operators in Hive/Tez (for reducer estimation etc.), average row size is multiplied with the total number of rows coming out of each operator. Average row size is computed from average column size of all columns in the row. In the absence of column statistics and for variable length complex columns like list, the average number of entries/values can be specified using this configuration property.

hive.stats.map.num.entries
  • Default Value: 10
  • Added In: Hive 0.13 with HIVE-5369

To estimate the size of data flowing through operators in Hive/Tez (for reducer estimation etc.), average row size is multiplied with the total number of rows coming out of each operator. Average row size is computed from average column size of all columns in the row. In the absence of column statistics and for variable length complex columns like map, the average number of entries/values can be specified using this configuration property.

hive.stats.map.parallelism

The Hive/Tez optimizer estimates the data size flowing through each of the operators. For the GROUPBY operator, to accurately compute the data size map-side parallelism needs to be known. By default, this value is set to 1 since the optimizer is not aware of the number of mappers during compile-time. This Hive configuration property can be used to specify the number of mappers for data size computation of the GROUPBY operator. (This configuration property was removed in release 0.14.0.)

hive.stats.fetch.partition.stats
  • Default Value: true
  • Added In: Hive 0.13 with HIVE-6298
  • Removed In: Hive 3.0.0 with HIVE-17932

Annotation of the operator tree with statistics information requires partition level basic statistics like number of rows, data size and file size. Partition statistics are fetched from the metastore. Fetching partition statistics for each needed partition can be expensive when the number of partitions is high. This flag can be used to disable fetching of partition statistics from the metastore. When this flag is disabled, Hive will make calls to the filesystem to get file sizes and will estimate the number of rows from the row schema.

hive.stats.fetch.column.stats
  • Default Value: false
  • Added In: Hive 0.13 with HIVE-5898

Annotation of the operator tree with statistics information requires column statistics. Column statistics are fetched from the metastore. Fetching column statistics for each needed column can be expensive when the number of columns is high. This flag can be used to disable fetching of column statistics from the metastore.

hive.stats.join.factor
  • Default Value: (float) 1.1
  • Added In: Hive 0.13 with HIVE-5921

The Hive/Tez optimizer estimates the data size flowing through each of the operators. The JOIN operator uses column statistics to estimate the number of rows flowing out of it and hence the data size. In the absence of column statistics, this factor determines the number of rows flowing out of the JOIN operator.

hive.stats.deserialization.factor
  • Default Value:
    • Hive 0.13 to 2.x.x: (float) 1.0
    • Hive 3.0.0 and later: (float) 10.0
  • Added In: Hive 0.13 with HIVE-5921
  • Default value changed from 1.0 to 10.0 in Hive 3.0

The Hive/Tez optimizer estimates the data size flowing through each of the operators. In the absence of basic statistics like number of rows and data size, file size is used to estimate the number of rows and data size. Since files in tables/partitions are serialized (and optionally compressed) the estimates of number of rows and data size cannot be reliably determined. This factor is multiplied with the file size to account for serialization and compression.

hive.stats.avg.row.size
  • Default Value: 10000
  • Added In: Hive 0.13 with HIVE-5921

In the absence of table/partition statistics, average row size will be used to estimate the number of rows/data size.
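How hive.stats.deserialization.factor and hive.stats.avg.row.size could combine when basic statistics are missing is sketched below. The function name is hypothetical, and dividing the estimated data size by the average row size to get a row count is an assumption made for illustration; Hive's actual computation may differ.

```python
# Simplified sketch; not Hive's implementation. Names are illustrative assumptions.

def estimate_from_file_size(file_size: int,
                            deserialization_factor: float,
                            avg_row_size: int) -> tuple[int, int]:
    """Return (estimated data size, estimated row count) from on-disk file size."""
    # The factor scales the file size to account for serialization and compression.
    data_size = int(file_size * deserialization_factor)
    # Assumption: the row count is the data size divided by the average row size.
    num_rows = data_size // avg_row_size
    return data_size, num_rows

# 1,000,000 bytes on disk, the Hive 3.0.0 default factor of 10.0,
# and the default hive.stats.avg.row.size of 10000.
print(estimate_from_file_size(1_000_000, 10.0, 10_000))  # (10000000, 1000)
```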

...

When set to true, Hive will answer a few queries like min, max, and count(1) purely using statistics stored in the metastore. For basic statistics collection, set the configuration property hive.stats.autogather to true. For more advanced statistics collection, run ANALYZE TABLE queries.

...

Number of threads used by partialscan/noscan analyze command for partitioned tables. This is applicable only for file formats that implement the StatsProvidingRecordReader interface (like ORC).

hive.stats.fetch.bitvector

...

Authentication and Authorization

For an overview of authorization modes, see Hive Authorization.

Restricted/Hidden List and Whitelist

...

Comma separated list of configuration properties which are immutable at runtime. For example, if hive.security.authorization.enabled is set to true, it should be included in this list to prevent a client from changing it to false at runtime.

...

Whitelist for SQL Standard Based Hive Authorization

See hive.security.authorization.sqlstd.confwhitelist below for information about the whitelist property that authorizes set commands in SQL standard based authorization.

...

For general metastore configuration properties, see MetaStore.

hive.metastore.pre.event.listeners

...

Some parameters are added automatically when they match one of the regex specifications for the white list in HiveConf.java (for example, hive.log.trace.id in Hive 2.0.0; see HIVE-12419).

Note that the hive.conf.restricted.list checks are still enforced after the white list check.

...

Second Java regex that the whitelist of configuration properties would match in addition to hive.security.authorization.sqlstd.confwhitelist. Do not include a starting | in the value.

Using this regex instead of updating the original regex for hive.security.authorization.sqlstd.confwhitelist means that you can append to the default that is set by SQL standard authorization instead of replacing it entirely.
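The effect of appending a second regex can be sketched as joining the two patterns with an alternation. This is only an illustration using Python's re module rather than Java regexes; the function name and both sample patterns are hypothetical and are not the real defaults.

```python
import re

# Sketch only: shows why the appended value needs no leading '|'.
# The patterns below are hypothetical examples, not Hive's default whitelist.

def effective_whitelist(whitelist: str, append: str) -> re.Pattern:
    """Combine the base whitelist regex and the appended regex with '|'."""
    combined = f"{whitelist}|{append}" if append else whitelist
    return re.compile(combined)

pattern = effective_whitelist(r"hive\.exec\..*", r"mapreduce\.job\.queuename")
print(bool(pattern.fullmatch("hive.exec.parallel")))       # True
print(bool(pattern.fullmatch("mapreduce.job.queuename")))  # True
print(bool(pattern.fullmatch("hive.metastore.uris")))      # False
```

Because the separator is supplied when the two values are joined, the base whitelist keeps its default and only the extra alternatives need to be maintained.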

...

Set to true to support INSERT ... VALUES, UPDATE, and DELETE transactions (Hive 0.14.0 and later). For a complete list of parameters required for turning on Hive transactions, see hive.txn.manager.

hive.lock.manager
  • Default Value: org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager
  • Added In: Hive 0.7.0 with HIVE-1293

The lock manager to use when hive.support.concurrency is set to true.

hive.lock.mapred.only.operation

...

The default partition name when ZooKeeperHiveLockManager is the hive lock manager.

Metrics

The metrics that Hive collects can be viewed in the HiveServer2 Web UI. For more information, see Hive Metrics.

...

Enable metrics on the Hive Metastore Service. (For other metastore configuration properties, see the Metastore and Hive Metastore Security sections.)

hive.metastore.acidmetrics.thread.on
  • Default Value: true
  • Added in: Hive 4.0.0 with HIVE-24824

Whether to run acid related metrics collection on this metastore instance.

hive.server2.metrics.enabled

...

Enable metrics on HiveServer2. (For other HiveServer2 configuration properties, see the HiveServer2 section.)

hive.service.metrics.class

...

  • Default Value:  "/tmp/report.json"
  • Added in: Hive 1.3.0 and 2.0.0 with HIVE-10761

For hive.service.metrics.class org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics and hive.service.metrics.reporter JSON_FILE, this is the location of the local JSON metrics file dump. This file will get overwritten at every interval of hive.service.metrics.file.frequency.

hive.service.metrics.file.frequency
  • Default Value:  5 seconds
  • Added in: Hive 1.3.0 and 2.0.0 with HIVE-10761

For hive.service.metrics.class org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics and hive.service.metrics.reporter JSON_FILE, this is the frequency of updating the JSON metrics file.

...

  • Default Value:  "hive"
  • Added in: Hive 2.1.0 with HIVE-13480

For hive.service.metrics.class org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics and hive.service.metrics.reporter HADOOP2, this is the component name to provide to the HADOOP2 metrics system. Ideally 'hivemetastore' for the MetaStore and 'hiveserver2' for HiveServer2. The metrics will be updated at every interval of hive.service.metrics.hadoop2.frequency.

hive.service.metrics.hadoop2.frequency
  • Default Value:  30 seconds
  • Added in: Hive 2.1.0 with HIVE-13480

For hive.service.metrics.class org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics and hive.service.metrics.reporter HADOOP2, this is the frequency of updating the HADOOP2 metrics system.

...

Indicates whether replication dump should include information about ACID tables. It should be used in conjunction with hive.repl.dump.metadata.only to enable copying of metadata for ACID tables which do not require the corresponding transaction semantics to be applied on target. This can be removed when ACID table replication is supported.

...

This parameter is a global variable that enables a number of optimizations when running on blobstores.
Some of the optimizations, such as hive.blobstore.use.blobstore.as.scratchdir, won't be used if this variable is set to false.

...

Version information

As of Hive 0.14.0 (HIVE-7211), a configuration name that starts with "hive." is regarded as a Hive system property. With the hive.conf.validation option true (default), any attempt to set a configuration property that starts with "hive." and is not registered to the Hive system will throw an exception.

...

See Hive on Tez and Hive on Spark for more information, and see the Tez section and the Spark section below for their configuration properties.

...

Maximum number of reducers that will be used. If the one specified in the configuration property mapred.reduce.tasks is negative, Hive will use this as the maximum number of reducers when automatically determining the number of reducers.

...

The locations of the plugin jars, which can be comma-separated folders or jars. They can be renewed (added, removed, or updated) by executing the Beeline reload command without having to restart HiveServer2. These jars can be used just like the auxiliary classes in hive.aux.jars.path for creating UDFs or SerDes.

...

Hive 0.14.0 and later:  HDFS root scratch directory for Hive jobs, which gets created with write all (733) permission. For each connecting user, an HDFS scratch directory ${hive.exec.scratchdir}/<username> is created with ${hive.scratch.dir.permission}.

Also see hive.start.cleanup.scratchdir and hive.scratchdir.lock.  When running Hive in local mode, see hive.exec.local.scratchdir.

hive.scratch.dir.permission

...

The permission for the user-specific scratch directories that get created in the root scratch directory. (See hive.exec.scratchdir.)

hive.exec.local.scratchdir

...

Scratch space for Hive jobs when Hive runs in local mode.  Also see hive.exec.scratchdir.

hive.hadoop.supports.splittable.combineinputformat

...

Whether to optimize multi group by query to generate a single M/R job plan. If the multi group by query has common group by keys, it will be optimized to generate a single M/R job. (This configuration property was removed in release 0.9.0.)

...

Whether to enable automatic use of indexes.

Note:  See Indexing for more configuration properties related to Hive indexes.

...

Whether to enable predicate pushdown (PPD). 

Note: Turn on hive.optimize.index.filter as well to use file format specific indexes with PPD.

...

How many values in each key in the map-joined table should be cached in memory.

...

How many rows with the same key value should be cached in memory per sort-merge-bucket joined table.

...

Whether a MapJoin hashtable should use optimized (size-wise) keys, allowing the table to take less memory. Depending on the key, memory savings for the entire table can be 5-15% or so.

hive.mapjoin.optimized.hashtable
  • Default Value: true
  • Added In: Hive 0.14.0 with HIVE-6430 

Whether Hive should use a memory-optimized hash table for MapJoin. Only works on Tez and Spark, because memory-optimized hash table cannot be serialized. (Spark is supported starting from Hive 1.3.0, with HIVE-11180.)

hive.mapjoin.optimized.hashtable.wbsize
  • Default Value: 10485760 (10 * 1024 * 1024)
  • Added In: Hive 0.14.0 with HIVE-6430 

Optimized hashtable (see hive.mapjoin.optimized.hashtable) uses a chain of buffers to store data. This is the size of one buffer. Hashtable may be slightly faster if this is larger, but for small joins unnecessary memory will be allocated and then trimmed.

...

Initial capacity of mapjoin hashtable if statistics are absent, or if hive.hashtable.key.count.adjustment is set to 0.

hive.hashtable.key.count.adjustment

...

Adjustment to mapjoin hashtable size derived from table and column statistics; the estimate of the number of keys is divided by this value. If the value is 0, statistics are not used and hive.hashtable.initialCapacity is used instead.
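The interaction between hive.hashtable.key.count.adjustment and hive.hashtable.initialCapacity can be sketched as follows. The function name and the sample numbers are hypothetical, and the rounding is an assumption; this is not Hive's code.

```python
from typing import Optional

# Sketch of the sizing rule described above; names and numbers are illustrative.

def mapjoin_hashtable_keys(estimated_keys: Optional[int],
                           key_count_adjustment: float,
                           initial_capacity: int) -> int:
    """Pick the starting key count for the mapjoin hashtable."""
    # Fall back to the initial capacity when statistics are absent
    # or when the adjustment is 0 (statistics are not used).
    if estimated_keys is None or key_count_adjustment == 0:
        return initial_capacity
    # Otherwise the key estimate from statistics is divided by the adjustment.
    return int(estimated_keys / key_count_adjustment)

print(mapjoin_hashtable_keys(1_000_000, 2.0, 100_000))  # 500000
print(mapjoin_hashtable_keys(1_000_000, 0, 100_000))    # 100000
print(mapjoin_hashtable_keys(None, 2.0, 100_000))       # 100000
```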

hive.hashtable.loadfactor

...

Whether to enable skew join optimization.  (Also see hive.optimize.skewjoin.compiletime.)

hive.skewjoin.key
  • Default Value: 100000
  • Added In: Hive 0.6.0

...

Determine the number of map tasks used in the follow up map join job for a skew join. It should be used together with hive.skewjoin.mapjoin.min.split to perform fine-grained control.

...

Determine the maximum number of map tasks used in the follow up map join job for a skew join by specifying the minimum split size. It should be used together with hive.skewjoin.mapjoin.map.tasks to perform fine-grained control.

...

The main difference between this parameter and hive.optimize.skewjoin is that this parameter uses the skew information stored in the metastore to optimize the plan at compile time itself. If there is no skew information in the metadata, this parameter will not have any effect.
Both hive.optimize.skewjoin.compiletime and hive.optimize.skewjoin should be set to true. (Ideally, hive.optimize.skewjoin should be renamed as hive.optimize.skewjoin.runtime, but for backward compatibility that has not been done.)

If the skew information is correctly stored in the metadata, hive.optimize.skewjoin.compiletime will change the query plan to take care of it, and hive.optimize.skewjoin will be a no-op.

hive.optimize.union.remove

...

Whether to remove the union and push the operators between union and the filesink above union. This avoids an extra scan of the output by union. This is independently useful for union queries, and especially useful when hive.optimize.skewjoin.compiletime is set to true, since an extra union is inserted.

The merge is triggered if either hive.merge.mapfiles or hive.merge.mapredfiles is set to true. If the user has set hive.merge.mapfiles to true and hive.merge.mapredfiles to false, the idea was that the number of reducers is small, so the number of files is small anyway. However, this optimization can increase the number of files by a big margin, so the files are merged aggressively.

...

By default all values in the HiveConf object are converted to environment variables of the same name as the key (with '.' (dot) converted to '_' (underscore)) and set as part of the script operator's environment.  However, some values can grow large or are not amenable to translation to environment variables.  This value gives a comma separated list of configuration values that will not be set in the environment when calling a script operator.  By default the valid transaction list is excluded, as it can grow large and is sometimes compressed, which does not translate well to an environment variable.
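The key-to-environment-variable mapping described above can be sketched as follows. The function name and the sample keys are hypothetical, and this is an illustration of the naming rule rather than Hive's implementation.

```python
# Sketch of the mapping described above; not Hive's implementation.
# The function name and the sample configuration are hypothetical.

def script_operator_env(conf: dict[str, str], excluded: set[str]) -> dict[str, str]:
    """Turn configuration keys into environment variable names, skipping excluded keys."""
    return {
        key.replace(".", "_"): value  # '.' (dot) becomes '_' (underscore)
        for key, value in conf.items()
        if key not in excluded
    }

conf = {
    "hive.exec.parallel": "true",
    "example.large.value": "x" * 10_000,  # hypothetical key standing in for a large value
}
env = script_operator_env(conf, excluded={"example.large.value"})
print(env)  # {'hive_exec_parallel': 'true'}
```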

Also see:

...

Whether Hive should periodically update task progress counters during execution. Enabling this allows task progress to be monitored more closely in the job tracker, but may impose a performance penalty. This flag is automatically set to true for jobs with hive.exec.dynamic.partition set to true. (This configuration property was removed in release 0.13.0.)

...

Set to true to support INSERT ... VALUES, UPDATE, and DELETE transactions in Hive 0.14.0 and 1.x.x. For a complete list of parameters required for turning on Hive transactions, see hive.txn.manager.

hive.enforce.sorting
  • Default Value: 
    • Hive 0.x: false
    • Hive 1.x: false
    • Hive 2.x: removed, which effectively makes it always true (HIVE-12331)
  • Added In: Hive 0.6.0

...

  • Default Value: true
  • Added In: Hive 0.11.0 with HIVE-4240

If hive.enforce.bucketing or hive.enforce.sorting is true, don't create a reducer for enforcing bucketing/sorting for queries of the form:

...

where T1 and T2 are bucketed/sorted by the same keys into the same number of buckets. (In Hive 2.0.0 and later, this parameter does not depend on hive.enforce.bucketing or hive.enforce.sorting.)

hive.optimize.reducededuplication

...

Whether to push a limit through left/right outer join or union. If the value is true and the size of the outer input is reduced enough (as specified in hive.optimize.limittranspose.reductionpercentage and hive.optimize.limittranspose.reductiontuples), the limit is pushed to the outer input or union; to remain semantically correct, the limit is kept on top of the join or the union too.

...

When hive.optimize.limittranspose is true, this variable specifies the minimal percentage (fractional) reduction of the size of the outer input of the join or input of the union that the optimizer should get in order to apply the rule.

...

When hive.optimize.limittranspose is true, this variable specifies the minimal reduction in the number of tuples of the outer input of the join or input of the union that the optimizer should get in order to apply the rule.

...

Set to nonstrict to support INSERT ... VALUES, UPDATE, and DELETE transactions (Hive 0.14.0 and later). For a complete list of parameters required for turning on Hive transactions, see hive.txn.manager.

hive.exec.max.dynamic.partitions

...

  • Default Value: 134217728
  • Added In: Hive 0.7.0 with HIVE-1408

When hive.exec.mode.local.auto is true, input bytes should be less than this for local mode.

...

  • Default Value: 4
  • Added In: Hive 0.7.0 with HIVE-1408
  • Removed In: Hive 0.9.0 with HIVE-2651

When hive.exec.mode.local.auto is true, the number of tasks should be less than this for local mode. Replaced in Hive 0.9.0 by hive.exec.mode.local.auto.input.files.max.

hive.exec.mode.local.auto.input.files.max
  • Default Value: 4
  • Added In: Hive 0.9.0 with HIVE-2651

When hive.exec.mode.local.auto is true, the number of tasks should be less than this for local mode.
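Taken together, the two thresholds above act as upper bounds for automatic local mode. The sketch below models only those two checks with the documented defaults; the function name is hypothetical and any other conditions Hive may apply are ignored.

```python
# Sketch of the two threshold checks described above; not Hive's implementation.
# Any additional conditions Hive applies for local mode are ignored here.

def may_run_locally(auto_enabled: bool,
                    input_bytes: int,
                    num_tasks: int,
                    max_input_bytes: int = 134217728,
                    max_input_files: int = 4) -> bool:
    """True when automatic local mode is on and both values are under their limits."""
    return (auto_enabled
            and input_bytes < max_input_bytes
            and num_tasks < max_input_files)

print(may_run_locally(True, 50_000_000, 3))   # True
print(may_run_locally(True, 200_000_000, 3))  # False: too many input bytes
print(may_run_locally(False, 1_000, 1))       # False: automatic local mode is off
```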

...

To clean up the Hive scratch directory while starting the Hive server (or HiveServer2). This is not an option for a multi-user environment since it will accidentally remove the scratch directory in use.

...

Whether to enable using Column Position Alias in GROUP BY and ORDER BY clauses of queries (deprecated as of Hive 2.2.0; use hive.groupby.position.alias and hive.orderby.position.alias instead).

hive.groupby.position.alias

...

Input threshold (in bytes) for applying hive.fetch.task.conversion. If target table is native, input length is calculated by summation of file lengths. If it's not native, the storage handler for the table can optionally implement the org.apache.hadoop.hive.ql.metadata.InputEstimator interface. A negative threshold means hive.fetch.task.conversion is applied without any input length threshold.

...

...

Obsolete:  The dfs.umask value for the Hive-created folders.

...

From Hive 3.1.0 onwards, this configuration property only logs to the log4j INFO. To log the EXPLAIN EXTENDED output in WebUI / Drilldown / Query Plan from Hive 3.1.0 onwards, use hive.server2.webui.explain.output.

hive.explain.user
  • Default Value: false
  • Added In: Hive 1.2.0 with HIVE-9780

Whether to show explain result at user level. When enabled, will log EXPLAIN output for the query at user level. (Tez only.  For Spark, see hive.spark.explain.user.)

hive.typecheck.on.insert

...

  • Default Value: (empty)
  • Added In: Hive 0.8.1

A list of I/O exception handler class names. This is used to construct a list of exception handlers to handle exceptions thrown by record readers.

hive.input.format

The default input format. Set this to HiveInputFormat if you encounter problems with CombineHiveInputFormat.

Also see:

File Formats

hive.default.fileformat

...

Default file format for CREATE TABLE statement applied to managed tables only. External tables will be created with format specified by hive.default.fileformat. Options are none, TextFile, SequenceFile, RCfile, ORC, and Parquet (as of Hive 2.3.0). Leaving this null will result in using hive.default.fileformat for all native tables. For non-native tables the file format is determined by the storage handler, as shown below (see the StorageHandlers section for more information on managed/external and native/non-native terminology).

...

Besides the configuration properties listed in this section, some properties in other sections are also related to ORC:

hive.exec.orc.memory.pool

...

  • Default Value: true
  • Added In: Hive 0.14.0 with HIVE-7509

When hive.merge.mapfiles, hive.merge.mapredfiles or hive.merge.tezfiles is enabled while writing a table with ORC file format, enabling this configuration property will do stripe-level fast merge for small ORC files. Note that enabling this configuration property will not honor the padding tolerance configuration (hive.exec.orc.block.padding.tolerance).

hive.orc.row.index.stride.dictionary.check

...

This flag should be used to provide a comma separated list of fully qualified classnames to exclude certain FileInputFormats from vectorized execution using the vectorized file inputformat. Note that vectorized execution could still occur for that input format based on whether hive.vectorized.use.vector.serde.deserialize or hive.vectorized.use.row.serde.deserialize is enabled or not.

MetaStore

In addition to the Hive metastore properties listed in this section, some properties are listed in other sections:

hive.metastore.local
  • Default Value: true
  • Added In: Hive 0.8.1
  • Removed In: Hive 0.10 with HIVE-2585

...

Validates existing schema against code. Turn this on if you want to verify existing schema.

...

Validates existing schema against code. Turn this on if you want to verify existing schema.

...

Validates existing schema against code. Turn this on if you want to verify existing schema.

...

Validates existing schema against code. Turn this on if you want to verify existing schema.

...

Validates existing schema against code. Turn this on if you want to verify existing schema.

...

Validates existing schema against code. Turn this on if you want to verify existing schema.

...

Creates necessary schema on a startup if one does not exist. Set this to false, after creating it once.

In Hive 0.12.0 and later releases, datanucleus.autoCreateSchema is disabled if hive.metastore.schema.verification is true.

datanucleus.schema.autoCreateAll

...

datanucleus.schema.autoCreateAll is disabled if hive.metastore.schema.verification is true.

datanucleus.autoStartMechanismMode

...

This parameter does nothing.
Warning note: For most installations, Hive should not enable the DataNucleus L2 cache, since this can cause correctness issues. Thus, some people set this parameter to false assuming that this disables the cache. Unfortunately, it does not. To actually disable the cache, set datanucleus.cache.level2.type to "none".

datanucleus.cache.level2.type

...

Set this to true if table directories should inherit the permissions of the warehouse or database directory instead of being created with permissions derived from dfs umask. (This configuration property replaced hive.files.umask.value before Hive 0.9.0 was released.) (This configuration property was removed in release 3.0.0; for more details, see Permission Inheritance in Hive.)

...

The client-facing Kerberos service principal for the Hive metastore. If unset, it defaults to the value set for hive.metastore.kerberos.principal, for backward compatibility.

Also see hive.server2.authentication.client.kerberos.principal.

hive.metastore.cache.pinobjtypes

...

Enforce metastore schema version consistency.
True: Verify that the version information stored in the metastore matches the one from the Hive jars. Also disable automatic schema migration attempts (see datanucleus.autoCreateSchema and datanucleus.schema.autoCreateAll). Users are required to manually migrate the schema after a Hive upgrade, which ensures proper metastore schema migration.
False: Warn if the version information stored in metastore doesn't match with one from Hive jars.

...

Allow JDO query pushdown for integral partition columns in metastore. Off by default. This improves metastore performance for integral columns, especially if there's a large number of partitions. However, it doesn't work correctly with integral values that are not normalized (for example, if they have leading zeroes like 0012). If metastore direct SQL is enabled and works (hive.metastore.try.direct.sql), this optimization is also irrelevant.

...

  • Default Value: true
  • Added In: Hive 0.13.0 with HIVE-5626

Same as hive.metastore.try.direct.sql, for read statements within a transaction that modifies metastore data. Due to non-standard behavior in Postgres, if a direct SQL select query has incorrect syntax or something similar inside a transaction, the entire transaction will fail and fall-back to DataNucleus will not be possible. You should disable the usage of direct SQL inside transactions if that happens in your case.

...

This limits the number of partitions that can be requested from the Metastore for a given table. A query will not be executed if it attempts to fetch more partitions per table than the limit configured. A value of "-1" means unlimited. This parameter is preferred over hive.limit.query.max.table.partition (deprecated; removed in 3.0.0).
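The partition limit behaves like a simple guard applied before a query runs. The sketch below is illustrative only; the exception class and function name are hypothetical and do not correspond to Hive classes.

```python
# Sketch of the partition-limit check described above; names are hypothetical.

class PartitionLimitExceeded(Exception):
    """Raised when a query would fetch more partitions than the configured limit."""

def check_partition_request(requested: int, limit: int) -> None:
    """Reject the request if it exceeds the limit; -1 means unlimited."""
    if limit != -1 and requested > limit:
        raise PartitionLimitExceeded(
            f"requested {requested} partitions, limit is {limit}"
        )

check_partition_request(5_000, limit=-1)     # unlimited: accepted
check_partition_request(500, limit=1_000)    # under the limit: accepted
try:
    check_partition_request(5_000, limit=1_000)
except PartitionLimitExceeded as err:
    print(err)  # requested 5000 partitions, limit is 1000
```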

...

Besides the configuration properties listed in this section, some HiveServer2 properties are listed in other sections:

hive.server2.thrift.port
  • Default Value: 10000
  • Added In: Hive 0.11.0 with HIVE-2935

...

NONE: no authentication check – plain SASL transport
LDAP: LDAP/AD based authentication
KERBEROS: Kerberos/GSSAPI authentication
CUSTOM: Custom authentication provider (use with property hive.server2.custom.authentication.class)
PAM: Pluggable authentication module (added in Hive 0.13.0 with HIVE-6466)
NOSASL:  Raw transport (added in Hive 0.13.0) 

...

Kerberos server principal used by the HA HiveServer2. Also see hive.metastore.client.kerberos.principal.

hive.server2.custom.authentication.class

...

Custom authentication class. Used when property hive.server2.authentication is set to 'CUSTOM'. Provided class must be a proper implementation of the interface org.apache.hive.service.auth.PasswdAuthenticationProvider. HiveServer2 will call its Authenticate(user, passed) method to authenticate requests. The implementation may optionally extend Hadoop's org.apache.hadoop.conf.Configured class to grab Hive's Configuration object.

...

List of the underlying PAM services that should be used when hive.server2.authentication type is PAM. A file with the same name must exist in /etc/pam.d.

...

A positive integer that determines the number of Tez sessions that should be launched on each of the queues specified by hive.server2.tez.default.queues. Determines the parallelism on each queue.

...

  • Default Value:
    • Hive 0.x, 1.0.x, 1.1.x, 1.2.0: 0ms
    • Hive 1.2.1+, 1.3+, 2.x+: 7d (HIVE-9842)
  • Added In: Hive 0.14.0 with HIVE-5799

With hive.server2.session.check.interval set to a positive time value, session will be closed when it's not accessed for this duration of time, which can be disabled by setting to zero or negative value.
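The idle-session rule above amounts to a comparison that is skipped when the timeout is zero or negative. A minimal sketch, with a hypothetical function name and times expressed in milliseconds:

```python
# Sketch of the idle-session rule described above; not HiveServer2's code.

SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000  # the 7d default, in milliseconds

def session_expired(idle_ms: int, timeout_ms: int) -> bool:
    """A zero or negative timeout disables the check entirely."""
    return timeout_ms > 0 and idle_ms > timeout_ms

print(session_expired(idle_ms=8 * 24 * 60 * 60 * 1000, timeout_ms=SEVEN_DAYS_MS))  # True
print(session_expired(idle_ms=60_000, timeout_ms=SEVEN_DAYS_MS))                   # False
print(session_expired(idle_ms=10**12, timeout_ms=0))                               # False
```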

...

  • Default Value: 0ms
  • Added In: Hive 0.14.0 with HIVE-5799

With hive.server2.session.check.interval set to a positive time value, operation will be closed when it's not accessed for this duration of time, which can be disabled by setting to zero value.

...

When true, HiveServer2 operation logs available for clients will be verbose. Replaced in Hive 1.2.0 by hive.server2.logging.operation.level.

hive.server2.logging.operation.level

...

HiveServer2 operation logging mode available to clients to be set at session level.

For this to work, hive.server2.logging.operation.enabled should be set to true. The allowed values are:

...

Allows HiveServer2 to send progress bar update information. This is currently available only if the execution engine is tez.

hive.hadoop.classpath

...

The HiveServer2 WebUI SPNEGO service principal. The special string _HOST will be replaced automatically with the value of hive.server2.webui.host or the correct host name.

...

Prior to Hive 3.1.0, you can use hive.log.explain.output instead of this configuration property.

...

Set this to true to display the query plan as a graph instead of text in the WebUI. Only works with hive.server2.webui.explain.output set to true.

hive.server2.webui.max.graph.size

...

Maximum number of stages the graph can display. If the number of stages exceeds this, no query plan will be shown. Only works when hive.server2.webui.show.graph and hive.server2.webui.explain.output are set to true.

hive.server2.webui.show.stats

...

Set this to true to display statistics and log file for MapReduce tasks in the WebUI. Only works when hive.server2.webui.show.graph and hive.server2.webui.explain.output are set to true.


Spark

Apache Spark was added in Hive 1.1.0 (HIVE-7292 and the merge-to-trunk JIRAs HIVE-9257, 9352, 9448). For more information, see the design document Hive on Spark and Hive on Spark: Getting Started.

To configure Hive to execute on Spark, set the following property to "spark":

Besides the configuration properties listed in this section, some properties in other sections are also related to Spark:

...

If this is set to true, mapjoin optimization in Hive/Spark will use source file sizes associated with the TableScan operator on the root of the operator tree, instead of using operator statistics.

...

If this is set to true, mapjoin optimization in Hive/Spark will use statistics from TableScan operators at the root of the operator tree, instead of parent ReduceSink operators of the Join operator.

...

Time to wait to finish prewarming Spark executors when hive.prewarm.enabled is true.

Note:  These configuration properties for Hive on Spark are documented in the Tez section because they can also affect Tez:

hive.spark.optimize.shuffle.serde
  • Default Value: false
  • Added In: Hive 3.0.0 with HIVE-15104

If this is set to true, Hive on Spark will register custom serializers for data types in shuffle. This should result in less shuffled data.

hive.merge.sparkfiles
  • Default Value: false
  • Added In: Hive 1.1.0 with HIVE-7810

Merge small files at the end of a Spark DAG Transformation.

hive.spark.session.timeout
  • Default Value: 30 minutes
  • Added In: Hive 4.0.0 with HIVE-14162

Amount of time the Spark Remote Driver should wait for a Spark job to be submitted before shutting down. If a Spark job is not launched after this amount of time, the Spark Remote Driver will shut down, thus releasing any resources it has been holding onto. The tradeoff is that any new Hive-on-Spark queries that run in the same session will have to wait for a new Spark Remote Driver to start up. The benefit is that for long running Hive sessions, the Spark Remote Driver doesn't unnecessarily hold onto resources. Minimum value is 30 minutes.

hive.spark.session.timeout.period
  • Default Value: 60 seconds
  • Added In: Hive 4.0.0 with HIVE-14162

How frequently to check for idle Spark sessions. Minimum value is 60 seconds.

hive.spark.use.op.stats
  • Default Value: true
  • Added in: Hive 2.3.0 with HIVE-15796

Whether to use operator stats to determine reducer parallelism for Hive on Spark. If this is false, Hive will use source table stats to determine reducer parallelism for all first level reduce tasks, and the maximum reducer parallelism from all parents for all the rest (second level and onward) reducer tasks.

Setting this to false triggers an alternative algorithm for calculating the number of partitions per Spark shuffle. This new algorithm typically results in an increased number of partitions per shuffle.

hive.spark.use.ts.stats.for.mapjoin
  • Default Value: false
  • Added in: Hive 2.3.0 with HIVE-15489

If this is set to true, mapjoin optimization in Hive/Spark will use statistics from TableScan operators at the root of operator tree, instead of parent ReduceSink operators of the Join operator. Setting this to true is useful when the operator statistics used for a common join → map join conversion are inaccurate.

hive.spark.use.groupby.shuffle
  • Default Value: true
  • Added in: Hive 2.3.0 with HIVE-15580

When set to true, use Spark's RDD#groupByKey to perform group bys. When set to false, use Spark's RDD#repartitionAndSortWithinPartitions to perform group bys. While #groupByKey has better performance when running group bys, it can use an excessive amount of memory. Setting this to false may reduce memory usage, but will hurt performance.

mapreduce.job.reduces
  • Default Value: -1 (disabled)
  • Added in: Hive 1.1.0 with HIVE-7567

Sets the number of reduce tasks for each Spark shuffle stage (that is, the number of partitions when performing a Spark shuffle). This is set to -1 by default (disabled); instead the number of reduce tasks is dynamically calculated based on Hive data statistics. Setting this to a constant value sets the same number of partitions for all Spark shuffle stages.

...

Besides the configuration properties listed in this section, some properties in other sections are also related to Tez:

hive.jar.directory

This is the location where Hive in Tez mode looks for a site-wide installed Hive instance.  See hive.user.install.directory for the default behavior.

...

If Hive (in Tez mode only) cannot find a usable Hive jar in hive.jar.directory, it will upload the Hive jar to <hive.user.install.directory>/<user_name> and use it to run queries.

...

Whether joins can be automatically converted to bucket map joins in Hive when Tez is used as the execution engine (hive.execution.engine is set to "tez").

hive.tez.log.level

...

The log level to use for tasks executing as part of the DAG. Used only if hive.tez.java.opts is used to configure Java options.

...

  • Default Value: 2
  • Added In: Hive 0.14.0 with HIVE-7158

When auto reducer parallelism is enabled, this factor is used to over-partition data in shuffle edges.

...

  • Default Value: 0.25
  • Added In: Hive 0.14.0 with HIVE-7158

When auto reducer parallelism is enabled, this factor is used to put a lower limit on the number of reducers that Tez specifies.
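As a rough illustration of how the two factors above bracket the reducer count, the following Python sketch scales an estimated reducer count by each factor. The function and parameter names are hypothetical, and the truncation and the floor of one reducer are assumptions, not documented Hive behavior.

```python
def partition_bounds(estimated_reducers, max_factor=2.0, min_factor=0.25):
    """Return (lower, upper) reducer counts for one shuffle edge.

    max_factor over-partitions the data; min_factor sets the lower limit.
    The defaults mirror the default values listed above (2 and 0.25).
    """
    # Assumption: results are truncated to integers and never drop below 1.
    upper = max(1, int(estimated_reducers * max_factor))
    lower = max(1, int(estimated_reducers * min_factor))
    return lower, upper


# With an estimate of 40 reducers, the edge is over-partitioned to 80,
# and the reducer count is not allowed to fall below 10.
bounds = partition_bounds(40)
```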

...

To turn on Hive transactions, change the values of these parameters from their defaults, as described below:

These parameters must also have non-default values to turn on Hive transactions:

Transactions

hive.txn.manager

...

Turning on Hive transactions also requires appropriate settings for hive.compactor.initiator.on, hive.compactor.worker.threads, hive.support.concurrency, hive.enforce.bucketing (Hive 0.x and 1.x only), and hive.exec.dynamic.partition.mode.

hive.txn.strict.locking.mode

...

Whether to run the initiator and cleaner threads on this metastore instance. Set this to true on one instance of the Thrift metastore service as part of turning on Hive transactions. For a complete list of parameters required for turning on transactions, see hive.txn.manager.

It's critical that this is enabled on exactly one metastore service instance (not enforced yet).

...

How many compactor worker threads to run on this metastore instance. Set this to a positive number on one or more instances of the Thrift metastore service as part of turning on Hive transactions. For a complete list of parameters required for turning on transactions, see hive.txn.manager.

Worker threads spawn MapReduce jobs to do compactions. They do not do the compactions themselves. Increasing the number of worker threads will decrease the time it takes tables or partitions to be compacted once they are determined to need compaction. It will also increase the background load on the Hadoop cluster as more MapReduce jobs will be running in the background.

...

Number of consecutive failed compactions for a given partition after which the Initiator will stop attempting to schedule compactions automatically. It is still possible to use ALTER TABLE to initiate compaction. Once a manually-initiated compaction succeeds, auto-initiated compactions will resume. Note that this must be less than hive.compactor.history.retention.failed.
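The rule above can be read as a check on a partition's most recent compaction outcomes. The following Python sketch is a simplified model: the history representation and the function name are hypothetical, and it is not taken from the Initiator's code.

```python
def should_auto_schedule(history, failed_threshold):
    """Decide whether compaction may be scheduled automatically.

    history: outcomes for one partition, most recent first; each entry is
             the string "failed" or "succeeded" (hypothetical format).
    failed_threshold: number of consecutive failures after which automatic
             scheduling stops.
    """
    consecutive_failures = 0
    for outcome in history:
        if outcome != "failed":
            break  # a success ends the failure streak
        consecutive_failures += 1
    return consecutive_failures < failed_threshold


# Two failures in a row with a threshold of 2: automatic scheduling stops.
blocked = should_auto_schedule(["failed", "failed", "succeeded"], 2)
# After a success (for example, a manual compaction), scheduling resumes.
resumed = should_auto_schedule(["succeeded", "failed", "failed"], 2)
```

Under this model, the streak can only be counted from failures that are still retained in the history, which may be why the threshold has to stay below hive.compactor.history.retention.failed.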

Indexing

Indexing was added in Hive 0.7.0 with HIVE-417, and bitmap indexing was added in Hive 0.8.0 with HIVE-1803. For more information see Indexing.

...

The Java class (implementing the StatsPublisher interface) that is used by default if hive.stats.dbclass is not JDBC or HBase (Hive 0.12.0 and earlier), or if hive.stats.dbclass is a custom type (Hive 0.13.0 and later:  HIVE-4632).

...

The Java class (implementing the StatsAggregator interface) that is used by default if hive.stats.dbclass is not JDBC or HBase (Hive 0.12.0 and earlier), or if hive.stats.dbclass is a custom type (Hive 0.13.0 and later:  HIVE-4632).

...

Subset of counters that should be of interest for hive.client.stats.publishers (when one wants to limit their publishing). Non-display names should be used.

...

  • Default Value: 24
  • Added In: Hive 0.13 with HIVE-6229

Reserved length for the postfix of the statistics key. Currently only meaningful for counter type statistics, which should keep the length of the full statistics key smaller than the maximum length configured by hive.stats.key.prefix.max.length. For counter type statistics, it should be bigger than the length of the LB spec, if one exists.

hive.stats.max.variable.length
  • Default Value: 100
  • Added In: Hive 0.13 with HIVE-5369

To estimate the size of data flowing through operators in Hive/Tez (for reducer estimation etc.), average row size is multiplied with the total number of rows coming out of each operator. Average row size is computed from average column size of all columns in the row. In the absence of column statistics, for variable length columns (like string, bytes, etc.) this value will be used. For fixed length columns their corresponding Java equivalent sizes are used (float – 4 bytes, double – 8 bytes, etc.).
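The arithmetic described above can be sketched in a few lines of Python. This is a simplified model, not Hive's estimator: the type table covers only a handful of types, the helper names are hypothetical, and any per-object overhead a real implementation might add is ignored.

```python
# Sizes of the Java equivalents for a few fixed-length column types.
FIXED_SIZES = {"int": 4, "bigint": 8, "float": 4, "double": 8}
# Variable-length types that fall back to the configured length.
VARIABLE_TYPES = {"string", "binary"}


def avg_row_size(column_types, max_variable_length=100):
    """Sum the per-column sizes when no column statistics are available."""
    total = 0
    for col_type in column_types:
        if col_type in VARIABLE_TYPES:
            total += max_variable_length  # hive.stats.max.variable.length
        else:
            total += FIXED_SIZES[col_type]
    return total


def estimated_data_size(column_types, num_rows, max_variable_length=100):
    """Average row size multiplied by the number of rows."""
    return avg_row_size(column_types, max_variable_length) * num_rows


# (4 + 8 + 100) bytes per row * 1000 rows
size = estimated_data_size(["int", "double", "string"], num_rows=1000)
```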

hive.analyze.stmt.collect.partlevel.stats
  • Default Value: true
  • Added In: Hive 0.14.0 with HIVE-7609

Prior to Hive 0.14, when no partition was specified, the ANALYZE statement on a partitioned table collected table-level statistics. Beginning with 0.14, it instead collects partition-level statistics for all partitions. If the old behavior of collecting aggregated table-level statistics is desired, change the value of this property to false. This affects only column statistics; basic statistics are not affected.

hive.stats.list.num.entries
  • Default Value: 10
  • Added In: Hive 0.13 with HIVE-5369

To estimate the size of data flowing through operators in Hive/Tez (for reducer estimation etc.), average row size is multiplied with the total number of rows coming out of each operator. Average row size is computed from average column size of all columns in the row. In the absence of column statistics and for variable length complex columns like list, the average number of entries/values can be specified using this configuration property.

hive.stats.map.num.entries
  • Default Value: 10
  • Added In: Hive 0.13 with HIVE-5369

To estimate the size of data flowing through operators in Hive/Tez (for reducer estimation etc.), average row size is multiplied with the total number of rows coming out of each operator. Average row size is computed from average column size of all columns in the row. In the absence of column statistics and for variable length complex columns like map, the average number of entries/values can be specified using this configuration property.

hive.stats.map.parallelism

The Hive/Tez optimizer estimates the data size flowing through each of the operators. For the GROUPBY operator, to accurately compute the data size map-side parallelism needs to be known. By default, this value is set to 1 since the optimizer is not aware of the number of mappers during compile-time. This Hive configuration property can be used to specify the number of mappers for data size computation of the GROUPBY operator. (This configuration property was removed in release 0.14.0.)

hive.stats.fetch.partition.stats
  • Default Value: true
  • Added In: Hive 0.13 with HIVE-6298
  • Removed In: Hive 3.0.0 with HIVE-17932

Annotation of the operator tree with statistics information requires partition level basic statistics like number of rows, data size and file size. Partition statistics are fetched from the metastore. Fetching partition statistics for each needed partition can be expensive when the number of partitions is high. This flag can be used to disable fetching of partition statistics from the metastore. When this flag is disabled, Hive will make calls to the filesystem to get file sizes and will estimate the number of rows from the row schema.

hive.stats.fetch.column.stats
  • Default Value: false
  • Added In: Hive 0.13 with HIVE-5898

Annotation of the operator tree with statistics information requires column statistics. Column statistics are fetched from the metastore. Fetching column statistics for each needed column can be expensive when the number of columns is high. This flag can be used to disable fetching of column statistics from the metastore.

hive.stats.join.factor
  • Default Value: (float) 1.1
  • Added In: Hive 0.13 with HIVE-5921

The Hive/Tez optimizer estimates the data size flowing through each of the operators. The JOIN operator uses column statistics to estimate the number of rows flowing out of it and hence the data size. In the absence of column statistics, this factor determines the number of rows flowing out of the JOIN operator.

hive.stats.deserialization.factor
  • Default Value:
    • Hive 0.13 to 2.x.x: (float) 1.0
    • Hive 3.0.0 and later: (float) 10.0
  • Added In: Hive 0.13 with HIVE-5921
  • Default value changed from 1.0 to 10.0 in Hive 3.0

The Hive/Tez optimizer estimates the data size flowing through each of the operators. In the absence of basic statistics like number of rows and data size, file size is used to estimate the number of rows and data size. Since files in tables/partitions are serialized (and optionally compressed) the estimates of number of rows and data size cannot be reliably determined. This factor is multiplied with the file size to account for serialization and compression.

hive.stats.avg.row.size
  • Default Value: 10000
  • Added In: Hive 0.13 with HIVE-5921

In the absence of table/partition statistics, average row size will be used to estimate the number of rows/data size.
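When only file sizes are known, the deserialization factor and the average row size combine into a two-step estimate. The Python sketch below assumes a straightforward multiply-then-divide model with hypothetical function and parameter names; the rounding is an assumption and Hive's actual computation may differ.

```python
def estimate_from_file_size(file_size_bytes,
                            deserialization_factor=1.0,
                            avg_row_size=10000):
    """Estimate (data_size, num_rows) from the on-disk file size alone.

    deserialization_factor: hive.stats.deserialization.factor
    avg_row_size:           hive.stats.avg.row.size (bytes)
    """
    # Scale the file size to account for serialization and compression.
    data_size = int(file_size_bytes * deserialization_factor)
    # Derive a row count from the assumed average row size.
    num_rows = data_size // avg_row_size
    return data_size, num_rows


# A 50 MB file with the Hive 3.0.0 default factor of 10.0
estimate = estimate_from_file_size(50_000_000, deserialization_factor=10.0)
```

With the earlier default factor of 1.0, the same file would be estimated at one tenth of the data size and row count, which shows how strongly this setting can influence plans when statistics are missing.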

...

When set to true, Hive will answer a few queries like min, max, and count(1) purely using statistics stored in the metastore. For basic statistics collection, set the configuration property hive.stats.autogather to true. For more advanced statistics collection, run ANALYZE TABLE queries.

...

Number of threads used by partialscan/noscan analyze command for partitioned tables. This is applicable only for file formats that implement the StatsProvidingRecordReader interface (like ORC).

hive.stats.fetch.bitvector

...

Authentication and Authorization

For an overview of authorization modes, see Hive Authorization.

Restricted/Hidden List and Whitelist

...

Comma separated list of configuration properties which are immutable at runtime. For example, if hive.security.authorization.enabled is set to true, it should be included in this list to prevent a client from changing it to false at runtime.

...

Whitelist for SQL Standard Based Hive Authorization

See hive.security.authorization.sqlstd.confwhitelist below for information about the whitelist property that authorizes set commands in SQL standard based authorization.

...

For general metastore configuration properties, see MetaStore.

hive.metastore.pre.event.listeners

...

Some parameters are added automatically when they match one of the regex specifications for the white list in HiveConf.java (for example, hive.log.trace.id in Hive 2.0.0; see HIVE-12419).

Note that the hive.conf.restricted.list checks are still enforced after the white list check.

...

Second Java regex that the whitelist of configuration properties would match in addition to hive.security.authorization.sqlstd.confwhitelist. Do not include a starting | in the value.

Using this regex instead of updating the original regex for hive.security.authorization.sqlstd.confwhitelist means that you can append to the default that is set by SQL standard authorization instead of replacing it entirely.
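The effect of appending rather than replacing can be sketched as joining the two patterns with `|`. The Python example below is illustrative only: Hive evaluates Java regular expressions, Python's `re` module stands in here, and both patterns are hypothetical rather than Hive's defaults.

```python
import re

# Hypothetical patterns, not the defaults set by SQL standard authorization.
BASE = r"hive\.exec\..*|hive\.auto\..*"
APPEND = r"mapred\.job\.queue\.name"


def combined_whitelist(base_regex, append_regex=""):
    """Join the base whitelist and the appended regex with '|'."""
    if not append_regex:
        return base_regex
    # The separator is added here, so the appended value has no leading '|'.
    return base_regex + "|" + append_regex


def is_settable(name, base_regex, append_regex=""):
    """True if the whole property name matches the combined whitelist."""
    pattern = combined_whitelist(base_regex, append_regex)
    return re.fullmatch(pattern, name) is not None


allowed_before = is_settable("mapred.job.queue.name", BASE)
allowed_after = is_settable("mapred.job.queue.name", BASE, APPEND)
```

Properties matched by the base pattern remain settable after the append, which is the advantage over rewriting the original regex.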

...

Set to true to support INSERT ... VALUES, UPDATE, and DELETE transactions (Hive 0.14.0 and later). For a complete list of parameters required for turning on Hive transactions, see hive.txn.manager.

hive.lock.manager
  • Default Value: org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager
  • Added In: Hive 0.7.0 with HIVE-1293

The lock manager to use when hive.support.concurrency is set to true.

hive.lock.mapred.only.operation

...

The default partition name when ZooKeeperHiveLockManager is the Hive lock manager.

Metrics

The metrics that Hive collects can be viewed in the HiveServer2 Web UI. For more information, see Hive Metrics.

...

Enable metrics on the Hive Metastore Service. (For other metastore configuration properties, see the Metastore and Hive Metastore Security sections.)

hive.server2.metrics.enabled

...

Enable metrics on HiveServer2. (For other HiveServer2 configuration properties, see the HiveServer2 section.)

hive.service.metrics.class

...

  • Default Value:  "/tmp/report.json"
  • Added in: Hive 1.3.0 and 2.0.0 with HIVE-10761

For hive.service.metrics.class org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics and hive.service.metrics.reporter JSON_FILE, this is the location of the local JSON metrics file dump. This file will get overwritten at every interval of hive.service.metrics.file.frequency.

hive.service.metrics.file.frequency
  • Default Value:  5 seconds
  • Added in: Hive 1.3.0 and 2.0.0 with HIVE-10761

For hive.service.metrics.class org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics and hive.service.metrics.reporter JSON_FILE, this is the frequency of updating the JSON metrics file.

...

  • Default Value:  "hive"
  • Added in: Hive 2.1.0 with HIVE-13480

For hive.service.metrics.class org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics and hive.service.metrics.reporter HADOOP2, this is the component name to provide to the HADOOP2 metrics system. Ideally 'hivemetastore' for the MetaStore and 'hiveserver2' for HiveServer2. The metrics will be updated at every interval of hive.service.metrics.hadoop2.frequency.

hive.service.metrics.hadoop2.frequency
  • Default Value:  30 seconds
  • Added in: Hive 2.1.0 with HIVE-13480

For hive.service.metrics.class org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics and hive.service.metrics.reporter HADOOP2, this is the frequency of updating the HADOOP2 metrics system.

...

Indicates whether replication dump should include information about ACID tables. It should be used in conjunction with hive.repl.dump.metadata.only to enable copying of metadata for ACID tables which do not require the corresponding transaction semantics to be applied on target. This can be removed when ACID table replication is supported.

...

This parameter is a global variable that enables a number of optimizations when running on blobstores.
Some of the optimizations, such as hive.blobstore.use.blobstore.as.scratchdir, won't be used if this variable is set to false.

...