
Hive has become significantly faster thanks to various features and improvements that were built by the community over the past two years, including Tez and cost-based optimization.

Keeping the momentum, here are some examples of what we think will take us to the next level:

  • Asynchronous spindle-aware IO
  • Pre-fetching and caching of column chunks
  • Multi-threaded JIT-friendly operator pipelines

In order to achieve this we are proposing a hybrid execution model which consists of a long-lived daemon replacing direct interactions with the HDFS DataNode and a tightly integrated DAG-based framework.
Functionality such as caching, pre-fetching, some query processing and access control will move into the daemon.
Small/short queries can be largely processed by this daemon directly, while any heavy lifting will be performed in standard YARN containers.

Similar to the DataNode, LLAP daemons can be used by other applications as well, especially if a relational view on the data is preferred over file-centric processing.

We’re thus planning to open the daemon up through optional APIs (e.g.: InputFormat) that can be leveraged by other data processing frameworks as a building block.

Last, but not least, fine-grained column-level access control -- a key requirement for mainstream adoption of Hive -- fits nicely into this model.

 
 

 


Hive Configuration Properties

This document describes the Hive user configuration properties (sometimes called parameters, variables, or options), and notes which releases introduced new properties.

The canonical list of configuration properties is managed in the HiveConf Java class, so refer to the HiveConf.java file for a complete list of configuration properties available in your Hive release.

For information about how to use these configuration properties, see Configuring Hive. That document also describes administrative configuration properties for setting up Hive in the Configuration Variables section. Hive Metastore Administration describes additional configuration properties for the metastore.

Version information

 

As of Hive 0.14.0 (HIVE-7211), a configuration name that starts with "hive." is regarded as a Hive system property. With the hive.conf.validation option true (default), any attempts to set a configuration property that starts with "hive." which is not registered to the Hive system will throw an exception.

Query and DDL Execution

hive.execution.engine

Chooses the execution engine. Options are: mr (MapReduce, default), tez (Tez execution, for Hadoop 2 only), or spark (Spark execution, for Hive 1.1.0 onward).

See Hive on Tez and Hive on Spark for more information, and see the Tez section and the Spark section below for their configuration properties.
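For example, switching the current session to Tez (assuming Tez is installed and configured on the cluster):

```sql
-- Per-session override; can also be set cluster-wide in hive-site.xml
SET hive.execution.engine=tez;
```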

mapred.reduce.tasks
  • Default Value: -1
  • Added In: Hive 0.1.0

The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop sets this to 1 by default, whereas Hive uses -1 as its default value. By setting this property to -1, Hive will automatically determine the number of reducers.

hive.exec.reducers.bytes.per.reducer
  • Default Value: 1,000,000,000 prior to Hive 0.14.0; 256 MB (256,000,000) in Hive 0.14.0 and later
  • Added In: Hive 0.2.0; default changed in 0.14.0 with HIVE-7158 (and HIVE-7917)

Size per reducer. The default prior to Hive 0.14.0 is 1 GB, that is, if the input size is 10 GB then 10 reducers will be used. In Hive 0.14.0 and later the default is 256 MB, that is, if the input size is 1 GB then 4 reducers will be used.

hive.exec.reducers.max
  • Default Value: 999 prior to Hive 0.14.0; 1009 in Hive 0.14.0 and later
  • Added In: Hive 0.2.0; default changed in 0.14.0 with HIVE-7158 (and HIVE-7917)

Maximum number of reducers that will be used. If the one specified in the configuration property mapred.reduce.tasks is negative, Hive will use this as the maximum number of reducers when automatically determining the number of reducers.
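As an illustrative sketch of how these two properties interact: with the Hive 0.14.0 defaults, a 1 GB input yields ceil(1,000,000,000 / 256,000,000) = 4 reducers, and the automatically derived count is capped at hive.exec.reducers.max.

```sql
-- Hive 0.14.0+ defaults, restated per-session for illustration:
SET hive.exec.reducers.bytes.per.reducer=256000000;  -- ~256 MB of input per reducer
SET hive.exec.reducers.max=1009;                     -- cap on the derived reducer count
SET mapred.reduce.tasks=-1;                          -- let Hive derive the count
```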

hive.jar.path
  • Default Value: (empty)
  • Added In: Hive 0.2.0 or earlier

The location of hive_cli.jar that is used when submitting jobs in a separate jvm.

hive.aux.jars.path
  • Default Value: (empty)
  • Added In: Hive 0.2.0 or earlier

The location of the plugin jars that contain implementations of user defined functions (UDFs) and SerDes.

hive.reloadable.aux.jars.path
  • Default Value: (empty)
  • Added In: Hive 0.14.0 with HIVE-7553

Jars that can be renewed (added, removed, or updated) by executing the Beeline reload command without having to restart HiveServer2. These jars can be used just like the auxiliary classes in hive.aux.jars.path for creating UDFs or SerDes.

hive.exec.scratchdir
  • Default Value: /tmp/${user.name} in Hive 0.2.0 through 0.8.0; /tmp/hive-${user.name} in Hive 0.8.1 through 0.14.0; or /tmp/hive in Hive 0.14.0 and later
  • Added In: Hive 0.2.0; default changed in 0.8.1 and in 0.14.0 with HIVE-6847 and HIVE-8143

Scratch space for Hive jobs. This directory is used by Hive to store the plans for different map/reduce stages for the query as well as to store the intermediate outputs of these stages.

Hive 0.14.0 and later: HDFS root scratch directory for Hive jobs, which is created with write-all (733) permission. For each connecting user, an HDFS scratch directory ${hive.exec.scratchdir}/<username> is created with ${hive.scratch.dir.permission}.

hive.scratch.dir.permission
  • Default Value: 700
  • Added In: Hive 0.12.0 with HIVE-4487

The permission for the user-specific scratch directories that get created in the root scratch directory. (See hive.exec.scratchdir.)

hive.hadoop.supports.splittable.combineinputformat
  • Default Value: false
  • Added In: Hive 0.6.0

Whether to combine small input files so that fewer mappers are spawned.

hive.map.aggr
  • Default Value: true in Hive 0.3 and later; false in Hive 0.2
  • Added In: Hive 0.2.0

Whether to use map-side aggregation in Hive Group By queries.

hive.groupby.skewindata
  • Default Value: false
  • Added In: Hive 0.3.0

Whether to optimize group by queries when there is skew in the data.

hive.groupby.mapaggr.checkinterval
  • Default Value: 100000
  • Added In: Hive 0.3.0

Number of rows after which the size of the grouping keys/aggregation classes is checked, to decide whether map-side hash aggregation should continue (see hive.map.aggr.hash.min.reduction).

hive.new.job.grouping.set.cardinality
  • Default Value: 30
  • Added In: Hive 0.11.0 with HIVE-3552

Whether a new map-reduce job should be launched for grouping sets/rollups/cubes.

For a query like "select a, b, c, count(1) from T group by a, b, c with rollup;" four rows are created per row: (a, b, c), (a, b, null), (a, null, null), (null, null, null). This can lead to explosion across the map-reduce boundary if the cardinality of T is very high, and map-side aggregation does not do a very good job.

This parameter decides if Hive should add an additional map-reduce job. If the grouping set cardinality (4 in the example above) is more than this value, a new MR job is added under the assumption that the original "group by" will reduce the data size.
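For instance, the rollup query from the example above stays within the default cardinality of 30, so no extra MR job is added:

```sql
-- ROLLUP over (a, b, c) emits 4 grouping sets per input row:
-- (a, b, c), (a, b, null), (a, null, null), (null, null, null).
-- 4 <= hive.new.job.grouping.set.cardinality (default 30), so a single job is used.
SELECT a, b, c, count(1) FROM T GROUP BY a, b, c WITH ROLLUP;
```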

hive.mapred.local.mem
  • Default Value: 0
  • Added In: Hive 0.3.0

For local mode, memory of the mappers/reducers.

hive.map.aggr.hash.force.flush.memory.threshold
  • Default Value: 0.9
  • Added In: Hive 0.7.0 with HIVE-1830

The maximum memory to be used by the map-side group aggregation hash table. If memory usage is higher than this number, a flush of the data is forced.

hive.map.aggr.hash.percentmemory
  • Default Value: 0.5
  • Added In: Hive 0.2.0

Portion of total memory to be used by map-side group aggregation hash table.

hive.map.aggr.hash.min.reduction
  • Default Value: 0.5
  • Added In: Hive 0.4.0

Hash aggregation will be turned off if the ratio between hash table size and input rows is bigger than this number. Set to 1 to make sure hash aggregation is never turned off.

hive.optimize.groupby
  • Default Value: true
  • Added In: Hive 0.5.0

Whether to enable the bucketed group by from bucketed partitions/tables.

hive.multigroupby.singlemr

Whether to optimize multi group by query to generate a single M/R job plan. If the multi group by query has common group by keys, it will be optimized to generate a single M/R job. (This configuration property was removed in release 0.9.0.)

hive.multigroupby.singlereducer
  • Default Value: true
  • Added In: Hive 0.9.0 with HIVE-2621

Whether to optimize multi group by query to generate a single M/R  job plan. If the multi group by query has common group by keys, it will be optimized to generate a single M/R job.

hive.optimize.cp
  • Default Value: true
  • Added In: Hive 0.4.0 with HIVE-626
  • Removed In: Hive 0.13.0 with HIVE-4113

Whether to enable column pruner. (This configuration property was removed in release 0.13.0.)

hive.optimize.index.filter
  • Default Value: false
  • Added In: Hive 0.8.0 with HIVE-1644

Whether to enable automatic use of indexes.

Note:  See Indexing for more configuration properties related to Hive indexes.

hive.optimize.ppd
  • Default Value: true
  • Added In: Hive 0.4.0 with HIVE-279, default changed to true in Hive 0.4.0 with HIVE-626

Whether to enable predicate pushdown (PPD). 

Note: Turn on hive.optimize.index.filter as well to use file format specific indexes with PPD.

hive.optimize.ppd.storage
  • Default Value: true
  • Added In: Hive 0.7.0

Whether to push predicates down into storage handlers. Ignored when hive.optimize.ppd is false.

hive.ppd.remove.duplicatefilters
  • Default Value: true
  • Added In: Hive 0.8.0

During query optimization, filters may be pushed down in the operator tree. If this config is true, only pushed down filters remain in the operator tree, and the original filter is removed. If this config is false, the original filter is also left in the operator tree at the original place.

hive.ppd.recognizetransivity
  • Default Value: true
  • Added In: Hive 0.8.1

Whether to transitively replicate predicate filters over equijoin conditions.

hive.join.emit.interval
  • Default Value: 1000
  • Added In: Hive 0.2.0

How many rows in the right-most join operand Hive should buffer before emitting the join result.

hive.join.cache.size
  • Default Value: 25000
  • Added In: Hive 0.5.0

How many rows in the joining tables (except the streaming table) should be cached in memory.

hive.mapjoin.bucket.cache.size

How many values in each key in the map-joined table should be cached in memory.

hive.mapjoin.followby.map.aggr.hash.percentmemory
  • Default Value: 0.3
  • Added In: Hive 0.7.0 with HIVE-1830

Portion of total memory to be used by map-side group aggregation hash table, when this group by is followed by map join.

hive.smalltable.filesize or hive.mapjoin.smalltable.filesize
  • Default Value: 25000000
  • Added In: Hive 0.7.0 with HIVE-1642: hive.smalltable.filesize (replaced by hive.mapjoin.smalltable.filesize in Hive 0.8.1)
  • Added In: Hive 0.8.1 with HIVE-2499: hive.mapjoin.smalltable.filesize

The threshold (in bytes) for the input file size of the small tables; if the file size is smaller than this threshold, it will try to convert the common join into map join.
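A hedged example of tuning this threshold (the 50 MB figure is arbitrary, chosen for illustration only):

```sql
SET hive.auto.convert.join=true;                -- enable automatic conversion
SET hive.mapjoin.smalltable.filesize=50000000;  -- treat tables up to ~50 MB as "small"
```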

hive.mapjoin.localtask.max.memory.usage

How much memory the local task can use to hold keys/values in an in-memory hash table. If the local task's memory usage is higher than this number, the local task is aborted, which means the small table's data is too large to be held in memory.

hive.mapjoin.followby.gby.localtask.max.memory.usage
  • Default Value: 0.55
  • Added In: Hive 0.7.0 with HIVE-1830

How much memory the local task can use to hold keys/values in an in-memory hash table when the map join is followed by a group by. If the local task's memory usage is higher than this number, the local task aborts itself, which means the small table's data is too large to be held in memory.

hive.mapjoin.check.memory.rows

The number of rows processed after which the local task checks its memory usage.

hive.ignore.mapjoin.hint
  • Default Value: true
  • Added In: Hive 0.11.0 with HIVE-4042

Whether Hive ignores the mapjoin hint.

hive.smbjoin.cache.rows

How many rows with the same key value should be cached in memory per sort-merge-bucket joined table.

hive.mapjoin.optimized.keys

Whether a MapJoin hashtable should use optimized (size-wise) keys, allowing the table to take less memory. Depending on the key, memory savings for the entire table can be 5-15% or so.

hive.mapjoin.optimized.hashtable
  • Default Value: true
  • Added In: Hive 0.14.0 with HIVE-6430 

Whether Hive should use a memory-optimized hash table for MapJoin. Only works on Tez and Spark, because memory-optimized hash table cannot be serialized. (Spark is supported starting from Hive 1.3.0, with HIVE-11180.)

hive.mapjoin.optimized.hashtable.wbsize
  • Default Value: 10485760 (10 * 1024 * 1024)
  • Added In: Hive 0.14.0 with HIVE-6430 

Optimized hashtable (see hive.mapjoin.optimized.hashtable) uses a chain of buffers to store data. This is one buffer size. Hashtable may be slightly faster if this is larger, but for small joins unnecessary memory will be allocated and then trimmed.

hive.mapjoin.lazy.hashtable

Whether a MapJoin hashtable should deserialize values on demand. Depending on how many values in the table the join will actually touch, it can save a lot of memory by not creating objects for rows that are not needed. If all rows are needed, obviously there's no gain.

hive.hashtable.initialCapacity
  • Default Value: 100000
  • Added In: Hive 0.7.0 with HIVE-1642

Initial capacity of mapjoin hashtable if statistics are absent, or if hive.hashtable.key.count.adjustment is set to 0.

hive.hashtable.key.count.adjustment
  • Default Value: 1.0
  • Added In: Hive 0.14.0 with HIVE-7616

Adjustment to mapjoin hashtable size derived from table and column statistics; the estimate of the number of keys is divided by this value. If the value is 0, statistics are not used and hive.hashtable.initialCapacity is used instead.

hive.hashtable.loadfactor
  • Default Value: 0.75
  • Added In: Hive 0.7.0 with HIVE-1642

During a map join, the keys/values are held in a hash table. This value is the load factor for that in-memory hash table.

hive.debug.localtask
  • Default Value: false
  • Added In: Hive 0.7.0 with HIVE-1642

hive.optimize.skewjoin
  • Default Value: false
  • Added In: Hive 0.6.0

Whether to enable skew join optimization.  (Also see hive.optimize.skewjoin.compiletime.)

hive.skewjoin.key
  • Default Value: 100000
  • Added In: Hive 0.6.0

Determines whether a join key is skewed. If more than the specified number of rows with the same key are seen in the join operator, the key is considered a skew join key.

hive.skewjoin.mapjoin.map.tasks
  • Default Value: 10000
  • Added In: Hive 0.6.0

Determines the number of map tasks used in the follow-up map join job for a skew join. It should be used together with hive.skewjoin.mapjoin.min.split for fine-grained control.

hive.skewjoin.mapjoin.min.split
  • Default Value: 33554432
  • Added In: Hive 0.6.0

Determines the maximum number of map tasks used in the follow-up map join job for a skew join, by specifying the minimum split size. It should be used together with hive.skewjoin.mapjoin.map.tasks for fine-grained control.

hive.optimize.skewjoin.compiletime
  • Default Value: false
  • Added In: Hive 0.10.0

Whether to create a separate plan for skewed keys for the tables in the join. This is based on the skewed keys stored in the metadata. At compile time, the plan is broken into different joins: one for the skewed keys, and the other for the remaining keys. And then, a union is performed for the two joins generated above. So unless the same skewed key is present in both the joined tables, the join for the skewed key will be performed as a map-side join.

The main difference between this parameter and hive.optimize.skewjoin is that this parameter uses the skew information stored in the metastore to optimize the plan at compile time itself. If there is no skew information in the metadata, this parameter has no effect.
Both hive.optimize.skewjoin.compiletime and hive.optimize.skewjoin should be set to true. (Ideally, hive.optimize.skewjoin would be renamed hive.optimize.skewjoin.runtime, but for backward compatibility that has not been done.)

If the skew information is correctly stored in the metadata, hive.optimize.skewjoin.compiletime will change the query plan to take care of it, and hive.optimize.skewjoin will be a no-op.
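A minimal sketch of enabling compile-time skew join handling; the table and the skewed key value are illustrative, not taken from the source:

```sql
-- Record the skew information in the metastore at table-creation time:
CREATE TABLE t (key STRING, val STRING) SKEWED BY (key) ON ('hot_key');

-- Turn on both the compile-time and runtime skew join optimizations:
SET hive.optimize.skewjoin.compiletime=true;
SET hive.optimize.skewjoin=true;
```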

hive.optimize.union.remove
  • Default Value: false
  • Added In: Hive 0.10.0 with HIVE-3276

Whether to remove the union and push the operators between union and the filesink above union. This avoids an extra scan of the output by union. This is independently useful for union queries, and especially useful when hive.optimize.skewjoin.compiletime is set to true, since an extra union is inserted.

The merge is triggered if either hive.merge.mapfiles or hive.merge.mapredfiles is set to true. If the user has set hive.merge.mapfiles to true and hive.merge.mapredfiles to false, the idea is that the number of reducers is small, so the number of files is small anyway. However, this optimization can increase the number of files by a large margin, so the merge is applied aggressively.

hive.mapred.supports.subdirectories
  • Default Value: false
  • Added In: Hive 0.10.0 with HIVE-3276

Whether the version of Hadoop which is running supports sub-directories for tables/partitions. Many Hive optimizations can be applied if the Hadoop version supports sub-directories for tables/partitions. This support was added by MAPREDUCE-1501.

hive.mapred.mode
  • Default Value: 
    • Hive 0.x: nonstrict
    • Hive 1.x: nonstrict
    • Hive 2.x: strict (HIVE-12413)
  • Added In: Hive 0.3.0

The mode in which the Hive operations are being performed. In strict mode, some risky queries are not allowed to run. For example, full table scans are prevented (see HIVE-10454) and ORDER BY requires a LIMIT clause.
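For example (the table and column names are illustrative):

```sql
SET hive.mapred.mode=strict;

-- Rejected in strict mode: ORDER BY without a LIMIT clause
-- SELECT * FROM sales ORDER BY amount;

-- Accepted: the LIMIT bounds the single-reducer sort
SELECT * FROM sales ORDER BY amount LIMIT 100;
```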

hive.exec.script.maxerrsize
  • Default Value: 100000
  • Added In: Hive 0.2.0

Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling log partitions to capacity.

hive.script.auto.progress
  • Default Value: false
  • Added In: Hive 0.4.0

Whether the Hive Transform/Map/Reduce clause should automatically send progress information to the TaskTracker to avoid the task getting killed because of inactivity. Hive sends progress information when the script is outputting to stderr. This option removes the need to periodically produce stderr messages, but users should be cautious because it may prevent the TaskTracker from killing scripts stuck in infinite loops.

hive.exec.script.allow.partial.consumption
  • Default Value: false
  • Added In: Hive 0.5.0

When enabled, this option allows a user script to exit successfully without consuming all the data from the standard input.

hive.script.operator.id.env.var
  • Default Value: HIVE_SCRIPT_OPERATOR_ID
  • Added In: Hive 0.5.0

Name of the environment variable that holds the unique script operator ID in the user's transform function (the custom mapper/reducer that the user has specified in the query).

hive.script.operator.env.blacklist
  • Default Value: hive.txn.valid.txns,hive.script.operator.env.blacklist
  • Added In: Hive 0.14.0 with HIVE-8341

By default all values in the HiveConf object are converted to environment variables of the same name as the key (with '.' (dot) converted to '_' (underscore)) and set as part of the script operator's environment.  However, some values can grow large or are not amenable to translation to environment variables.  This value gives a comma separated list of configuration values that will not be set in the environment when calling a script operator.  By default the valid transaction list is excluded, as it can grow large and is sometimes compressed, which does not translate well to an environment variable.
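To exclude an additional value, the default entries must be repeated, since setting the property replaces the whole comma-separated list (my.large.property is a hypothetical name, for illustration only):

```sql
SET hive.script.operator.env.blacklist=hive.txn.valid.txns,hive.script.operator.env.blacklist,my.large.property;
```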

Also see:
  • SerDes for more hive.script.* configuration properties

hive.exec.compress.output
  • Default Value: false
  • Added In: Hive 0.2.0

This controls whether the final outputs of a query (to a local/HDFS file or a Hive table) are compressed. The compression codec and other options are determined from the Hadoop configuration variables mapred.output.compress*.

hive.exec.compress.intermediate
  • Default Value: false
  • Added In: Hive 0.2.0

This controls whether intermediate files produced by Hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from Hadoop configuration variables mapred.output.compress*.

hive.exec.parallel
  • Default Value: false
  • Added In: Hive 0.5.0

Whether to execute jobs in parallel.  Applies to MapReduce jobs that can run in parallel, for example jobs processing different source tables before a join.  As of Hive 0.14, also applies to move tasks that can run in parallel, for example moving files to insert targets during multi-insert.

hive.exec.parallel.thread.number
  • Default Value: 8
  • Added In: Hive 0.6.0

How many jobs at most can be executed in parallel.

hive.exec.rowoffset
  • Default Value: false
  • Added In: Hive 0.8.0

Whether to provide the row offset virtual column.

hive.task.progress
  • Default Value: false
  • Added In: Hive 0.5.0
  • Removed In: Hive 0.13.0 with HIVE-4518

Whether Hive should periodically update task progress counters during execution. Enabling this allows task progress to be monitored more closely in the job tracker, but may impose a performance penalty. This flag is automatically set to true for jobs with hive.exec.dynamic.partition set to true. (This configuration property was removed in release 0.13.0.)

hive.counters.group.name
  • Default Value: HIVE
  • Added In: Hive 0.13.0 with HIVE-4518

Counter group name for counters used during query execution. The counter group is used for internal Hive variables (CREATED_FILE, FATAL_ERROR, and so on).

hive.exec.pre.hooks
  • Default Value: (empty)
  • Added In: Hive 0.4.0

Comma-separated list of pre-execution hooks to be invoked for each statement. A pre-execution hook is specified as the name of a Java class which implements the org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext interface.

hive.exec.post.hooks
  • Default Value: (empty)
  • Added In: Hive 0.5.0

Comma-separated list of post-execution hooks to be invoked for each statement. A post-execution hook is specified as the name of a Java class which implements the org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext interface.

hive.exec.failure.hooks
  • Default Value: (empty)
  • Added In: Hive 0.8.0

Comma-separated list of on-failure hooks to be invoked for each statement. An on-failure hook is specified as the name of Java class which implements the org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext interface.

hive.merge.mapfiles
  • Default Value: true
  • Added In: Hive 0.4.0

Merge small files at the end of a map-only job.

hive.merge.mapredfiles
  • Default Value: false
  • Added In: Hive 0.4.0

Merge small files at the end of a map-reduce job.

hive.mergejob.maponly
  • Default Value: true
  • Added In: Hive 0.6.0
  • Removed In: Hive 0.11.0

Try to generate a map-only job for merging files if CombineHiveInputFormat is supported. (This configuration property was removed in release 0.11.0.)

hive.merge.size.per.task
  • Default Value: 256000000
  • Added In: Hive 0.4.0

Size of merged files at the end of the job.

hive.merge.smallfiles.avgsize
  • Default Value: 16000000
  • Added In: Hive 0.5.0

When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true.
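Putting the merge-related properties together (values shown are the defaults, restated per-session for illustration):

```sql
SET hive.merge.mapfiles=true;                -- merge after map-only jobs
SET hive.merge.mapredfiles=true;             -- also merge after map-reduce jobs (default false)
SET hive.merge.smallfiles.avgsize=16000000;  -- trigger: average output file under ~16 MB
SET hive.merge.size.per.task=256000000;      -- target size of the merged files
```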

hive.heartbeat.interval
  • Default Value: 1000
  • Added In: Hive 0.4.0

Send a heartbeat after this interval – used by mapjoin and filter operators.

hive.auto.convert.join
  • Default Value: false in 0.7.0 to 0.10.0; true in 0.11.0 and later (HIVE-3297)  
  • Added In: 0.7.0 with HIVE-1642

Whether Hive enables the optimization about converting common join into mapjoin based on the input file size. (Note that hive-default.xml.template incorrectly gives the default as false in Hive 0.11.0 through 0.13.1.)

hive.auto.convert.join.noconditionaltask
  • Default Value: true
  • Added In: 0.11.0 with HIVE-3784 (default changed to true with HIVE-4146)

Whether Hive enables the optimization about converting common join into mapjoin based on the input file size. If this parameter is on, and the sum of size for n-1 of the tables/partitions for an n-way join is smaller than the size specified by hive.auto.convert.join.noconditionaltask.size, the join is directly converted to a mapjoin (there is no conditional task).

hive.auto.convert.join.noconditionaltask.size
  • Default Value: 10000000
  • Added In: 0.11.0 with HIVE-3784

If hive.auto.convert.join.noconditionaltask is off, this parameter does not take effect. However, if it is on, and the sum of size for n-1 of the tables/partitions for an n-way join is smaller than this size, the join is directly converted to a mapjoin (there is no conditional task). The default is 10MB.
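A hedged example: with the settings below, a three-way join whose two smaller inputs total under 20 MB (an arbitrary illustrative threshold) is converted directly to a map join:

```sql
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask=true;
SET hive.auto.convert.join.noconditionaltask.size=20000000;  -- limit on the sum of the n-1 smaller inputs
```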

hive.auto.convert.join.use.nonstaged
  • Default Value: false
  • Added In: 0.13.0 with HIVE-6144 (default originally true, but changed to false with HIVE-6749 also in 0.13.0)

For conditional joins, if the input stream from a small alias can be directly applied to the join operator without filtering or projection, the alias need not be pre-staged in the distributed cache via a mapred local task. Currently, this does not work with vectorization or the Tez execution engine.

hive.udtf.auto.progress
  • Default Value: false
  • Added In: Hive 0.5.0

Whether Hive should automatically send progress information to the TaskTracker when using UDTFs, to prevent the task getting killed because of inactivity. Users should be cautious because this may prevent the TaskTracker from killing tasks with infinite loops.

hive.mapred.reduce.tasks.speculative.execution
  • Default Value: true
  • Added In: Hive 0.5.0

Whether speculative execution for reducers should be turned on.

hive.exec.counters.pull.interval
  • Default Value: 1000
  • Added In: Hive 0.6.0

The interval at which to poll the JobTracker for the counters of the running job. The smaller it is, the greater the load on the JobTracker; the larger it is, the less granular the counter updates will be.

hive.enforce.bucketing
  • Default Value: 
    • Hive 0.x: false
    • Hive 1.x: false
    • Hive 2.x: removed, which effectively makes it always true (HIVE-12331)
  • Added In: Hive 0.6.0

Whether bucketing is enforced. If true, while inserting into the table, bucketing is enforced.

Set to true to support INSERT ... VALUES, UPDATE, and DELETE transactions (Hive 0.14.0 and later). For a complete list of parameters required for turning on Hive transactions, see hive.txn.manager.

hive.enforce.sorting
  • Default Value: 
    • Hive 0.x: false
    • Hive 1.x: false
    • Hive 2.x: removed, which effectively makes it always true (HIVE-12331)
  • Added In: Hive 0.6.0

Whether sorting is enforced. If true, while inserting into the table, sorting is enforced.

hive.optimize.reducededuplication
  • Default Value: true
  • Added In: Hive 0.6.0

Remove extra map-reduce jobs if the data is already clustered by the same key which needs to be used again. This should always be set to true. Since it is a new feature, it has been made configurable.

hive.optimize.reducededuplication.min.reducer
  • Default Value: 4
  • Added In: Hive 0.11.0 with HIVE-2340

Reduce deduplication merges two RSs (reduce sink operators) by moving the key/parts/reducer-num of the child RS to the parent RS. If the reducer-num of the child RS is fixed (order by or forced bucketing) and small, this can produce a very slow single MR job. The optimization is disabled if the number of reducers is less than the specified value.

hive.optimize.correlation
  • Default Value: false
  • Added In: Hive 0.12.0 with HIVE-2206

Exploit intra-query correlations. For details see the Correlation Optimizer design document.

hive.optimize.limittranspose

Whether to push a limit through left/right outer join or union. If the value is true and the size of the outer input is reduced enough (as specified in hive.optimize.limittranspose.reductionpercentage and hive.optimize.limittranspose.reductiontuples), the limit is pushed to the outer input or union; to remain semantically correct, the limit is kept on top of the join or the union too.

hive.optimize.limittranspose.reductionpercentage

When hive.optimize.limittranspose is true, this variable specifies the minimal percentage (fractional) reduction of the size of the outer input of the join or input of the union that the optimizer should get in order to apply the rule.

hive.optimize.limittranspose.reductiontuples

When hive.optimize.limittranspose is true, this variable specifies the minimal reduction in the number of tuples of the outer input of the join or input of the union that the optimizer should get in order to apply the rule.

hive.optimize.sort.dynamic.partition
  • Default Value: true in Hive 0.13.0 and 0.13.1; false in Hive 0.14.0 and later (HIVE-8151)
  • Added In: Hive 0.13.0 with HIVE-6455

When enabled, dynamic partitioning column will be globally sorted. This way we can keep only one record writer open for each partition value in the reducer thereby reducing the memory pressure on reducers.

hive.cbo.enable

When true, the cost based optimizer, which uses the Calcite framework, will be enabled.

hive.optimize.null.scan
  • Default Value: true
  • Added In: Hive 0.14.0 with HIVE-7385

When true, this optimization will try to not scan any rows from tables which can be determined at query compile time to not generate any rows (e.g., where 1 = 2, where false, limit 0 etc.).

hive.exec.dynamic.partition
  • Default Value: false
  • Added In: Hive 0.6.0

Whether or not to allow dynamic partitions in DML/DDL.

hive.exec.dynamic.partition.mode
  • Default Value: strict
  • Added In: Hive 0.6.0

In strict mode, the user must specify at least one static partition, in case the user accidentally overwrites all partitions. In nonstrict mode all partitions are allowed to be dynamic.

Set to nonstrict to support INSERT ... VALUES, UPDATE, and DELETE transactions (Hive 0.14.0 and later). For a complete list of parameters required for turning on Hive transactions, see hive.txn.manager.
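An illustrative insert that satisfies strict mode by fixing one partition key statically (the table and column names are invented for the example):

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=strict;

-- dt is static; country is filled in dynamically from the query output
INSERT OVERWRITE TABLE logs PARTITION (dt='2015-01-01', country)
SELECT msg, country FROM staging;
```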

hive.exec.max.dynamic.partitions
  • Default Value: 1000
  • Added In: Hive 0.6.0

Maximum number of dynamic partitions allowed to be created in total.

hive.exec.max.dynamic.partitions.pernode
  • Default Value: 100
  • Added In: Hive 0.6.0

Maximum number of dynamic partitions allowed to be created in each mapper/reducer node.

hive.exec.max.created.files
  • Default Value: 100000
  • Added In: Hive 0.7.0

Maximum number of HDFS files created by all mappers/reducers in a MapReduce job.

hive.exec.default.partition.name
  • Default Value: _HIVE_DEFAULT_PARTITION_
  • Added In: Hive 0.6.0

The default partition name when the dynamic partition column value is a null/empty string or any other value that cannot be escaped. This value must not contain any special character used in HDFS URIs (e.g., ':', '%', '/', etc.). The user has to be aware that dynamic partition values should not contain this value, to avoid confusion.

hive.fetch.output.serde
  • Default Value: org.apache.hadoop.hive.serde2.DelimitedJSONSerDe
  • Added In: Hive 0.7.0

The SerDe used by FetchTask to serialize the fetch output.

hive.exec.mode.local.auto
  • Default Value: false
  • Added In: Hive 0.7.0 with HIVE-1408

Lets Hive determine whether to run in local mode automatically.

hive.exec.mode.local.auto.inputbytes.max
  • Default Value: 134217728
  • Added In: Hive 0.7.0 with HIVE-1408

When hive.exec.mode.local.auto is true, input bytes should be less than this for local mode.

hive.exec.mode.local.auto.tasks.max
  • Default Value: 4
  • Added In: Hive 0.7.0 with HIVE-1408
  • Removed In: Hive 0.9.0 with HIVE-2651

When hive.exec.mode.local.auto is true, the number of tasks should be less than this for local mode. Replaced in Hive 0.9.0 by hive.exec.mode.local.auto.input.files.max.

hive.exec.mode.local.auto.input.files.max
  • Default Value: 4
  • Added In: Hive 0.9.0 with HIVE-2651

When hive.exec.mode.local.auto is true, the number of tasks should be less than this for local mode.
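
Putting the local-mode properties together, a session that opts in to automatic local execution might use the following (values shown are the defaults):

```sql
-- Run small jobs in-process instead of submitting them to the cluster.
SET hive.exec.mode.local.auto=true;
SET hive.exec.mode.local.auto.inputbytes.max=134217728;  -- 128 MB
SET hive.exec.mode.local.auto.input.files.max=4;
```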

hive.exec.drop.ignorenonexistent

Do not report an error if DROP TABLE/VIEW/PARTITION/INDEX/TEMPORARY FUNCTION specifies a non-existent table/view. Also applies to permanent functions as of Hive 0.13.0.

hive.exec.show.job.failure.debug.info
  • Default Value: true
  • Added In: Hive 0.7.0

If a job fails, whether to provide a link in the CLI to the task with the most failures, along with debugging hints if applicable.

hive.auto.progress.timeout
  • Default Value: 0
  • Added In: Hive 0.7.0

How long to run autoprogressor for the script/UDTF operators (in seconds). Set to 0 for forever.

hive.table.parameters.default
  • Default Value: (empty)
  • Added In: Hive 0.7.0

Default property values for newly created tables.

hive.variable.substitute
  • Default Value: true
  • Added In: Hive 0.7.0

This enables substitution using syntax like ${var} ${system:var} and ${env:var}.

hive.error.on.empty.partition
  • Default Value: false
  • Added In: Hive 0.7.0

Whether to throw an exception if dynamic partition insert generates empty results.

hive.exim.uri.scheme.whitelist
  • Default Value: hdfs,pfile
  • Added In: Hive 0.8.0

A comma separated list of acceptable URI schemes for import and export.

hive.limit.row.max.size
  • Default Value: 100000
  • Added In: Hive 0.8.0

When trying a smaller subset of data for simple LIMIT, the size (in bytes) that each row is assumed to have at least.

hive.limit.optimize.limit.file
  • Default Value: 10
  • Added In: Hive 0.8.0

When trying a smaller subset of data for simple LIMIT, maximum number of files we can sample.

hive.limit.optimize.enable
  • Default Value: false
  • Added In: Hive 0.8.0

Whether to enable the optimization of trying a smaller subset of data for simple LIMIT first.

hive.limit.optimize.fetch.max
  • Default Value: 50000
  • Added In: Hive 0.8.0

Maximum number of rows allowed for a smaller subset of data for simple LIMIT, if it is a fetch query. Insert queries are not restricted by this limit.
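
A session that opts in to the LIMIT sampling optimization might combine these properties as follows (the table name is illustrative):

```sql
SET hive.limit.optimize.enable=true;
SET hive.limit.row.max.size=100000;      -- assume each row is at least this size
SET hive.limit.optimize.limit.file=10;   -- sample at most 10 files
-- A simple LIMIT query can now be answered from a sampled subset of the input:
SELECT * FROM web_logs LIMIT 20;
```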

hive.rework.mapredwork
  • Default Value: false
  • Added In: Hive 0.8.0

Should rework the mapred work or not. This is first introduced by SymlinkTextInputFormat to replace symlink files with real paths at compile time.

hive.sample.seednumber
  • Default Value: 0
  • Added In: Hive 0.8.0

A number used for percentage sampling. By changing this number, the user changes the subsets of data sampled.

hive.autogen.columnalias.prefix.label
  • Default Value: _c
  • Added In: Hive 0.8.0

String used as a prefix when auto-generating column aliases. By default the prefix label is appended with a column position number to form the column alias. Auto-generation happens when an aggregate function is used in a select clause without an explicit alias.

hive.autogen.columnalias.prefix.includefuncname
  • Default Value: false
  • Added In: Hive 0.8.0

Whether to include function name in the column alias auto generated by Hive.
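
As an illustration of the two properties above, an unaliased aggregate in a select clause gets a positional alias (table name is hypothetical):

```sql
-- With the default prefix "_c", the unaliased columns below are
-- returned as _c0 and _c1 in the result set.
SELECT count(*), max(amount) FROM orders;
```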

hive.exec.perf.logger
  • Default Value: org.apache.hadoop.hive.ql.log.PerfLogger
  • Added In: Hive 0.8.0

The class responsible for logging client-side performance metrics. Must be a subclass of org.apache.hadoop.hive.ql.log.PerfLogger.

hive.start.cleanup.scratchdir

Whether to clean up the Hive scratch directory while starting the Hive server.

hive.output.file.extension
  • Default Value: (empty)
  • Added In: Hive 0.8.1

String used as a file extension for output files. If not set, defaults to the codec extension for text files (e.g. ".gz"), or no extension otherwise.

hive.insert.into.multilevel.dirs
  • Default Value: false
  • Added In: Hive 0.8.1

Whether to insert into multilevel directories like "insert directory '/HIVEFT25686/chinna/' from table".

hive.conf.validation
  • Default Value: true
  • Added In: Hive 0.10.0 with HIVE-2848

Enables type checking for registered Hive configurations.

As of Hive 0.14.0 (HIVE-7211), a configuration name that starts with "hive." is regarded as a Hive system property. With hive.conf.validation true (default), any attempts to set a configuration property that starts with "hive." which is not registered to the Hive system will throw an exception.
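
For example, with validation enabled (the default), a typo in a "hive."-prefixed name is caught at SET time, while non-Hive properties pass through:

```sql
-- Rejected: starts with "hive." but is not registered in HiveConf
-- (note the typo in the property name).
SET hive.execution.enginee=tez;

-- Accepted: not a "hive." property, so no validation applies.
SET my.custom.property=1;
```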

hive.fetch.task.conversion
  • Default Value: minimal in Hive 0.10.0 through 0.13.1, more in Hive 0.14.0 and later
  • Added In: Hive 0.10.0 with HIVE-2925; default changed in Hive 0.14.0 with HIVE-7397

Some select queries can be converted to a single FETCH task, minimizing latency. Currently the query should be single sourced, not having any subquery, and should not have any aggregations or distincts (which incur RS, the ReduceSinkOperator, requiring a MapReduce task), lateral views or joins.

Supported values are none, minimal and more.

0. none:  Disable hive.fetch.task.conversion (value added in Hive 0.14.0 with HIVE-8389)
1. minimal:  SELECT *, FILTER on partition columns (WHERE and HAVING clauses), LIMIT only
2. more:  SELECT, FILTER, LIMIT only (including TABLESAMPLE, virtual columns)

"more" can take any kind of expressions in the SELECT clause, including UDFs.
(UDTFs and lateral views are not yet supported – see HIVE-5718.)
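
For instance, with the property set to more, a filtered single-table query can run as a plain fetch with no MapReduce (or Tez) job launched (table name is illustrative):

```sql
SET hive.fetch.task.conversion=more;
-- Single source, no aggregation or join: eligible for a FETCH task,
-- including the UDF in the select list.
SELECT name, upper(city) FROM customers WHERE state = 'CA' LIMIT 10;
```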

hive.groupby.orderby.position.alias
  • Default Value: false
  • Added In: Hive 0.11.0 with HIVE-581

Whether to enable using Column Position Alias in GROUP BY and ORDER BY clauses of queries.
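
When enabled, positions refer to select-list columns, as in this illustrative query:

```sql
SET hive.groupby.orderby.position.alias=true;
-- "GROUP BY 1" refers to dept, "ORDER BY 2" to count(*).
SELECT dept, count(*) FROM employees GROUP BY 1 ORDER BY 2 DESC;
```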

hive.fetch.task.aggr
  • Default Value: false
  • Added In: Hive 0.12.0 with HIVE-4002 (description added in Hive 0.13.0 with HIVE-5793)

Aggregation queries with no group-by clause (for example, select count(*) from src) execute final aggregations in a single reduce task. If this parameter is set to true, Hive delegates the final aggregation stage to a fetch task, possibly decreasing the query time.

hive.fetch.task.conversion.threshold
  • Default Value: -1 in Hive 0.13.0 and 0.13.1, 1073741824 (1 GB) in Hive 0.14.0 and later 
  • Added In: Hive 0.13.0 with HIVE-3990; default changed in Hive 0.14.0 with HIVE-7397

Input threshold (in bytes) for applying hive.fetch.task.conversion. If target table is native, input length is calculated by summation of file lengths. If it's not native, the storage handler for the table can optionally implement the org.apache.hadoop.hive.ql.metadata.InputEstimator interface. A negative threshold means hive.fetch.task.conversion is applied without any input length threshold.

hive.limit.pushdown.memory.usage
  • Default Value: -1
  • Added In: Hive 0.12.0 with HIVE-3562

The maximum memory to be used for hash in RS operator for top K selection. The default value "-1" means no limit.

hive.cache.expr.evaluation
  • Default Value: true
  • Added In: Hive 0.12.0 with HIVE-4209
  • Bug Fix: Hive 0.14.0 with HIVE-7314 (expression caching doesn't work when using UDF inside another UDF or a Hive function)

If true, the evaluation result of a deterministic expression referenced twice or more will be cached. For example, in a filter condition like "... where key + 10 > 10 or key + 10 = 0" the expression "key + 10" will be evaluated/cached once and reused for the following expression ("key + 10 = 0"). Currently, this is applied only to expressions in select or filter operators.

hive.resultset.use.unique.column.names
  • Default Value: true
  • Added In: Hive 0.13.0 with HIVE-6687

Make column names unique in the result set by qualifying column names with table alias if needed. Table alias will be added to column names for queries of type "select *" or if query explicitly uses table alias "select r1.x..".

hive.support.quoted.identifiers
  • Default Value: column
  • Added In: Hive 0.13.0 with HIVE-6013

Whether to use quoted identifiers.  Value can be "none" or "column".

column:  Column names can contain any Unicode character. Any column name that is specified within backticks (`) is treated literally. Within a backtick string, use double backticks (``) to represent a backtick character.
none:  Only alphanumeric and underscore characters are valid in identifiers. Backticked names are interpreted as regular expressions. This is also the behavior in releases prior to 0.13.0.
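
The difference between the two modes can be sketched as follows (table and column names are illustrative):

```sql
-- column (default): backticked names are taken literally;
-- `` escapes a literal backtick inside a name.
SET hive.support.quoted.identifiers=column;
CREATE TABLE t1 (`user id` INT, `a``b` STRING);

-- none: backticked names are interpreted as regular expressions,
-- e.g. select every column except ds and hr.
SET hive.support.quoted.identifiers=none;
SELECT `(ds|hr)?+.+` FROM partitioned_table;
```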

hive.plan.serialization.format
  • Default Value: kryo
  • Added In: Hive 0.13.0 with HIVE-1511

Query plan format serialization between client and task nodes. Two supported values are kryo and javaXML. Kryo is the default.

hive.exec.check.crossproducts
  • Default Value: true
  • Added In: Hive 0.13.0 with HIVE-6643

Check if a query plan contains a cross product. If there is one, output a warning to the session's console.

hive.display.partition.cols.separately
  • Default Value: true
  • Added In: Hive 0.13.0 with HIVE-6689

In older Hive versions (0.10 and earlier) no distinction was made between partition columns or non-partition columns while displaying columns in DESCRIBE TABLE. From version 0.12 onwards, they are displayed separately. This flag will let you get the old behavior, if desired. See test-case in patch for HIVE-6689.

hive.limit.query.max.table.partition
  • Default Value: -1
  • Added In: Hive 0.13.0 with HIVE-6492

To protect the cluster, this controls how many partitions can be scanned for each partitioned table. The default value "-1" means no limit. The limit on partitions does not affect metadata-only queries.

hive.files.umask.value

Obsolete:  The dfs.umask value for the Hive-created folders.

hive.optimize.sampling.orderby
  • Default Value: false
  • Added In: Hive 0.12.0 with HIVE-1402

Uses sampling on order-by clause for parallel execution.

hive.optimize.sampling.orderby.number
  • Default Value: 1000
  • Added In: Hive 0.12.0 with HIVE-1402

With hive.optimize.sampling.orderby=true, total number of samples to be obtained to calculate partition keys.

hive.optimize.sampling.orderby.percent
  • Default Value: 0.1
  • Added In: Hive 0.12.0 with HIVE-1402

With hive.optimize.sampling.orderby=true, probability with which a row will be chosen.

hive.compat
  • Default Value: 0.12
  • Added In: Hive 0.13.0 with HIVE-6012

Enable (configurable) deprecated behaviors of arithmetic operations by setting the desired level of backward compatibility. The default value gives backward-compatible return types for numeric operations. Other supported release numbers give newer behavior for numeric operations, for example 0.13 gives the more SQL compliant return types introduced in HIVE-5356.

The value "latest" specifies the latest supported level. Currently, this only affects division of integers.

Setting to 0.12 (default) maintains division behavior in Hive 0.12 and earlier releases: int / int = double.
Setting to 0.13 gives division behavior in Hive 0.13 and later releases: int / int = decimal.

An invalid setting will cause an error message, and the default support level will be used.
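
The integer-division difference can be seen directly:

```sql
SET hive.compat=0.12;
SELECT 3 / 2;   -- result is typed as DOUBLE (pre-0.13 behavior)

SET hive.compat=0.13;
SELECT 3 / 2;   -- result is typed as DECIMAL (HIVE-5356 behavior)
```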

hive.optimize.constant.propagation
  • Default Value: true
  • Added In: Hive 0.14.0 with HIVE-5771

Whether to enable the constant propagation optimizer.

hive.entity.capture.transform

Enable capturing compiler read entity of transform URI which can be introspected in the semantic and exec hooks.

hive.support.sql11.reserved.keywords
  • Default Value: true
  • Added In: Hive 1.2.0 with HIVE-6617

Whether to enable support for SQL2011 reserved keywords. When enabled, will support (part of) SQL2011 reserved keywords.

hive.explain.user
  • Default Value: false
  • Added In: Hive 1.2.0 with HIVE-9780

Whether to show the explain result at user level. When enabled, will log the EXPLAIN output for the query at user level.

hive.typecheck.on.insert
  • Default Value: true
  • Added In: Hive 0.12.0 with HIVE-5297 for insert partition
  • Extended In: Hive 1.2 with HIVE-10307 for alter, describe partition, etc.

Whether to check, convert, and normalize partition value specified in partition specification to conform to the partition column type.

hive.exec.temporary.table.storage
  • Default Value: default
  • Added In: Hive 1.1.0 with HIVE-7313

Expects one of [memory, ssd, default].

Define the storage policy for temporary tables. Choices between memory, ssd and default. See HDFS Storage Types and Storage Policies.

hive.optimize.distinct.rewrite
  • Default Value: true
  • Added In: Hive 1.2.0 with HIVE-10568

When applicable, this optimization rewrites distinct aggregates from a single-stage to multi-stage aggregation. This may not be optimal in all cases. Ideally, whether to trigger it or not should be a cost-based decision. Until Hive formalizes the cost model for this, this is config driven.

hive.optimize.point.lookup
  • Default Value: true
  • Added In: Hive 2.0.0 with HIVE-11461

Whether to transform OR clauses in Filter operators into IN clauses.

hive.optimize.point.lookup.min
  • Default Value: 31
  • Added In: Hive 2.0.0 with HIVE-11573

Minimum number of OR clauses needed to transform into IN clauses.

SerDes and I/O

SerDes

hive.script.serde
  • Default Value: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  • Added In: Hive 0.4.0

The default SerDe for transmitting input data to and reading output data from the user scripts.

hive.script.recordreader
  • Default Value: org.apache.hadoop.hive.ql.exec.TextRecordReader
  • Added In: Hive 0.4.0

The default record reader for reading data from the user scripts.

hive.script.recordwriter
  • Default Value: org.apache.hadoop.hive.ql.exec.TextRecordWriter
  • Added In: Hive 0.5.0

The default record writer for writing data to the user scripts.

hive.default.serde
  • Default Value: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  • Added in: Hive 0.14 with HIVE-5976

The default SerDe Hive will use for storage formats that do not specify a SerDe. Storage formats that currently do not specify a SerDe include TextFile and RcFile.

See Registration of Native SerDes for more information for storage formats and SerDes.

hive.lazysimple.extended_boolean_literal
  • Default Value: false
  • Added in: Hive 0.14 with HIVE-3635

LazySimpleSerDe uses this property to determine if it treats 'T', 't', 'F', 'f', '1', and '0' as extended, legal boolean literals, in addition to 'TRUE' and 'FALSE'. The default is false, which means only 'TRUE' and 'FALSE' are treated as legal boolean literals.
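
For example, with the property enabled, a delimited text file can use the short literals in a BOOLEAN column (the table layout is illustrative):

```sql
SET hive.lazysimple.extended_boolean_literal=true;
-- Values such as t, F, 1, 0 in the "active" field now parse as booleans;
-- with the default (false), only TRUE and FALSE would be recognized.
CREATE TABLE flags (id INT, active BOOLEAN)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```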

I/O

hive.io.exception.handlers
  • Default Value: (empty)
  • Added In: Hive 0.8.1

A list of I/O exception handler class names. This is used to construct a list of exception handlers to handle exceptions thrown by record readers.

hive.input.format

The default input format. Set this to HiveInputFormat if you encounter problems with CombineHiveInputFormat.

File Formats

hive.default.fileformat
  • Default Value: TextFile
  • Added In: Hive 0.2.0

Default file format for CREATE TABLE statement. Options are TextFile, SequenceFile, RCfile, and ORC.

Users can explicitly say CREATE TABLE ... STORED AS TEXTFILE|SEQUENCEFILE|RCFILE|ORC|AVRO|INPUTFORMAT...OUTPUTFORMAT... to override. (RCFILE was added in Hive 0.6.0, ORC in 0.11.0, and AVRO in 0.14.0.) See Row Format, Storage Format, and SerDe for details.
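
For example (table names are illustrative):

```sql
-- Per-table override of the default file format:
CREATE TABLE events_orc (id BIGINT, payload STRING) STORED AS ORC;

-- Or change the session-wide default used by a plain CREATE TABLE:
SET hive.default.fileformat=SequenceFile;
CREATE TABLE events_seq (id BIGINT, payload STRING);
```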

hive.fileformat.check
  • Default Value: true
  • Added In: Hive 0.5.0

Whether to check file format or not when loading data files.

hive.query.result.fileformat
  • Default Value:
    • Hive 0.x, 1.x, and 2.0: TextFile
    • Hive 2.1 onward: SequenceFile
  • Added In: Hive 0.7.0 with HIVE-1598

File format to use for a query's intermediate results. Options are TextFile, SequenceFile, and RCfile. The default value was changed to SequenceFile in Hive 2.1.0 (HIVE-1608).

RCFile Format

hive.io.rcfile.record.interval
  • Default Value: 2147483647
  • Added In: Hive 0.4.0 with HIVE-352; added to HiveConf.java in Hive 0.14.0 with HIVE-7211
hive.io.rcfile.column.number.conf
  • Default Value: 0
  • Added In: Hive 0.4.0 with HIVE-352; added to HiveConf.java in Hive 0.14.0 with HIVE-7211
hive.io.rcfile.tolerate.corruptions
  • Default Value: false
  • Added In: Hive 0.4.0 with HIVE-352; added to HiveConf.java in Hive 0.14.0 with HIVE-7211
hive.io.rcfile.record.buffer.size
  • Default Value: 4194304
  • Added In: Hive 0.4.0 with HIVE-352; added to HiveConf.java in Hive 0.14.0 with HIVE-7211

ORC File Format

The ORC file format was introduced in Hive 0.11.0. See ORC Files for details.

Besides the configuration properties listed in this section, some properties in other sections are also related to ORC:

hive.exec.orc.memory.pool
  • Default Value: 0.5
  • Added In: Hive 0.11.0 with HIVE-4248

Maximum fraction of heap that can be used by ORC file writers.

hive.exec.orc.write.format
  • Default Value: (empty)
  • Added In: Hive 0.12.0 with HIVE-4123; default changed from 0.11 to null with HIVE-5091 (also in Hive 0.12.0)

Define the version of the file to write. Possible values are 0.11 and 0.12. If this parameter is not defined, ORC will use the run length encoding (RLE) introduced in Hive 0.12. Any value other than 0.11 results in the 0.12 encoding.

Additional values may be introduced in the future (see HIVE-6002).

hive.exec.orc.default.stripe.size
  • Default Value: 256*1024*1024 (268,435,456) in Hive 0.13.0; 64*1024*1024 (67,108,864) in Hive 0.14.0 and later
  • Added In: Hive 0.13.0 with HIVE-5425; default changed in 0.14.0 with HIVE-7231 and HIVE-7490

Define the default ORC stripe size, in bytes.

hive.exec.orc.default.block.size
  • Default Value: 256*1024*1024 (268,435,456)
  • Added In: Hive 0.14.0 with HIVE-7231

Define the default file system block size for ORC files.

hive.exec.orc.dictionary.key.size.threshold
  • Default Value: 0.8
  • Added In: Hive 0.12.0 with HIVE-4324

If the number of keys in a dictionary is greater than this fraction of the total number of non-null rows, turn off dictionary encoding.  Use 1 to always use dictionary encoding.

hive.exec.orc.default.row.index.stride
  • Default Value: 10000
  • Added In: Hive 0.13.0 with HIVE-5728

Define the default ORC index stride in number of rows. (Stride is the number of rows an index entry represents.)

hive.exec.orc.default.buffer.size
  • Default Value: 256*1024 (262,144)
  • Added In: Hive 0.13.0 with HIVE-5728

Define the default ORC buffer size, in bytes.

hive.exec.orc.default.block.padding
  • Default Value: true
  • Added In: Hive 0.13.0 with HIVE-5728

Define the default block padding. Block padding was added in Hive 0.12.0 (HIVE-5091, "ORC files should have an option to pad stripes to the HDFS block boundaries").

hive.exec.orc.block.padding.tolerance
  • Default Value: 0.05
  • Added In: Hive 0.14.0 with HIVE-7231

Define the tolerance for block padding as a decimal fraction of stripe size (for example, the default value 0.05 is 5% of the stripe size). For the default 64 MB ORC stripe and 256 MB HDFS blocks, a maximum of 3.2 MB will be reserved for padding within the 256 MB block with the default hive.exec.orc.block.padding.tolerance. In that case, if the available size within the block is more than 3.2 MB, a new smaller stripe will be inserted to fit within that space. This ensures that no stripe written will cross block boundaries and cause remote reads within a node-local task.

hive.exec.orc.default.compress
  • Default Value: ZLIB
  • Added In: Hive 0.13.0 with HIVE-5728

Define the default compression codec for ORC file.

hive.exec.orc.encoding.strategy
  • Default Value: SPEED
  • Added In: Hive 0.14.0 with HIVE-7219

Define the encoding strategy to use while writing data. Changing this will only affect the lightweight encoding for integers. This flag will not change the compression level of a higher-level compression codec (like ZLIB). Possible options are SPEED and COMPRESSION.

hive.orc.splits.include.file.footer

If turned on, splits generated by ORC will include metadata about the stripes in the file. This data is read remotely (from the client or HiveServer2 machine) and sent to all the tasks.

hive.orc.cache.stripe.details.size

Cache size for keeping meta information about ORC splits cached in the client.

hive.orc.compute.splits.num.threads

How many threads ORC should use to create splits in parallel.

hive.exec.orc.split.strategy
  • Default Value: HYBRID
  • Added In: Hive 1.2.0 with HIVE-10114

...

hive.exec.orc.skip.corrupt.data
  • Default Value: false
  • Added In: Hive 0.13.0 with HIVE-6382

If ORC reader encounters corrupt data, this value will be used to determine whether to skip the corrupt data or throw an exception. The default behavior is to throw an exception.

hive.exec.orc.zerocopy

Use zerocopy reads with ORC. (This requires Hadoop 2.3 or later.)

hive.merge.orcfile.stripe.level
  • Default Value: true
  • Added In: Hive 0.14.0 with HIVE-7509

When hive.merge.mapfiles, hive.merge.mapredfiles or hive.merge.tezfiles is enabled while writing a table with ORC file format, enabling this configuration property will do stripe-level fast merge for small ORC files. Note that enabling this configuration property will not honor the padding tolerance configuration (hive.exec.orc.block.padding.tolerance).

hive.orc.row.index.stride.dictionary.check
  • Default Value: true
  • Added In: Hive 0.14.0 with HIVE-7832

If enabled, the dictionary check will happen after the first row index stride (default 10000 rows); otherwise the dictionary check will happen before writing the first stripe. In both cases, the decision whether or not to use a dictionary will be retained thereafter.

hive.exec.orc.compression.strategy
  • Default Value: SPEED
  • Added In: Hive 0.14.0 with HIVE-7859

Define the compression strategy to use while writing data. This changes the compression level of higher level compression codec (like ZLIB).

Value can be SPEED or COMPRESSION.

Parquet

Parquet is supported by a plugin in Hive 0.10, 0.11, and 0.12 and natively in Hive 0.13 and later. See Parquet for details.

hive.parquet.timestamp.skip.conversion
  • Default Value: true
  • Added In: Hive 1.2.0 with HIVE-9482

The current Hive implementation of Parquet stores timestamps in UTC on-file; this flag allows skipping the conversion when reading Parquet files created by other tools that may not have done the conversion.

Vectorization

Hive added vectorized query execution in release 0.13.0 (HIVE-4160, HIVE-5283). For more information see the design document Vectorized Query Execution.

hive.vectorized.execution.enabled
  • Default Value: false
  • Added In: Hive 0.13.0 with HIVE-5283

This flag should be set to true to enable vectorized mode of query execution. The default value is false.

hive.vectorized.execution.reduce.enabled
  • Default Value: true
  • Added In: Hive 0.14.0 with HIVE-7405

This flag should be set to true to enable vectorized mode of the reduce-side of query execution. The default value is true.

hive.vectorized.execution.reduce.groupby.enabled
  • Default Value: true
  • Added In: Hive 0.14.0 with HIVE-8052

This flag should be set to true to enable vectorized mode of the reduce-side GROUP BY query execution. The default value is true.
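
A session enabling vectorization end to end might set the following (note that in these releases vectorized execution applies to ORC-backed tables):

```sql
SET hive.vectorized.execution.enabled=true;          -- map side (off by default)
SET hive.vectorized.execution.reduce.enabled=true;   -- reduce side
SET hive.vectorized.execution.reduce.groupby.enabled=true;
```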

hive.vectorized.execution.mapjoin.native.enabled
  • Default Value: true
  • Added In: Hive 1.2.0 with HIVE-9824

This flag should be set to true to enable native (i.e. non-pass through) vectorization of queries using MapJoin.

hive.vectorized.execution.mapjoin.native.multikey.only.enabled
  • Default Value: false
  • Added In: Hive 1.2.0 with HIVE-9824

This flag should be set to true to restrict use of native vector map join hash tables to the MultiKey in queries using MapJoin.

hive.vectorized.execution.mapjoin.minmax.enabled
  • Default Value: false
  • Added In: Hive 1.2.0 with HIVE-9824

This flag should be set to true to enable vector map join hash tables to use min / max filtering for integer join queries using MapJoin.

hive.vectorized.execution.mapjoin.overflow.repeated.threshold
  • Default Value: -1
  • Added In: Hive 1.2.0 with HIVE-9824

The number of small-table rows for a match in vector map join hash tables at which the repeated-field optimization is used in the overflow vectorized row batch for join queries using MapJoin. A value of -1 means always use the join result optimization; otherwise, the threshold value can range from 0 to the maximum integer.

hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled
  • Default Value: false
  • Added In: Hive 1.2.0 with HIVE-9824

This flag should be set to true to enable use of native fast vector map join hash tables in queries using MapJoin.

hive.vectorized.groupby.checkinterval
  • Default Value: 100000
  • Added In: Hive 0.13.0 with HIVE-5692

Number of entries added to the GROUP BY aggregation hash before a recomputation of average entry size is performed.

hive.vectorized.groupby.maxentries
  • Default Value: 1000000
  • Added In: Hive 0.13.0 with HIVE-5692

Maximum number of entries in the vector GROUP BY aggregation hashtables. Exceeding this will trigger a flush regardless of memory pressure condition.

hive.vectorized.groupby.flush.percent
  • Default Value: 0.1
  • Added In: Hive 0.13.0 with HIVE-5692

Percent (as decimal fraction) of entries in the GROUP BY aggregation hash flushed when the memory threshold is exceeded.

MetaStore

In addition to the Hive metastore properties listed in this section, some properties are listed in other sections:

hive.metastore.local
  • Default Value: true
  • Added In: Hive 0.8.1
  • Removed In: Hive 0.10 with HIVE-2585

Controls whether to connect to a remote metastore server or open a new metastore server in the Hive client JVM. As of Hive 0.10 this is no longer used. Instead, if hive.metastore.uris is set then remote mode is assumed; otherwise local.

javax.jdo.option.ConnectionURL
  • Default Value: jdbc:derby:;databaseName=metastore_db;create=true
  • Added In: Hive 0.6.0

JDBC connect string for a JDBC metastore.

javax.jdo.option.ConnectionDriverName
  • Default Value: org.apache.derby.jdbc.EmbeddedDriver
  • Added In: Hive 0.8.1

Driver class name for a JDBC metastore.
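
The two properties above are typically set together in hive-site.xml; this illustrative snippet points the metastore at a MySQL database instead of the embedded Derby default (host and database names are assumptions):

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://dbhost:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
```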

javax.jdo.PersistenceManagerFactoryClass
  • Default Value: org.datanucleus.jdo.JDOPersistenceManagerFactory
  • Added In: Hive 0.8.1

Class implementing the JDO PersistenceManagerFactory.

javax.jdo.option.DetachAllOnCommit
  • Default Value: true
  • Added In: Hive 0.8.1

Detaches all objects from session so that they can be used after transaction is committed.

javax.jdo.option.NonTransactionalRead
  • Default Value: true
  • Added In: Hive 0.8.1

Reads outside of transactions.

javax.jdo.option.ConnectionUserName
  • Default Value: APP
  • Added In: Hive 0.8.1

Username to use against metastore database.

javax.jdo.option.ConnectionPassword
  • Default Value: mine
  • Added In: Hive 0.3.0

Password to use against metastore database.

For an alternative configuration, see Removing Hive Metastore Password from Hive Configuration.

javax.jdo.option.Multithreaded
  • Default Value: true
  • Added In: Hive 0.8.0

Set this to true if multiple threads access metastore through JDO concurrently.

datanucleus.connectionPoolingType
  • Default Value: DBCP in Hive 0.7 to 0.11; BoneCP in 0.12 and later 
  • Added In: Hive 0.7.0

Uses a BoneCP connection pool for JDBC metastore in release 0.12 and later (HIVE-4807), or a DBCP connection pool in releases 0.7 to 0.11.

datanucleus.validateTables

Validates existing schema against code. Turn this on if you want to verify existing schema.

datanucleus.schema.validateTables

Validates existing schema against code. Turn this on if you want to verify existing schema.

datanucleus.validateColumns

Validates existing schema against code. Turn this on if you want to verify existing schema.

datanucleus.schema.validateColumns

Validates existing schema against code. Turn this on if you want to verify existing schema.

datanucleus.validateConstraints

Validates existing schema against code. Turn this on if you want to verify existing schema.

datanucleus.schema.validateConstraints

Validates existing schema against code. Turn this on if you want to verify existing schema.

datanucleus.storeManagerType
  • Default Value: rdbms
  • Added In: Hive 0.7.0

Metadata store type.

datanucleus.fixedDatastore
  • Default Value: 
    • Hive 0.x: false
    • Hive 1.x: false
  • Added In: Hive 0.12.0 with HIVE-3764
  • Removed In: Hive 2.0.0 with HIVE-6113

Dictates whether to allow updates to schema or not.

datanucleus.autoCreateSchema

Creates the necessary schema on startup if one does not exist. Set this to false after creating the schema once.

In Hive 0.12.0 and later releases, datanucleus.autoCreateSchema is disabled if hive.metastore.schema.verification is true.

datanucleus.schema.autoCreateAll

Creates the necessary schema on startup if one does not exist. Reset this to false after creating the schema once.

datanucleus.schema.autoCreateAll is disabled if hive.metastore.schema.verification is true.

datanucleus.autoStartMechanismMode
  • Default Value: checked
  • Added In: Hive 0.7.0

Throw exception if metadata tables are incorrect.

datanucleus.transactionIsolation
  • Default Value: read-committed
  • Added In: Hive 0.7.0

Default transaction isolation level for identity generation.

datanucleus.cache.level2
  • Default Value: false
  • Added In: Hive 0.7.0

This parameter does nothing.
Warning note: For most installations, Hive should not enable the DataNucleus L2 cache, since this can cause correctness issues. Thus, some people set this parameter to false assuming that this disables the cache – unfortunately, it does not. To actually disable the cache, set datanucleus.cache.level2.type to "none".

datanucleus.cache.level2.type
  • Default Value: none in Hive 0.9 and later; SOFT in Hive 0.7 to 0.8.1
  • Added In: Hive 0.7.0

NONE = disable the datanucleus level 2 cache, SOFT = soft reference based cache, WEAK = weak reference based cache.
Warning note: For most Hive installations, enabling the DataNucleus cache can lead to correctness issues and is dangerous. This should be left as "none".

datanucleus.identifierFactory
  • Default Value: datanucleus
  • Added In: Hive 0.7.0

Name of the identifier factory to use when generating table/column names etc. 'datanucleus' is used for backward compatibility.

datanucleus.plugin.pluginRegistryBundleCheck
  • Default Value: LOG
  • Added In: Hive 0.7.0

Defines what happens when plugin bundles are found and are duplicated: EXCEPTION, LOG, or NONE.

hive.metastore.warehouse.dir
  • Default Value: /user/hive/warehouse
  • Added In: Hive 0.2.0

Location of default database for the warehouse.

hive.warehouse.subdir.inherit.perms
  • Default Value: false
  • Added In: Hive 0.9.0 with HIVE-2504.

Set this to true if table directories should inherit the permissions of the warehouse or database directory instead of being created with permissions derived from the dfs umask. (This configuration property replaced hive.files.umask.value before Hive 0.9.0 was released.)

Behavior of the flag changed in Hive 0.14.0 with HIVE-6892 and its sub-JIRAs. More details in Permission Inheritance in Hive.

hive.metastore.execute.setugi
  • Default Value: false in Hive 0.8.1 through 0.13.0, true starting in Hive 0.14.0
  • Added In: Hive 0.8.1 with HIVE-2616, default changed in Hive 0.14.0 with HIVE-6903

In unsecure mode, true will cause the metastore to execute DFS operations using the client's reported user and group permissions. Note that this property must be set on both the client and server sides. Further note that it's best effort. If client sets it to true and server sets it to false, the client setting will be ignored.

hive.metastore.event.listeners
  • Default Value: (empty)
  • Added In: Hive 0.8.0

List of comma-separated listeners for metastore events.

hive.metastore.partition.inherit.table.properties
  • Default Value: (empty)
  • Added In: Hive 0.8.1

List of comma-separated keys occurring in table properties that will be inherited by newly created partitions. * implies that all keys will be inherited.

hive.metastore.end.function.listeners
  • Default Value: (empty)
  • Added In: Hive 0.8.1

List of comma-separated listeners for the end of metastore functions.

hive.metastore.event.expiry.duration
  • Default Value: 0
  • Added In: Hive 0.8.0

Duration after which events expire from events table (in seconds).

hive.metastore.event.clean.freq
  • Default Value: 0
  • Added In: Hive 0.8.0

Frequency (in seconds) at which the timer task runs to purge expired events in the metastore.

hive.metastore.connect.retries
  • Default Value: 5
  • Added In: Hive 0.6.0

Number of retries while opening a connection to metastore.

hive.metastore.client.connect.retry.delay
  • Default Value: 1
  • Added In: Hive 0.7.0

Number of seconds for the client to wait between consecutive connection attempts.

hive.metastore.client.socket.timeout
  • Default Value: 20 in Hive 0.7 through 0.13.1; 600 in Hive 0.14.0 and later
  • Added In: Hive 0.7.0; default changed in Hive 0.14.0 with HIVE-7140

MetaStore Client socket timeout in seconds.

hive.metastore.rawstore.impl
  • Default Value: org.apache.hadoop.hive.metastore.ObjectStore
  • Added In: Hive 0.8.1

Name of the class that implements the org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieve raw metadata objects such as tables and databases.

hive.metastore.batch.retrieve.max
  • Default Value: 300
  • Added In: Hive 0.8.0

Maximum number of objects (tables/partitions) that can be retrieved from the metastore in one batch. The higher the number, the fewer round trips are needed to the Hive metastore server, but it may also cause higher memory requirements on the client side.

hive.metastore.ds.connection.url.hook
  • Default Value: (empty)
  • Added In: Hive 0.6.0

Name of the hook to use for retrieving the JDO connection URL. If empty, the value in javax.jdo.option.ConnectionURL is used.

hive.metastore.ds.retry.attempts
  • Default Value: 1
  • Added In: Hive 0.6.0

The number of times to retry a metastore call if there is a connection error.

hive.metastore.ds.retry.interval
  • Default Value: 1000
  • Added In: Hive 0.6.0

The number of milliseconds between metastore retry attempts.

hive.metastore.server.min.threads
  • Default Value: 200
  • Added In: Hive 0.6.0

Minimum number of worker threads in the Thrift server's pool.

hive.metastore.server.max.threads
  • Default Value: 100000
  • Added In: Hive 0.6.0

Maximum number of worker threads in the Thrift server's pool.

hive.metastore.server.max.message.size
  • Default Value: 100*1024*1024
  • Added In: Hive 1.1.0 (backported to Hive 1.0.2) with HIVE-8680

Maximum message size in bytes a Hive metastore will accept.

hive.metastore.server.tcp.keepalive
  • Default Value: true
  • Added In: Hive 0.6.0

Whether to enable TCP keepalive for the metastore server. Keepalive will prevent accumulation of half-open connections.

hive.metastore.sasl.enabled
  • Default Value: false
  • Added In: Hive 0.7.0

If true, the metastore thrift interface will be secured with SASL. Clients must authenticate with Kerberos.

hive.metastore.kerberos.keytab.file
  • Default Value: (empty)
  • Added In: Hive 0.7.0

The path to the Kerberos Keytab file containing the metastore thrift server's service principal.

hive.metastore.kerberos.principal
  • Default Value: hive-metastore/_HOST@EXAMPLE.COM
  • Added In: Hive 0.7.0

The service principal for the metastore thrift server. The special string _HOST will be replaced automatically with the correct host name.
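
The three Kerberos-related properties above are typically set together to secure the metastore Thrift interface. A hedged hive-site.xml sketch (the keytab path and realm are illustrative placeholders):

```xml
<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.metastore.kerberos.keytab.file</name>
  <!-- illustrative path; use your site's keytab location -->
  <value>/etc/security/keytabs/hive.service.keytab</value>
</property>
<property>
  <name>hive.metastore.kerberos.principal</name>
  <!-- _HOST is replaced automatically with the correct host name -->
  <value>hive-metastore/_HOST@EXAMPLE.COM</value>
</property>
```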

hive.metastore.cache.pinobjtypes
  • Default Value: Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order
  • Added In: Hive 0.7.0

List of comma-separated metastore object types that should be pinned in the cache.

hive.metastore.authorization.storage.checks
  • Default Value: false
  • Added In: Hive 0.8.0

Whether the metastore should perform authorization checks against the underlying storage for operations such as drop-partition (disallowing the drop-partition if the user in question does not have permission to delete the corresponding directory on the storage).

hive.metastore.schema.verification
  • Default Value: false 
  • Added In: Hive 0.12.0 with HIVE-3764

Enforce metastore schema version consistency.
True: Verify that the version information stored in the metastore matches the version from the Hive jars. Also disables automatic schema migration attempts (see datanucleus.autoCreateSchema and datanucleus.schema.autoCreateAll). Users are required to manually migrate the schema after a Hive upgrade, which ensures proper metastore schema migration.
False: Warn if the version information stored in the metastore doesn't match the version from the Hive jars.

For more information, see Metastore Schema Consistency and Upgrades.

hive.metastore.integral.jdo.pushdown
  • Default Value: false
  • Added In: Hive 0.13.0 with HIVE-6052

Allow JDO query pushdown for integral partition columns in metastore. Off by default. This improves metastore performance for integral columns, especially if there's a large number of partitions. However, it doesn't work correctly with integral values that are not normalized (for example, if they have leading zeroes like 0012). If metastore direct SQL is enabled and works (hive.metastore.try.direct.sql), this optimization is also irrelevant.

hive.metastore.try.direct.sql
  • Default Value: true
  • Added In: Hive 0.12.0 with HIVE-4051

Whether the Hive metastore should try to use direct SQL queries instead of the DataNucleus for certain read paths. This can improve metastore performance when fetching many partitions or column statistics by orders of magnitude; however, it is not guaranteed to work on all RDBMS-es and all versions. In case of SQL failures, the metastore will fall back to the DataNucleus, so it's safe even if SQL doesn't work for all queries on your datastore. If all SQL queries fail (for example, your metastore is backed by MongoDB), you might want to disable this to save the try-and-fall-back cost.

This can be configured on a per client basis by using the "set metaconf:hive.metastore.try.direct.sql=<value>" command, starting with Hive 0.14.0 (HIVE-7532).

hive.metastore.try.direct.sql.ddl
  • Default Value: true
  • Added In: Hive 0.13.0 with HIVE-5626

Same as hive.metastore.try.direct.sql, for read statements within a transaction that modifies metastore data. Due to non-standard behavior in Postgres, if a direct SQL select query has incorrect syntax or something similar inside a transaction, the entire transaction will fail and fall-back to DataNucleus will not be possible. You should disable the usage of direct SQL inside transactions if that happens in your case.

This can be configured on a per client basis by using the "set metaconf:hive.metastore.try.direct.sql.ddl=<value>" command, starting with Hive 0.14.0 (HIVE-7532).

hive.metastore.port
  • Default Value: 9083
  • Added In: Hive 1.3.0 with HIVE-9365

Hive metastore listener port.

hive.metastore.hbase.file.metadata.threads
  • Default Value: 1
  • Added In: Hive 2.1.0 with HIVE-12075

Number of threads to use to read file metadata in background to cache it.

hive.metastore.initial.metadata.count.enabled 
  • Default Value: true
  • Added In: Hive 2.1.0 with HIVE-12628

Enable a metadata count at metastore startup for metrics.

HiveServer2

HiveServer2 was added in Hive 0.11.0 with HIVE-2935.  For more information see Setting Up HiveServer2 and HiveServer2 Clients.

Besides the configuration properties listed in this section, some HiveServer2 properties are listed in other sections:

hive.server2.thrift.port
  • Default Value: 10000
  • Added In: Hive 0.11.0 with HIVE-2935

Port number of HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT.

hive.server2.thrift.bind.host
  • Default Value: localhost
  • Added In: Hive 0.11.0 with HIVE-2935

Bind host on which to run the HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST.

hive.server2.thrift.min.worker.threads
  • Default Value: 5
  • Added In: Hive 0.11.0 with HIVE-2935

Minimum number of Thrift worker threads.

hive.server2.thrift.max.worker.threads
  • Default Value: 100 in Hive 0.11.0, 500 in Hive 0.12.0 and later
  • Added In: Hive 0.11.0 with HIVE-2935, default value changed in HIVE 0.12.0 with HIVE-4617

Maximum number of Thrift worker threads.

hive.server2.thrift.worker.keepalive.time
  • Default Value: 60
  • Added in: Hive 0.14.0 with HIVE-7353

Keepalive time (in seconds) for an idle worker thread. When number of workers > min workers, excess threads are killed after this time interval.

hive.server2.thrift.max.message.size
  • Default Value: 100*1024*1024
  • Added in: Hive 1.1.0 (backported to Hive 1.0.2) with HIVE-8680

Maximum message size in bytes a HiveServer2 server will accept.

hive.server2.authentication
  • Default Value: NONE
  • Added In: Hive 0.11.0 with HIVE-2935

Client authentication types.

NONE: no authentication check – plain SASL transport
LDAP: LDAP/AD based authentication
KERBEROS: Kerberos/GSSAPI authentication
CUSTOM: Custom authentication provider (use with property hive.server2.custom.authentication.class)
PAM: Pluggable authentication module (added in Hive 0.13.0 with HIVE-6466)
NOSASL:  Raw transport (added in Hive 0.13.0) 

hive.server2.authentication.kerberos.keytab
  • Default Value: (empty)
  • Added In: Hive 0.11.0 with HIVE-2935

Kerberos keytab file for server principal.

hive.server2.authentication.kerberos.principal
  • Default Value: (empty)
  • Added In: Hive 0.11.0 with HIVE-2935

Kerberos server principal.

hive.server2.custom.authentication.class
  • Default Value: (empty)
  • Added In: Hive 0.11.0 with HIVE-2935

Custom authentication class, used when property hive.server2.authentication is set to 'CUSTOM'. The provided class must be a proper implementation of the interface org.apache.hive.service.auth.PasswdAuthenticationProvider. HiveServer2 will call its Authenticate(user, password) method to authenticate requests. The implementation may optionally extend Hadoop's org.apache.hadoop.conf.Configured class to grab Hive's Configuration object.
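
For example, a hive-site.xml fragment wiring up a custom provider (the class name com.example.auth.MyPasswdAuthProvider is a hypothetical placeholder for your own implementation):

```xml
<property>
  <name>hive.server2.authentication</name>
  <value>CUSTOM</value>
</property>
<property>
  <name>hive.server2.custom.authentication.class</name>
  <!-- hypothetical class implementing PasswdAuthenticationProvider -->
  <value>com.example.auth.MyPasswdAuthProvider</value>
</property>
```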

hive.server2.enable.doAs

Setting this property to true will have HiveServer2 execute Hive operations as the user making the calls to it.

hive.server2.authentication.ldap.url
  • Default Value: (empty)
  • Added In: Hive 0.11.0 with HIVE-2935

LDAP connection URL(s). The value can be a SPACE-separated list of URLs to multiple LDAP servers for resiliency; the URLs are tried in the order specified until a connection is successful.

hive.server2.authentication.ldap.baseDN
  • Default Value: (empty)
  • Added In: Hive 0.11.0 with HIVE-2935

LDAP base DN (distinguished name).

hive.server2.authentication.ldap.guidKey
  • Default Value: uid
  • Added In: Hive 2.1.0 with HIVE-13295

Indicates which prefix to use when building the bindDN for the LDAP connection, when only baseDN is used. The bindDN will be "<guidKey>=<user/group>,<baseDN>". If userDNPattern and/or groupDNPattern is used in the configuration, guidKey is not needed; it is primarily required when only baseDN is being used.

hive.server2.authentication.ldap.Domain
  • Default Value: (empty)
  • Added In: Hive 0.12.0 with HIVE-4707

LDAP domain.

hive.server2.authentication.ldap.groupDNPattern
  • Default Value: (empty)
  • Added In: Hive 1.3 with HIVE-7193

A COLON-separated list of string patterns to represent the base DNs for LDAP Groups. Use "%s" where the actual group name is to be plugged in. See Group Membership for details.

Example of one string pattern: uid=%s,OU=Groups,DC=apache,DC=org

hive.server2.authentication.ldap.groupFilter
  • Default Value: (empty)
  • Added In: Hive 1.3 with HIVE-7193

A COMMA-separated list of group names that the users should belong to (at least one of the groups) for authentication to succeed. See Group Membership for details.

hive.server2.authentication.ldap.groupMembershipKey
  • Default Value: member
  • Added In: Hive 2.1.0 with HIVE-13295

This property is used in LDAP search queries to find the LDAP group names a particular user belongs to. The value of the LDAP attribute indicated by this property should be the user's full DN, short username, or userid. For example, a group entry for "fooGroup" containing "member: uid=fooUser,ou=Users,dc=domain,dc=com" determines that "fooUser" belongs to LDAP group "fooGroup".

See Group Membership for a detailed example.

hive.server2.authentication.ldap.groupClassKey
  • Default Value: groupOfNames
  • Added In: Hive 1.3 with HIVE-13295

This property is used in LDAP search queries to find the LDAP group names a user belongs to. Its value is used to construct the LDAP group search query and indicates what a group's objectClass is. Every LDAP group has a certain objectClass, for example group, groupOfNames, or groupOfUniqueNames.

See Group Membership for a detailed example.

hive.server2.authentication.ldap.userDNPattern
  • Default Value: (empty)
  • Added In: Hive 1.3 with HIVE-7193

A COLON-separated list of string patterns to represent the base DNs for LDAP Users. Use "%s" where the actual username is to be plugged in. See User Search List for details.

Example of one string pattern: uid=%s,OU=Users,DC=apache,DC=org

hive.server2.authentication.ldap.userFilter
  • Default Value: (empty)
  • Added In: Hive 1.3 with HIVE-7193

A COMMA-separated list of usernames for whom authentication will succeed if the user is found in LDAP. See User Search List for details.

hive.server2.authentication.ldap.customLDAPQuery
  • Default Value: (empty)
  • Added In: Hive 1.3 with HIVE-7193

A user-specified custom LDAP query that will be used to grant/deny an authentication request. If the user is part of the query's result set, authentication succeeds. See Custom Query String for details.
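
As a sketch combining several of the LDAP properties above, the hive-site.xml fragment below configures LDAP authentication with two servers, a user DN pattern, and a required group (server names, DNs, and the group name are illustrative placeholders):

```xml
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <!-- SPACE-separated list; tried in order -->
  <value>ldap://ldap1.example.com ldap://ldap2.example.com</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.userDNPattern</name>
  <value>uid=%s,OU=Users,DC=example,DC=com</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.groupFilter</name>
  <!-- illustrative group name; users must belong to at least one listed group -->
  <value>hive-users</value>
</property>
```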

hive.server2.global.init.file.location

Either the location of a HiveServer2 global init file or a directory containing a .hiverc file. If the property is set, the value must be a valid path to an init file or directory where the init file is located.

hive.server2.transport.mode

Server transport mode. Value can be "binary" or "http".

hive.server2.thrift.http.port
  • Default Value: 10001
  • Added In: Hive 0.12.0

Port number when in HTTP mode.

hive.server2.thrift.http.path

Path component of URL endpoint when in HTTP mode.

hive.server2.thrift.http.min.worker.threads
  • Default Value: 5
  • Added In: Hive 0.12.0

Minimum number of worker threads when in HTTP mode.

hive.server2.thrift.http.max.worker.threads
  • Default Value: 500
  • Added In: Hive 0.12.0

Maximum number of worker threads when in HTTP mode.

hive.server2.thrift.http.max.idle.time
  • Default Value: 1800s (i.e., 1800 seconds)
  • Added In: Hive 0.14.0 in HIVE-7169

Maximum idle time for a connection on the server when in HTTP mode.

hive.server2.thrift.http.worker.keepalive.time
  • Default Value: 60
  • Added In: Hive 0.14.0 in HIVE-7353

Keepalive time (in seconds) for an idle http worker thread. When number of workers > min workers, excess threads are killed after this time interval.

hive.server2.thrift.sasl.qop
  • Default Value: auth
  • Added In: Hive 0.12.0

SASL QOP value; set it to one of the following values to enable higher levels of protection for HiveServer2 communication with clients.

"auth" – authentication only (default)
"auth-int" – authentication plus integrity protection
"auth-conf" – authentication plus integrity and confidentiality protection

Note that hadoop.rpc.protection being set to a higher level than HiveServer2 does not make sense in most situations. HiveServer2 ignores hadoop.rpc.protection in favor of hive.server2.thrift.sasl.qop.

This is applicable only if HiveServer2 is configured to use Kerberos authentication.
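
For example, to require integrity and confidentiality protection on a Kerberos-secured HiveServer2 (illustrative sketch; as noted above, this only applies with Kerberos authentication):

```xml
<property>
  <name>hive.server2.thrift.sasl.qop</name>
  <!-- auth | auth-int | auth-conf -->
  <value>auth-conf</value>
</property>
```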

hive.server2.async.exec.threads
  • Default Value: 50 in Hive 0.12.0, 100 in Hive 0.13.0 and later
  • Added In: Hive 0.12.0 with HIVE-4617, default value changed in Hive 0.13.0 with HIVE-5229

Number of threads in the async thread pool for HiveServer2.

hive.server2.async.exec.shutdown.timeout
  • Default Value: 10
  • Added In: Hive 0.12.0 with HIVE-4617

Time (in seconds) for which HiveServer2 shutdown will wait for async threads to terminate.

hive.server2.table.type.mapping

This setting reflects how HiveServer2 will report the table types for JDBC and other client implementations that retrieve the available tables and supported table types.

HIVE: Exposes Hive's native table types like MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW
CLASSIC: More generic types like TABLE and VIEW

hive.server2.session.hook
  • Default Value: (empty)
  • Added In: Hive 0.12.0 with HIVE-4588

Session-level hook for HiveServer2.

hive.server2.max.start.attempts
  • Default Value: 30
  • Added In: Hive 0.13.0 with HIVE-5794

The number of times HiveServer2 will attempt to start before exiting, sleeping 60 seconds between retries. The default of 30 will keep trying for 30 minutes.

hive.server2.async.exec.wait.queue.size
  • Default Value: 100
  • Added In: Hive 0.13.0 with HIVE-5229

Size of the wait queue for async thread pool in HiveServer2. After hitting this limit, the async thread pool will reject new requests.

hive.server2.async.exec.keepalive.time
  • Default Value: 10
  • Added In: Hive 0.13.0 with HIVE-5229

Time (in seconds) that an idle HiveServer2 async thread (from the thread pool) will wait for a new task to arrive before terminating.

hive.server2.long.polling.timeout
  • Default Value: 5000L
  • Added In: Hive 0.13.0 with HIVE-5217

Time in milliseconds that HiveServer2 will wait before responding to asynchronous calls that use long polling.

hive.server2.allow.user.substitution
  • Default Value: true
  • Added In: Hive 0.13.0

Allow alternate user to be specified as part of HiveServer2 open connection request.

hive.server2.authentication.spnego.keytab
  • Default Value: (empty)
  • Added In: Hive 0.13.0

Keytab file for SPNEGO principal, optional. A typical value would look like /etc/security/keytabs/spnego.service.keytab. This keytab would be used by HiveServer2 when Kerberos security is enabled and HTTP transport mode is used. This needs to be set only if SPNEGO is to be used in authentication.

SPNEGO authentication would be honored only if valid hive.server2.authentication.spnego.principal and hive.server2.authentication.spnego.keytab are specified.

hive.server2.authentication.spnego.principal
  • Default Value: (empty)
  • Added In: Hive 0.13.0

SPNEGO service principal, optional. A typical value would look like HTTP/_HOST@EXAMPLE.COM. The SPNEGO service principal would be used by HiveServer2 when Kerberos security is enabled and HTTP transport mode is used. This needs to be set only if SPNEGO is to be used in authentication.

hive.server2.authentication.pam.services
  • Default Value: (empty)
  • Added In: Hive 0.13.0 with HIVE-6466

List of the underlying PAM services that should be used when hive.server2.authentication type is PAM. A file with the same name must exist in /etc/pam.d.

hive.server2.use.SSL
  • Default Value: false
  • Added In: Hive 0.13.0 with HIVE-5351

Set this to true for using SSL encryption in HiveServer2.

hive.server2.keystore.path
  • Default Value: (empty)
  • Added In: Hive 0.13.0 with HIVE-5351

SSL certificate keystore location.

hive.server2.keystore.password
  • Default Value: (empty)
  • Added In: Hive 0.13.0 with HIVE-5351

SSL certificate keystore password.

hive.server2.tez.default.queues
  • Default Value: (empty)
  • Added In: Hive 0.13.0 with HIVE-6325

A list of comma-separated values corresponding to YARN queues of the same name. When HiveServer2 is launched in Tez mode, this configuration needs to be set for multiple Tez sessions to run in parallel on the cluster.

hive.server2.tez.sessions.per.default.queue
  • Default Value: 1
  • Added In: Hive 0.13.0 with HIVE-6325

A positive integer that determines the number of Tez sessions that should be launched on each of the queues specified by hive.server2.tez.default.queues. Determines the parallelism on each queue.

hive.server2.tez.initialize.default.sessions
  • Default Value: false
  • Added In: Hive 0.13.0 with HIVE-6325

This flag controls whether HiveServer2 initializes the pool of default Tez sessions at startup. When it is false, a user can still run queries over Tez without the pool of sessions.
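
The three Tez session-pool properties above work together. An illustrative sketch that pre-launches two sessions on each of two hypothetical YARN queues (queue names and counts are examples, not recommendations):

```xml
<property>
  <name>hive.server2.tez.default.queues</name>
  <!-- illustrative YARN queue names -->
  <value>etl,interactive</value>
</property>
<property>
  <name>hive.server2.tez.sessions.per.default.queue</name>
  <value>2</value>
</property>
<property>
  <name>hive.server2.tez.initialize.default.sessions</name>
  <value>true</value>
</property>
```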

hive.server2.session.check.interval
  • Default Value:
    • Hive 0.x, 1.0.x, 1.1.x, 1.2.0: 0ms
    • Hive 1.2.1+, 1.3+, 2.x+: 6h (HIVE-9842)
  • Added In: Hive 0.14.0 with HIVE-5799

The check interval for session/operation timeout, which can be disabled by setting it to zero or a negative value.

hive.server2.idle.session.timeout
  • Default Value:
    • Hive 0.x, 1.0.x, 1.1.x, 1.2.0: 0ms
    • Hive 1.2.1+, 1.3+, 2.x+: 7d (HIVE-9842)
  • Added In: Hive 0.14.0 with HIVE-5799

With hive.server2.session.check.interval set to a positive time value, a session will be closed when it is not accessed for this duration of time. This can be disabled by setting it to zero or a negative value.

hive.server2.idle.operation.timeout
  • Default Value: 0ms
  • Added In: Hive 0.14.0 with HIVE-5799

With hive.server2.session.check.interval set to a positive time value, an operation will be closed when it is not accessed for this duration of time. This can be disabled by setting it to zero.

With positive value, it's checked for operations in terminal state only (FINISHED, CANCELED, CLOSED, ERROR).
With negative value, it's checked for all of the operations regardless of state.

hive.server2.logging.operation.enabled

When true, HiveServer2 will save operation logs and make them available for clients.

hive.server2.logging.operation.log.location

Top level directory where operation logs are stored if logging functionality is enabled.

hive.server2.logging.operation.verbose
  • Default Value: false
  • Added In: Hive 0.14.0 with HIVE-8785
  • Removed In: Hive 1.2.0 with HIVE-10119

When true, HiveServer2 operation logs available for clients will be verbose. Replaced in Hive 1.2.0 by hive.server2.logging.operation.level.

hive.server2.logging.operation.level
  • Default Value: EXECUTION
  • Added In: Hive 1.2.0 with HIVE-10119

HiveServer2 operation logging mode available to clients to be set at session level.

For this to work, hive.server2.logging.operation.enabled should be set to true. The allowed values are:

  • NONE: Ignore any logging.
  • EXECUTION: Log completion of tasks.
  • PERFORMANCE: Execution + Performance logs.
  • VERBOSE: All logs.

hive.server2.thrift.http.cookie.auth.enabled
  • Default Value: true
  • Added In: Hive 1.2.0 with HIVE-9710

When true, HiveServer2 in HTTP transport mode will use cookie based authentication mechanism.

hive.server2.thrift.http.cookie.max.age
  • Default Value: 86400s (1 day)
  • Added In: Hive 1.2.0 with HIVE-9710

Maximum age in seconds for server side cookie used by HiveServer2 in HTTP mode.

hive.server2.thrift.http.cookie.path
  • Default Value: (empty)
  • Added In: Hive 1.2.0 with HIVE-9710

Path for the HiveServer2 generated cookies.

hive.server2.thrift.http.cookie.domain
  • Default Value: (empty)
  • Added In: Hive 1.2.0 with HIVE-9710

Domain for the HiveServer2 generated cookies.

hive.server2.thrift.http.cookie.is.secure
  • Default Value: true
  • Added In: Hive 1.2.0 with HIVE-9710

Secure attribute of the HiveServer2 generated cookie.

hive.server2.thrift.http.cookie.is.httponly
  • Default Value: true
  • Added In: Hive 1.2.0 with HIVE-9710

HttpOnly attribute of the HiveServer2 generated cookie.

hive.hadoop.classpath
  • Default Value: (empty)
  • Added In: Hive 0.14.0 with HIVE-8340

For the Windows operating system, Hive needs to pass the HIVE_HADOOP_CLASSPATH Java parameter while starting HiveServer2 using "-hiveconf hive.hadoop.classpath=%HIVE_LIB%". Users can set this parameter in hiveserver2.xml.

HiveServer2 Web UI

A web interface for HiveServer2 is introduced in release 2.0.0 (see Web UI for HiveServer2).

hive.server2.webui.host
  • Default Value: 0.0.0.0
  • Added In: Hive 2.0.0 with HIVE-12338

The host address the HiveServer2 Web UI will listen on. The Web UI can be used to access the HiveServer2 configuration, local logs, and metrics. It can also be used to check some information about active sessions and queries being executed.

hive.server2.webui.port

The port the HiveServer2 Web UI will listen on. Set to 0 or a negative number to disable the HiveServer2 Web UI feature.

hive.server2.webui.max.threads
  • Default Value: 50
  • Added In: Hive 2.0.0 with HIVE-12338

The maximum number of HiveServer2 Web UI threads.

hive.server2.webui.max.historic.queries
  • Default Value: 25
  • Added In: Hive 2.1.0 with HIVE-12550

The maximum number of past queries to show in HiveServer2 Web UI.

hive.server2.webui.use.ssl

Set this to true for using SSL encryption for HiveServer2 WebUI.

hive.server2.webui.keystore.path
  • Default Value: (empty)
  • Added In: Hive 2.0.0 with HIVE-12471

SSL certificate keystore location for HiveServer2 WebUI.

hive.server2.webui.keystore.password
  • Default Value: (empty)
  • Added In: Hive 2.0.0 with HIVE-12471

SSL certificate keystore password for HiveServer2 WebUI.

hive.server2.webui.use.spnego
  • Default Value: false
  • Added In: Hive 2.0.0 with HIVE-12485 

Whether to enable SPNEGO authentication for the HiveServer2 WebUI.

hive.server2.webui.spnego.keytab
  • Default Value: (empty)
  • Added In: Hive 2.0.0 with HIVE-12485

The path to the Kerberos Keytab file containing the HiveServer2 WebUI SPNEGO service principal.

hive.server2.webui.spnego.principal
  • Default Value: HTTP/_HOST@EXAMPLE.COM
  • Added In: Hive 2.0.0 with HIVE-12485

The HiveServer2 WebUI SPNEGO service principal. The special string _HOST will be replaced automatically with the value of hive.server2.webui.host or the correct host name.

Spark

Apache Spark was added in Hive 1.1.0 (HIVE-7292 and the merge-to-trunk JIRA's HIVE-9257, 9352, 9448). For information see the design document Hive on Spark and Hive on Spark: Getting Started.

To configure Hive to execute on Spark, set the hive.execution.engine property to "spark".
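
For example, in hive-site.xml (the same setting can be made per session with "set hive.execution.engine=spark;"):

```xml
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
```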

hive.spark.job.monitor.timeout

Timeout for job monitor to get Spark job state.

hive.spark.dynamic.partition.pruning

When true, this turns on dynamic partition pruning for the Spark engine, so that joins on partition keys will be processed by writing to a temporary HDFS file, and read later for removing unnecessary partitions.

hive.spark.dynamic.partition.pruning.max.data.size

The maximum data size for the dimension table that generates partition pruning information. If the data size reaches this limit, the optimization will be turned off.

Remote Spark Driver

The remote Spark driver is the application launched in the Spark cluster that submits the actual Spark job. It was introduced in HIVE-8528. It is a long-lived application, initialized upon the first query of the current user and running until the user's session is closed. The following properties control the remote communication between the remote Spark driver and the Hive client that spawns it.

hive.spark.client.future.timeout

Timeout for requests from Hive client to remote Spark driver.

hive.spark.client.connect.timeout

Timeout for remote Spark driver in connecting back to Hive client.

hive.spark.client.server.connect.timeout
  • Default Value: 90000 milliseconds
  • Added In: Hive 1.1.0 with HIVE-9337, default changed in same release with HIVE-9519

Timeout for handshake between Hive client and remote Spark driver. Checked by both processes.

hive.spark.client.secret.bits

Number of bits of randomness in the generated secret for communication between Hive client and remote Spark driver. Rounded down to nearest multiple of 8.

hive.spark.client.rpc.threads

Maximum number of threads for remote Spark driver's RPC event loop.

hive.spark.client.rpc.max.size
  • Default Value: 52,428,800 (50 * 1024 * 1024, or 50 MB)
  • Added In: Hive 1.1.0 with HIVE-9337

Maximum message size in bytes for communication between Hive client and remote Spark driver. Default is 50 MB.

hive.spark.client.channel.log.level

Channel logging level for remote Spark driver. One of DEBUG, ERROR, INFO, TRACE, WARN. If unset, TRACE is chosen.

Tez

Apache Tez was added in Hive 0.13.0 (HIVE-4660 and HIVE-6098).  For information see the design document Hive on Tez, especially the Installation and Configuration section.

Besides the configuration properties listed in this section, some properties in other sections are also related to Tez:

hive.jar.directory

This is the location Hive in Tez mode will look in to find a site-wide installed Hive instance. See hive.user.install.directory for the default behavior.

hive.user.install.directory

If Hive (in Tez mode only) cannot find a usable Hive jar in hive.jar.directory, it will upload the Hive jar to <hive.user.install.directory>/<user_name> and use it to run queries.

hive.compute.splits.in.am

Whether to generate the splits locally or in the ApplicationMaster (Tez only).

hive.rpc.query.plan

Whether to send the query plan via local resource or RPC.

hive.prewarm.enabled

Enables container prewarm for Tez (Hadoop 2 only).

hive.prewarm.numcontainers

Controls the number of containers to prewarm for Tez (Hadoop 2 only).

hive.merge.tezfiles

Merge small files at the end of a Tez DAG.

hive.tez.input.format

The default input format for Tez. Tez groups splits in the AM (ApplicationMaster).

hive.tez.container.size

By default Tez will spawn containers of the size of a mapper. This property can be used to override that default.

hive.tez.java.opts

By default Tez will use the Java options from map tasks. This property can be used to override that default.
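
The two container-level properties above are commonly set together. An illustrative sketch requesting 4 GB containers with the heap sized somewhat below the container (the values are examples, not recommendations):

```xml
<property>
  <name>hive.tez.container.size</name>
  <!-- container size in MB; example value -->
  <value>4096</value>
</property>
<property>
  <name>hive.tez.java.opts</name>
  <!-- heap below the container size; example value -->
  <value>-Xmx3276m</value>
</property>
```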

hive.convert.join.bucket.mapjoin.tez
  • Default Value: false
  • Added In: Hive 0.13.0 with HIVE-6447

Whether joins can be automatically converted to bucket map joins in Hive when Tez is used as the execution engine (hive.execution.engine is set to "tez").

hive.tez.log.level
  • Default Value: INFO
  • Added In: Hive 0.13.0 with HIVE-6743

The log level to use for tasks executing as part of the DAG. Used only if hive.tez.java.opts is used to configure Java options.

hive.localize.resource.wait.interval
  • Default Value: 5000
  • Added In: Hive 0.13.0 with HIVE-6782

Time in milliseconds to wait for another thread to localize the same resource for Hive-Tez.

hive.localize.resource.num.wait.attempts
  • Default Value: 5
  • Added In: Hive 0.13.0 with HIVE-6782

The number of attempts waiting for localizing a resource in Hive-Tez.

hive.tez.smb.number.waves
  • Default Value: 0.5
  • Added In: Hive 0.14.0 with HIVE-8409

The number of waves in which to run the SMB (sort-merge-bucket) join. This accounts for the cluster being occupied; ideally it should be 1 wave.

hive.tez.cpu.vcores
  • Default Value: -1
  • Added In: Hive 0.14.0 with HIVE-8452

By default Tez will ask for however many CPUs MapReduce is configured to use per container. This property can be used to override that default.

hive.tez.auto.reducer.parallelism
  • Default Value: false
  • Added In: Hive 0.14.0 with HIVE-7158

Turn on Tez's auto reducer parallelism feature. When enabled, Hive will still estimate data sizes and set parallelism estimates; Tez will sample source vertices' output sizes and adjust the estimates at runtime as necessary.

hive.tez.max.partition.factor
  • Default Value: 2
  • Added In: Hive 0.14.0 with HIVE-7158

When auto reducer parallelism is enabled this factor will be used to over-partition data in shuffle edges.

hive.tez.min.partition.factor
  • Default Value: 0.25
  • Added In: Hive 0.14.0 with HIVE-7158

When auto reducer parallelism is enabled this factor will be used to put a lower limit to the number of reducers that Tez specifies.

LLAP

Live Long and Process (LLAP) functionality was added in Hive 2.0 (HIVE-7926 and associated tasks).