...

Version information

As of Hive 0.14.0 (HIVE-7211), a configuration name that starts with "hive." is regarded as a Hive system property. With the hive.conf.validation option set to true (the default), any attempt to set a configuration property that starts with "hive." but is not registered to the Hive system will throw an exception.
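The validation rule above can be pictured with a small Java sketch. It is only an illustration under stated assumptions: the class name, the REGISTERED set, and the exception type and message are hypothetical and do not reproduce Hive's own implementation.

```java
import java.util.Set;

final class ConfValidationSketch {
    // Hypothetical registry of known property names; the real set is much larger.
    static final Set<String> REGISTERED =
        Set.of("hive.execution.engine", "hive.conf.validation");

    // Rejects unregistered names that start with "hive." when validation is on.
    // Names outside the "hive." namespace are not checked in this sketch.
    static void checkSettable(String name, boolean validationEnabled) {
        if (validationEnabled && name.startsWith("hive.") && !REGISTERED.contains(name)) {
            throw new IllegalArgumentException("Unknown Hive property: " + name);
        }
    }

    public static void main(String[] args) {
        checkSettable("hive.execution.engine", true);   // registered, accepted
        checkSettable("mapreduce.job.queuename", true); // not a "hive." name, accepted
        checkSettable("hive.no.such.property", false);  // validation off, accepted
        checkSettable("hive.no.such.property", true);   // throws
    }
}
```

With validation disabled, the same unregistered name passes through unchecked, which matches the behavior described for hive.conf.validation set to false.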

Query and DDL Execution

hive.execution.engine

...

  • Default Value: true in Hive 0.13.0 and 0.13.1; false in Hive 0.14.0 and later (HIVE-8151)
  • Added In: Hive 0.13.0 with HIVE-6455
  • Deprecated: replaced with hive.optimize.sort.dynamic.partition.threshold
  • Removed in Hive 4.0 with HIVE-25320

When enabled, the dynamic partitioning column will be globally sorted. This way only one record writer needs to be kept open for each partition value in the reducer, which reduces memory pressure on reducers.

...

If set to true, order/sort by without limit in subqueries and views will be removed.

Datetime

hive.datetime.formatter
  • Default Value: DATETIME
  • Added In: Hive 4.0.0 with HIVE-25576, HIVE-27673
The formatter to use for handling datetime values. The possible values are:

  • DATETIME: For using java.time.format.DateTimeFormatter
  • SIMPLE: For using java.text.SimpleDateFormat (known bugs: HIVE-25458, HIVE-25403, HIVE-25268)

hive.datetime.formatter.resolver.style
  • Default Value: SMART
  • Added In: Hive 4.0.0 with HIVE-27772

The style used by the hive.datetime.formatter (only applicable to DATETIME) to resolve dates and times. The possible values are:

  • SMART:
    • Using smart resolution will perform the sensible default for each field, which may be the same as strict, the same as lenient, or a third behavior. Individual fields will interpret this differently.
    • For example, resolving year-month and day-of-month in the ISO calendar system using smart mode will ensure that the day-of-month is from 1 to 31, converting any value beyond the last valid day-of-month to be the last valid day-of-month.
  • STRICT:
    • Using strict resolution will ensure that all parsed values are within the outer range of valid values for the field. Individual fields may be further processed for strictness.
    • For example, resolving year-month and day-of-month in the ISO calendar system using strict mode will ensure that the day-of-month is valid for the year-month, rejecting invalid values.
    • When using STRICT as the hive.datetime.formatter.resolver.style, use the pattern letter "u" to represent the year. For more details, see https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html
  • LENIENT:
    • Lenient mode allows the month in the ISO calendar system to be outside the range 1 to 12. For example, month 15 is treated as being 3 months after month 12.
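The three styles correspond to java.time.format.ResolverStyle, so their differences can be observed directly with DateTimeFormatter. The following is a minimal sketch outside of Hive; the class and method names are made up for illustration, and how Hive wires the style into its SQL functions is not shown here.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;

final class ResolverStyleDemo {
    // Parses text with the given pattern and resolver style. Returns the
    // resolved ISO date, or "rejected" if parsing or resolution fails.
    static String resolve(String pattern, String text, ResolverStyle style) {
        DateTimeFormatter formatter =
            DateTimeFormatter.ofPattern(pattern).withResolverStyle(style);
        try {
            return LocalDate.parse(text, formatter).toString();
        } catch (DateTimeException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        // The same invalid date (February 31) under each style.
        for (ResolverStyle style : ResolverStyle.values()) {
            System.out.println(style + ": " + resolve("uuuu-MM-dd", "2023-02-31", style));
        }
        // Month 15 under LENIENT rolls over into the next year.
        System.out.println(resolve("uuuu-MM-dd", "2023-15-10", ResolverStyle.LENIENT));
        // Pattern letter "y" (year-of-era) with STRICT and no era field.
        System.out.println(resolve("yyyy-MM-dd", "2023-02-28", ResolverStyle.STRICT));
    }
}
```

For 2023-02-31, SMART clamps to 2023-02-28, STRICT rejects the value, and LENIENT rolls over to 2023-03-03. The last call shows why "u" is recommended with STRICT: a valid date written with "yyyy" is rejected because year-of-era cannot be resolved strictly without an era.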

Currently these configurations only affect the behavior of the following SQL functions:

  • unix_timestamp(string,[string])
  • from_unixtime
  • date_format

The SIMPLE formatter exists purely for compatibility with previous versions of Hive, so its use is discouraged. It suffers from known bugs that are unlikely to be fixed in subsequent versions of the product. Furthermore, using the SIMPLE formatter may lead to strange behavior and unexpected results when combined with SQL functions/operators that use the new DATETIME formatter.

SerDes and I/O

SerDes

hive.script.serde
  • Default Value: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  • Added In: Hive 0.4.0

The default SerDe for transmitting input data to and reading output data from the user scripts.

hive.script.recordreader
  • Default Value: org.apache.hadoop.hive.ql.exec.TextRecordReader
  • Added In: Hive 0.4.0

The default record reader for reading data from the user scripts.

hive.script.recordwriter
  • Default Value: org.apache.hadoop.hive.ql.exec.TextRecordWriter
  • Added In: Hive 0.5.0

...

Default file format for CREATE TABLE statement applied to managed tables only. External tables will be created with format specified by hive.default.fileformat. Options are none, TextFile, SequenceFile, RCfile, ORC, and Parquet (as of Hive 2.3.0). Leaving this null will result in using hive.default.fileformat for all native tables. For non-native tables the file format is determined by the storage handler, as shown below (see the StorageHandlers section for more information on managed/external and native/non-native terminology).

...


         | Native                                                                    | Non-Native
Managed  | hive.default.fileformat.managed (or fall back to hive.default.fileformat) | Not covered by default file-formats
External | hive.default.fileformat                                                   | Not covered by default file-formats
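The lookup described by the table can be sketched as a small Java function. This is an illustration only: the class and method names are hypothetical, and treating both null and "none" as "not set" is an assumption based on the description above.

```java
import java.util.Optional;

final class DefaultFileFormatSketch {
    // managedFormat stands for hive.default.fileformat.managed,
    // defaultFormat for hive.default.fileformat.
    // Returns empty for non-native tables, whose format is determined
    // by the storage handler rather than by these properties.
    static Optional<String> resolve(boolean managed, boolean nativeTable,
                                    String managedFormat, String defaultFormat) {
        if (!nativeTable) {
            return Optional.empty();
        }
        // Assumption: null or "none" means the managed-table default is unset.
        boolean managedSet =
            managedFormat != null && !managedFormat.equalsIgnoreCase("none");
        return Optional.of(managed && managedSet ? managedFormat : defaultFormat);
    }

    public static void main(String[] args) {
        System.out.println(resolve(true, true, "ORC", "TextFile"));   // managed, native
        System.out.println(resolve(false, true, "ORC", "TextFile"));  // external, native
        System.out.println(resolve(true, true, "none", "TextFile"));  // fallback
        System.out.println(resolve(true, false, "ORC", "TextFile"));  // non-native
    }
}
```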

...

hive.parquet.timestamp.skip.conversion
  • Default Value: true
  • Added In: Hive 1.2.0 with HIVE-9482

The pre-3.1.2 Hive implementation of Parquet stores timestamps in UTC on-file. This flag allows skipping of the conversion when reading Parquet files created by other tools that may not have done so.

Avro

See AvroSerDe for details.

hive.avro.timestamp.skip.conversion
  • Default Value: false
  • Added In: Hive 3.1.2 with HIVE-21291

Some older Hive implementations (pre-3.1.2) wrote Avro timestamps in a UTC-normalized manner, while from version 3.1.0 until 3.1.2 Hive wrote time zone agnostic timestamps.
Setting this flag to true will treat legacy timestamps as time zone agnostic. Setting it to false will treat legacy timestamps as UTC-normalized.
This flag does not affect timestamps written starting with Hive 3.1.2, which are effectively time zone agnostic (see HIVE-21002 for details).
NOTE: This property will influence how HBase files using the AvroSerDe and timestamps in Kafka tables (in the payload/Avro file, this is not about Kafka timestamps) are deserialized. Keep in mind that timestamps serialized using the AvroSerDe will be UTC-normalized during serialization, so keep this property false if using HBase or Kafka.

Vectorization

Hive added vectorized query execution in release 0.13.0 (HIVE-4160, HIVE-5283). For more information see the design document Vectorized Query Execution.

...

Used to avoid all of the proxies and object copies in the metastore. Note, if this is set, you MUST use a local metastore (hive.metastore.uris must be empty); otherwise undefined and most likely undesired behavior will result.

hive.metastore.jdbc.max.batch.size
  • Default Value: 1000
  • Added In: Hive 4.0.0 with HIVE-23093

This controls the maximum number of update/delete/insert queries in a single JDBC batch statement.

Hive Metastore Connection Pooling Configuration

The Hive Metastore supports several connection pooling implementations (e.g. hikaricp, bonecp, dbcp). Configuration properties prefixed by 'dbcp' in versions prior to Hive 4.0.0-alpha-1 will be propagated as is to the connection pool implementation by Hive. Starting in release 4.0.0-alpha-1, when using hikaricp, properties prefixed by 'hikaricp' or 'dbcp' will be propagated as is to the underlying connection pool. The JDBC connection URL, username, password, and connection pool maximum connections are exceptions, which must be configured with their special Hive Metastore configuration properties.

...

These parameters must also have non-default values to turn on Hive transactions:

...

Turning on Hive transactions also requires appropriate settings for hive.compactor.initiator.on, hive.compactor.cleaner.on, hive.compactor.worker.threads, hive.support.concurrency, hive.enforce.bucketing (Hive 0.x and 1.x only), and hive.exec.dynamic.partition.mode.

...

For example: Can't serialize.*,40001$,^Deadlock,.*ORA-08176.*

The string that the regex will be matched against is of the following form, where ex is a SQLException:

ex.getMessage() + " (SQLState=" + ex.getSQLState() + ", ErrorCode=" + ex.getErrorCode() + ")"
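As a rough illustration, the string form above can be built and tested against the example list in plain Java. The class and method names are hypothetical. Splitting the list on commas and applying each regex with find() are assumptions of this sketch; the text above does not state whether Hive requires a full or a partial match.

```java
import java.sql.SQLException;
import java.util.regex.Pattern;

final class RetryableSqlExceptionSketch {
    // Builds the string that the regexes are matched against.
    static String describe(SQLException ex) {
        return ex.getMessage() + " (SQLState=" + ex.getSQLState()
            + ", ErrorCode=" + ex.getErrorCode() + ")";
    }

    // Assumption: comma-separated list, each entry applied as a partial match.
    // A regex that itself contains a comma would be split incorrectly here.
    static boolean isRetryable(SQLException ex, String regexList) {
        String text = describe(ex);
        for (String regex : regexList.split(",")) {
            if (Pattern.compile(regex.trim()).matcher(text).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String list = "Can't serialize.*,40001$,^Deadlock,.*ORA-08176.*";
        SQLException deadlock =
            new SQLException("Deadlock found when trying to get lock", "40001", 1213);
        System.out.println(describe(deadlock));
        System.out.println(isRetryable(deadlock, list));
    }
}
```

Note that in this string form the SQLState appears in the middle of the text, so an end-anchored entry such as 40001$ only matches when the message text itself ends with that value.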

Compactor

hive.compactor.initiator.on
  • Default Value: false
  • Hive Transactions Value: true (for exactly one instance of the Thrift metastore service)
  • Added In: Hive 0.13.0 with HIVE-5843

Whether to run the initiator thread on this metastore instance. Set this to true on one instance of the Thrift metastore service as part of turning on Hive transactions. For a complete list of parameters required for turning on transactions, see hive.txn.manager.

Before Hive 1.3.0 it's critical that this is enabled on exactly one metastore service instance. As of Hive 1.3.0 this property may be enabled on any number of standalone metastore instances.

hive.compactor.cleaner.on
  • Default Value: false
  • Hive Transactions Value: true (for exactly one instance of the Thrift metastore service)
  • Added In: Hive 4.0.0 with HIVE-26908

Whether to run the Cleaner thread on this metastore instance. Set this to true on one instance of the Thrift metastore service as part of turning on Hive transactions. For a complete list of parameters required for turning on transactions, see hive.txn.manager.

Before Hive 4.0.0, the Cleaner thread was started and stopped with the hive.compactor.initiator.on setting. This property makes it possible to enable or disable the initiator and cleaner threads independently.

hive.compactor.worker.threads

...

Percentage (fractional) size of the delta files relative to the base that will trigger a major compaction. (1.0 = 100%, so the default 0.1 = 10%.)
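The arithmetic behind this threshold can be sketched as follows. This is a simplified illustration, not the compactor's actual code: the names are hypothetical, the strict "greater than" comparison is an assumption, and the handling of a missing base is a choice made only for this sketch.

```java
final class MajorCompactionSketch {
    // Returns true when the total delta size, as a fraction of the base size,
    // exceeds the configured threshold (0.1 means 10% of the base).
    static boolean deltaSizeTriggersMajor(long baseBytes, long deltaBytes,
                                          double threshold) {
        if (baseBytes <= 0) {
            // Assumption: with no base to compare against, this sketch
            // does not trigger; the real initiator may behave differently.
            return false;
        }
        return (double) deltaBytes / baseBytes > threshold;
    }

    public static void main(String[] args) {
        // 150 MB of deltas on a 1000 MB base is 15%, above the 10% default.
        System.out.println(deltaSizeTriggersMajor(1000L << 20, 150L << 20, 0.1));
        // 50 MB of deltas is 5%, below the default.
        System.out.println(deltaSizeTriggersMajor(1000L << 20, 50L << 20, 0.1));
    }
}
```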

hive.compactor.abortedtxn.threshold
  • Default Value: 1000
  • Added In: Hive 0.13.0 with HIVE-5843

Number of aborted transactions involving a given table or partition that will trigger a major compaction.

hive.compactor.aborted.txn.time.threshold
  • Default Value: 12h
  • Added In: Hive 4.0.0 with HIVE-23280

Age of the table's or partition's oldest aborted transaction at which compaction will be triggered. The default time unit is hours. Set to a negative number to disable.

Compaction History

hive.compactor.history.retention.succeeded 

...

Number of failed compaction entries to retain in history (per partition).

...

metastore.compactor.history.retention.did.not.initiate
  • Default Value: 2
  • Added In: Hive 1.3.0 and 2.0.0 with HIVE-12353
  • Deprecated name: hive.compactor.history.retention.attempted

Determines how many compaction records in state 'did not initiate' will be retained in compaction history for a given table/partition.

hive.compactor.history.reaper.interval

...

Standard error allowed for NDV estimates, expressed in percentage. This provides a tradeoff between accuracy and compute cost. A lower value for the error indicates higher accuracy and a higher compute cost. (NDV means number of distinct values.) 

It only affects the FM-Sketch (not the HLL algorithm which is the default), where it computes the number of necessary bitvectors to achieve the accuracy.

hive.stats.collect.tablekeys

...

Enable metrics on the Hive Metastore Service. (For other metastore configuration properties, see the Metastore and Hive Metastore Security sections.)

hive.metastore.acidmetrics.thread.on
  • Default Value: true
  • Added in: Hive 4.0.0 with HIVE-24824

Whether to run acid related metrics collection on this metastore instance.

hive.server2.metrics.enabled
  • Default Value: false
  • Added in: Hive 1.3.0 and 2.0.0 with HIVE-10761

...

For WebHCat configuration, see Configuration Variables in the WebHCat manual.