
...

Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling log partitions to capacity.

hive.script.auto.progress
  • Default Value: false
  • Added In: Hive 0.4.0

Whether the Hive Transform/Map/Reduce clause should automatically send progress information to TaskTracker to avoid the task getting killed because of inactivity. Hive sends progress information when the script is outputting to stderr. This option removes the need to periodically produce stderr messages, but users should be cautious because this may prevent TaskTracker from killing tasks with infinite loops in the scripts.
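A minimal sketch of enabling this for a streaming query (the table, columns, and script name are hypothetical):

```sql
-- Keep TaskTracker from killing a script that is healthy but quiet.
SET hive.script.auto.progress=true;

-- 'slow_filter.py' is a hypothetical user script that may run for
-- minutes without emitting anything to stderr.
SELECT TRANSFORM (userid, page)
  USING 'python slow_filter.py'
  AS (userid, page)
FROM page_views;
```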

hive.exec.script.allow.partial.consumption

...

By default, all values in the HiveConf object are converted to environment variables of the same name as the key (with '.' (dot) converted to '_' (underscore)) and set as part of the script operator's environment. However, some values can grow large or are not amenable to translation to environment variables. This value gives a comma-separated list of configuration values that will not be set in the environment when calling a script operator. By default the valid transaction list is excluded, as it can grow large and is sometimes compressed, which does not translate well to an environment variable.

Also see:
  • SerDes for more hive.script.* configuration properties
hive.exec.compress.output

...

For conditional joins, if input stream from a small alias can be directly applied to the join operator without filtering or projection, the alias need not be pre-staged in the distributed cache via a mapred local task. Currently, this is not working with vectorization or Tez execution engine.
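A sketch of the join shape this applies to, with hypothetical table names; the small side (`dim_dept`) is consumed by the join without any filter or projection:

```sql
-- Let Hive convert this to a conditional map join at runtime.
SET hive.auto.convert.join=true;

-- dim_dept is the small alias; if its rows feed the join operator
-- directly, pre-staging it in the distributed cache via a map-reduce
-- local task can be skipped.
SELECT f.userid, d.dept_name
FROM fact_events f
JOIN dim_dept d ON f.dept_id = d.dept_id;
```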

hive.udtf.auto.progress
  • Default Value: false
  • Added In: Hive 0.5.0

Whether Hive should automatically send progress information to TaskTracker when using UDTFs to prevent the task getting killed because of inactivity. Users should be cautious because this may prevent TaskTracker from killing tasks with infinite loops.

hive.mapred.reduce.tasks.speculative.execution
  • Default Value: true
  • Added In: Hive 0.5.0

Whether speculative execution for reducers should be turned on.

hive.exec.counters.pull.interval
  • Default Value: 1000
  • Added In: Hive 0.6.0

The interval with which to poll the JobTracker for the counters of the running job. The smaller it is, the more load there will be on the JobTracker; the higher it is, the less granular the counter data will be.


hive.enforce.bucketing
  • Default Value: false
  • Added In: Hive 0.6.0

Whether bucketing is enforced. If true, while inserting into the table, bucketing is enforced.

Set to true to support INSERT ... VALUES, UPDATE, and DELETE transactions (Hive 0.14.0 and later). For a complete list of parameters required for turning on Hive transactions, see hive.txn.manager.
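A sketch with hypothetical table names; when enforcement is on, Hive matches the reducer count to the bucket count during the insert:

```sql
SET hive.enforce.bucketing=true;

-- Hypothetical bucketed target table.
CREATE TABLE page_views_bucketed (userid BIGINT, page STRING)
CLUSTERED BY (userid) INTO 32 BUCKETS;

-- With enforcement on, the insert uses 32 reducers so that each
-- bucket file is populated correctly.
INSERT OVERWRITE TABLE page_views_bucketed
SELECT userid, page FROM page_views;
```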

hive.enforce.sorting
  • Default Value: false
  • Added In: Hive 0.6.0

Whether sorting is enforced. If true, while inserting into the table, sorting is enforced.

hive.optimize.reducededuplication
  • Default Value: true
  • Added In: Hive 0.6.0

Remove extra map-reduce jobs if the data is already clustered by the same key which needs to be used again. This should always be set to true. Since it is a new feature, it has been made configurable.

hive.optimize.reducededuplication.min.reducer
  • Default Value: 4
  • Added In: Hive 0.11.0 with HIVE-2340

Reduce deduplication merges two RSs (reduce sink operators) by moving the key/parts/reducer-num of the child RS to the parent RS. That means if the reducer-num of the child RS is fixed (order by or forced bucketing) and small, the merge can produce a very slow, single-reducer MR job. The optimization is disabled if the number of reducers is less than the specified value.
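The query shape at issue, sketched with the usual `src` example table: both reduce stages key on `key`, but ORDER BY fixes the child reduce sink at one reducer:

```sql
-- GROUP BY and ORDER BY both key on 'key', so their reduce sinks are
-- candidates for merging; ORDER BY forces a single reducer, and
-- 1 is below hive.optimize.reducededuplication.min.reducer
-- (default 4), so Hive keeps the two stages separate.
SELECT key, count(*) AS cnt
FROM src
GROUP BY key
ORDER BY key;
```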

...

Setting to 0.12 (default) maintains division behavior in Hive 0.12 and earlier releases: int / int = double.
Setting to 0.13 gives division behavior in Hive 0.13 and later releases: int / int = decimal.

An invalid setting will cause an error message, and the default support level will be used.

hive.optimize.constant.propagation
  • Default Value: true
  • Added In: Hive 0.14.0 with HIVE-5771

Whether to enable the constant propagation optimizer.

hive.entity.capture.transform
  • Default Value: false
  • Added In: Hive 1.1.0 with HIVE-8938

Enable capturing compiler read entity of transform URI, which can be introspected in the semantic and exec hooks.

hive.explain.user
  • Default Value: false
  • Added In: Hive 1.2.0 with HIVE-9780

Whether to show explain result at user level. When enabled, will log EXPLAIN output for the query at user level.

SerDes, I/O, and File Formats

SerDes

hive.script.serde
  • Default Value: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  • Added In: Hive 0.4.0

The default SerDe for transmitting input data to and reading output data from the user scripts.

hive.script.recordreader
  • Default Value: org.apache.hadoop.hive.ql.exec.TextRecordReader
  • Added In: Hive 0.4.0

The default record reader for reading data from the user scripts.

hive.script.recordwriter
  • Default Value: org.apache.hadoop.hive.ql.exec.TextRecordWriter
  • Added In: Hive 0.5.0

The default record writer for writing data to the user scripts.

hive.default.serde
  • Default Value: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  • Added In: Hive 0.14 with HIVE-5976

...

LazySimpleSerDe uses this property to determine if it treats 'T', 't', 'F', 'f', '1', and '0' as extended, legal boolean literals, in addition to 'TRUE' and 'FALSE'. The default is false, which means only 'TRUE' and 'FALSE' are treated as legal boolean literals.

I/O

hive.io.exception.handlers
  • Default Value: (empty)
  • Added In: Hive 0.8.1

A list of I/O exception handler class names. This is used to construct a list of exception handlers to handle exceptions thrown by record readers.
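For example (the handler class name is hypothetical):

```sql
-- Comma-separated list of handler classes on the classpath;
-- com.example.SkipCorruptRecordsHandler is hypothetical.
SET hive.io.exception.handlers=com.example.SkipCorruptRecordsHandler;
```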

hive.input.format
  • Default Value: org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
  • Added In: Hive 0.5.0

The default input format. Set this to HiveInputFormat if you encounter problems with CombineHiveInputFormat.

Also see:

 

General File Formats

hive.default.fileformat
  • Default Value: TextFile
  • Added In: Hive 0.2.0

...

Users can explicitly say CREATE TABLE ... STORED AS TEXTFILE|SEQUENCEFILE|RCFILE|ORC|AVRO|INPUTFORMAT...OUTPUTFORMAT... to override. (RCFILE was added in Hive 0.6.0, ORC in 0.11.0, and AVRO in 0.14.0.) See Row Format, Storage Format, and SerDe for details.
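For example, a table can opt out of the default format regardless of this setting (table name hypothetical):

```sql
-- Stored as ORC even if hive.default.fileformat is TextFile.
CREATE TABLE events_orc (ts BIGINT, msg STRING)
STORED AS ORC;
```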

hive.fileformat.check

  • Default Value: true
  • Added In: Hive 0.5.0

...

File format to use for a query's intermediate results. Options are TextFile, SequenceFile, and RCFile. Set to SequenceFile if any columns are string type and contain new-line characters (HIVE-1608, HIVE-3065).


RCFile Format
hive.io.rcfile.record.interval

...