Versions Compared

...

Property | Default | Description
hcatalog.hive.client.cache.expiry.time | 120 | Overrides the expiry time of the Hive client cache. The value is an int and specifies a number of seconds.
hcatalog.hive.client.cache.disabled | false | Disables the cache altogether. This is useful in highly multithreaded use cases.
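
As a minimal sketch of overriding the two cache properties, the helper below builds the key/value pairs and renders them as Hadoop-style -D generic options. The function names cache_overrides and as_generic_options are hypothetical, and the sketch assumes the job launcher parses generic options (for example through Hadoop's GenericOptionsParser); the page itself does not say where these properties must be set.

```python
def cache_overrides(expiry_seconds=None, disabled=None):
    """Collect overrides for the HCatalog Hive client cache properties.

    Hypothetical helper: only the two property names come from the table above.
    """
    props = {}
    if expiry_seconds is not None:
        # The expiry time is documented as an int number of seconds.
        if (isinstance(expiry_seconds, bool)
                or not isinstance(expiry_seconds, int)
                or expiry_seconds < 0):
            raise ValueError("expiry_seconds must be a non-negative int")
        props["hcatalog.hive.client.cache.expiry.time"] = str(expiry_seconds)
    if disabled is not None:
        props["hcatalog.hive.client.cache.disabled"] = "true" if disabled else "false"
    return props


def as_generic_options(props):
    """Render properties as a flat list of '-D key=value' arguments."""
    args = []
    for key, value in props.items():
        args += ["-D", f"{key}={value}"]
    return args


# Example: raise the expiry to five minutes and disable the cache.
example_args = as_generic_options(cache_overrides(expiry_seconds=300, disabled=True))
```

Leaving an argument as None omits the property, so the documented default (120 seconds, cache enabled) stays in effect.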

Input Split Generation Behaviour

...

Property | Default | Description
hcat.desired.partition.num.splits | not set | A hint that HCatalog passes on to the underlying InputFormats to produce a "desired" number of splits per partition. It is useful when a partition holds a few large files and generating more splits would increase parallelism. It is not yet as useful for reducing the number of splits over a large number of files, and it is of no use when the job reads a large number of partitions. This is only an optimization hint; the underlying layer is not guaranteed to be able to use it. The MapReduce parameters mapred.min.split.size and mapred.max.split.size can be used in conjunction with this parameter to tune jobs.
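
To illustrate why the hint helps with a few large files but not with reducing splits, here is a rough sketch of the arithmetic. It assumes the underlying InputFormat follows the classic file-based rule split size = max(min size, min(total size / desired splits, block size)); that rule, the function name split_size, and the sizes used are assumptions for illustration, not something this page specifies, and other InputFormats may ignore the hint entirely.

```python
def split_size(total_bytes, desired_splits, block_size, min_split_size=1):
    """Split size under the assumed rule max(min_size, min(goal_size, block_size))."""
    goal_size = total_bytes // max(desired_splits, 1)
    return max(min_split_size, min(goal_size, block_size))


MB = 1024 * 1024
block = 128 * MB
one_file = 1024 * MB  # a partition holding a single 1 GiB file

# No hint (modeled here as 1 desired split): the split size is capped at one block.
default_size = split_size(one_file, 1, block)

# Asking for 32 splits pushes the goal size below the block size: smaller splits,
# hence more of them and more parallelism.
hinted_size = split_size(one_file, 32, block)

# Asking for only 2 splits cannot grow a split past one block under this rule.
fewer_size = split_size(one_file, 2, block)

# Raising the minimum split size (cf. mapred.min.split.size) is what grows splits.
raised_min_size = split_size(one_file, 2, block, min_split_size=256 * MB)
```

Under this assumed rule the hint can only shrink splits below the block size, which is consistent with the description above: it increases parallelism for large files but does not merge many small files into fewer splits.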

...