...

set hive.execution.engine=spark;
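
The engine can also be selected per invocation instead of globally in hive-site.xml. A minimal sketch, assuming the Hive CLI is on the PATH and using a hypothetical table name src:

Code Block
languagebash
# Run one query with Spark as the execution engine; "src" is a placeholder table name
hive --hiveconf hive.execution.engine=spark -e "select count(*) from src;"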

Hive on Spark is available from Hive 1.1 onward. It is still under active development in the "spark" and "spark2" branches, and is periodically merged into Hive's "master" branch.
See HIVE-7292 and its sub-tasks and linked issues.

Version Compatibility

Hive on Spark is only tested against a specific version of Spark, so a given version of Hive is only guaranteed to work with that specific version of Spark. Other versions of Spark may work with a given version of Hive, but that is not guaranteed. Below is a list of Hive versions and their corresponding compatible Spark versions.

Hive Version    Spark Version
master          2.3.0
3.0.x           2.3.0
2.3.x           2.0.0
2.2.x           1.6.0
2.1.x           1.6.0
2.0.x           1.5.0
1.2.x           1.3.1
1.1.x           1.2.0
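
To confirm which Spark version a given Hive source tree expects, one option is to read the <spark.version> property from Hive's root pom.xml. A sketch, assuming a Hive source checkout and Maven on the PATH:

Code Block
languagebash
# Print the Spark version declared in Hive's root pom.xml (run from the Hive source root).
# -DforceStdout needs maven-help-plugin 3.1.0+; on older setups, grep pom.xml instead.
mvn help:evaluate -Dexpression=spark.version -q -DforceStdout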

...

  1. Install Spark (either download pre-built Spark, or build assembly from source).  
    • Install/build a compatible version. The <spark.version> property in Hive's root pom.xml defines the version of Spark it was built and tested with.
    • Install/build a compatible distribution. Each version of Spark has several distributions, corresponding to different versions of Hadoop.
    • Once Spark is installed, find and note the location of the <spark-assembly-*.jar>.
    • Note that you must have a version of Spark that does not include the Hive jars, i.e. one that was not built with the Hive profile. If you will use Parquet tables, it's recommended to also enable the "parquet-provided" profile; otherwise there could be conflicts in the Parquet dependency. To remove the Hive jars from the installation, use the following command under your Spark repository (a quick way to verify the result follows the code blocks below):

      Prior to Spark 2.0.0:

      Code Block
      languagebash
      ./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"

      Since Spark 2.0.0:

      Code Block
      languagebash
      ./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"

      Since Spark 2.3.0:

      Code Block
      languagebash
      ./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided,orc-provided"
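
      After building, it is worth verifying that no Hive classes made it into the distribution. A minimal sketch, assuming the tarball name produced by the --name flag above (adjust the glob for your Spark version):

      Code Block
      languagebash
      # List any Hive jars left in the freshly built tarball; empty output means the build is Hive-free
      tar tzf spark-*-bin-hadoop2-without-hive.tgz | grep -i hive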


  2. Start the Spark cluster.
    • Make a note of the <Spark Master URL>; it can be found in the Spark master WebUI.
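
    For a standalone deployment, the cluster can be brought up with the scripts shipped under sbin. A minimal sketch for a single host, assuming SPARK_HOME points at the installation built above (7077 is the standalone master's default port):

      Code Block
      languagebash
      # Start a standalone master; its WebUI (http://localhost:8080 by default) shows the <Spark Master URL>
      $SPARK_HOME/sbin/start-master.sh
      # Register one worker with the master (the script is renamed start-worker.sh in Spark 3.1+)
      $SPARK_HOME/sbin/start-slave.sh spark://$(hostname):7077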

...