...

Follow instructions to install Spark: https://spark.apache.org/docs/latest/running-on-yarn.html (or https://spark.apache.org/docs/latest/spark-standalone.html, if you are running Spark Standalone mode). Hive on Spark supports Spark on YARN mode as default. In particular, for the installation you'll need to:

  1. Install Spark (either download pre-built Spark, or build assembly from source).  
    • Install/build a compatible version.  Hive root pom.xml's <spark.version> defines what version of Spark it was built/tested with (see the first sketch after this list).
    • Install/build a compatible distribution.  Each version of Spark has several distributions, corresponding to different versions of Hadoop.
    • Once Spark is installed, find and keep note of the <spark-assembly-*.jar> location.
    • Note that you must have a version of Spark which does not include the Hive jars, that is, one which was not built with the Hive profile. To remove Hive jars from the installation, simply use the following command under your Spark repository:

      Code Block
      languagebash
      ./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.4"
  2. Start Spark cluster (both standalone and Spark on YARN are supported).
    • Keep note of the <Spark Master URL>.  This can be found in the Spark master WebUI (see the second sketch below).
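
To double-check compatibility in step 1, the following is a minimal sketch; HIVE_SRC and SPARK_HOME are placeholder paths for your Hive source checkout and Spark installation, not values defined on this page.

Code Block
languagebash
# Show the Spark version Hive was built/tested against (from Hive's root pom.xml):
grep '<spark.version>' "$HIVE_SRC/pom.xml"

# After installing Spark, locate the assembly jar and keep note of its path:
find "$SPARK_HOME" -name 'spark-assembly-*.jar'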

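For Spark Standalone mode in step 2, the sketch below starts a master and a worker; the host name and ports are examples only, and the exact worker-start script arguments can differ between Spark releases.

Code Block
languagebash
# Start the standalone master; its WebUI (http://<host>:8080 by default)
# shows the <Spark Master URL>, e.g. spark://<host>:7077.
$SPARK_HOME/sbin/start-master.sh

# Start a worker and register it with the master:
$SPARK_HOME/sbin/start-slave.sh spark://master-host:7077

The <Spark Master URL> noted here is what you will later set as spark.master when configuring Hive to use this cluster.
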
...