...

Follow the instructions here: https://spark.apache.org/docs/latest/spark-standalone.html.  In particular, make sure the following steps are done:

  1. Install Spark (either download a pre-built Spark distribution, or build the assembly from source).  Note that Spark has different distributions for different versions of Hadoop.  Keep note of the location of the spark-assembly-*.jar on the node Hive will run from.
  2. Start the Spark cluster (master and workers), and keep note of the Spark master URL.  It can be found in the Spark master WebUI.  A minimal sketch of both steps follows this list.
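
For illustration, here is a sketch of these two steps using a pre-built distribution.  The version, download URL, and master-host name are assumptions; substitute the distribution matching your Hadoop version and your own host name:

Code Block
# Download and unpack a pre-built Spark distribution (example version; adjust as needed).
wget https://archive.apache.org/dist/spark/spark-1.2.0/spark-1.2.0-bin-hadoop2.4.tgz
tar -xzf spark-1.2.0-bin-hadoop2.4.tgz
cd spark-1.2.0-bin-hadoop2.4

# Start a standalone master.  The master URL (spark://master-host:7077) appears in its
# log and in the master WebUI (http://master-host:8080 by default).
./sbin/start-master.sh

# Start a worker and register it with the master.
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://master-host:7077

# In pre-built distributions, the assembly jar to note for Hive sits under lib/.
ls lib/spark-assembly-*.jar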

...

  1. As of now, only the hive/spark branch works against Spark: https://github.com/apache/hive/tree/spark.  Build the Hive assembly from this branch as described in https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ.
  2. Start Hive with the spark-assembly.jar added to the Hive auxpath:

    Code Block
    hive --auxpath /location/to/spark-assembly-spark_version-hadoop_version.jar
  3. Configure the Hive execution engine to run on Spark:

    Code Block
    hive> set hive.execution.engine=spark;
  4. Configure the required Spark properties.  The configuration guide is at http://spark.apache.org/docs/latest/configuration.html.  This can be done either by adding a spark-defaults.conf file to the Hive classpath (a sample is sketched after this list), or by setting them as regular Hive properties:

    Code Block
    hive> set spark.master=<spark master URL>;
    hive> set spark.eventLog.enabled=true;
    hive> set spark.executor.memory=512m;
    hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
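
Alternatively, as noted in step 4, the same properties can be collected in a spark-defaults.conf file placed on the Hive classpath.  The sketch below simply restates the example properties above in file form; the master URL and executor memory are placeholders to adapt to your cluster:

Code Block
# spark-defaults.conf -- sample only; values mirror the set commands above.
spark.master             spark://master-host:7077
spark.eventLog.enabled   true
spark.executor.memory    512m
spark.serializer         org.apache.spark.serializer.KryoSerializer

Using the file avoids repeating the set commands in each new Hive session.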

...