Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. Install Spark (either download pre-built Spark, or build assembly from source).  
  2. Start Spark cluster (Master and workers).
    • Keep note of the <Spark Master URL>.  This can be found in Spark master WebUI.

...

  1. As Hive on Spark is still in development, currently only a Hive assembly built from the Hive/Spark development branch supports Spark execution.  The development branch is located here: https://github.com/apache/hive/tree/spark.  Checkout the branch and build the Hive assembly as described in https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ.
  2. If you download Spark, make sure you use a 1.12.x assembly: http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.1.2.jar
  3. Start Hive with <spark-assembly-*.jar> on the Hive auxpath:

    Code Block
    hive --auxpath /location/to/spark-assembly-*.jar
  4. Configure Hive execution to Spark:

    Code Block
    hive> set hive.execution.engine=spark;
  5. Configure Spark-application configs for Hive.  See: http://spark.apache.org/docs/latest/configuration.html.  This can be done either by adding a file "spark-defaults.conf" with these properties to the Hive classpath, or by setting them on Hive configuration:

    Code Block
    hive> set spark.master=<Spark Master URL>
    
    hive> set spark.eventLog.enabled=true;             
    
    hive> set spark.executor.memory=512m;              
    
    hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;

...