...

  1. Install Spark (either download a pre-built Spark release, or build the assembly from source).
    • Download the correct version.  To find out which version of Spark your particular Hive build was built/tested against, check your Hive's root pom.xml.
    • Note: Each version of Spark in turn has several distributions, corresponding to different versions of Hadoop.  Choose the one matching your Hadoop installation.
    • Once Spark is installed, find and keep note of the spark-assembly-*.jar location.
  2. Start the Spark cluster (master and workers), as sketched after this list.
    • Keep note of the Spark master URL.  This can be found in the Spark master WebUI.
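
For example, a minimal sketch of these two steps for a standalone Spark deployment (paths and host names here are placeholders, and the scripts assume a typical Spark layout):

    Code Block
    # Check which Spark version this Hive build targets (run from the Hive source root)
    grep "spark.version" pom.xml

    # Start a standalone master and workers; the master URL
    # (spark://<master-host>:7077) appears in the master WebUI
    $SPARK_HOME/sbin/start-master.sh
    $SPARK_HOME/sbin/start-slaves.sh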

Configuring Hive

  1. As Hive on Spark is still in development, only a Hive assembly built from the hive/spark development branch works against Spark: https://github.com/apache/hive/tree/spark.  Build the Hive assembly from this branch as described in https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ; a sketch of the checkout and build follows.
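
    A minimal sketch, assuming the Maven profiles described in the HiveDeveloperFAQ (the exact flags there are authoritative):

    Code Block
    # Check out the spark development branch and build the Hive assembly
    git clone https://github.com/apache/hive.git
    cd hive
    git checkout spark
    mvn clean package -DskipTests -Phadoop-2,dist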
  2. Start Hive and add the spark-assembly.jar to the Hive auxpath.

    Code Block
    hive --auxpath /location/to/spark-assembly-spark_version-hadoop_version.jar
  3. Configure Hive's execution engine to run on Spark:

    Code Block
    hive> set hive.execution.engine=spark;
  4. Configure the required Spark properties.  See: http://spark.apache.org/docs/latest/configuration.html.  This can be done either by adding a file "spark-defaults.conf" to the Hive classpath, or by setting them as normal properties from Hive, as shown below.

    Code Block
    hive> set spark.master=<spark master URL>;
    hive> set spark.eventLog.enabled=true;
    hive> set spark.executor.memory=512m;
    hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
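
    Equivalently, the same properties can be placed in a spark-defaults.conf file on the Hive classpath; a minimal sketch (the master URL is a placeholder):

    Code Block
    # spark-defaults.conf -- place on the Hive classpath
    spark.master            spark://<master-host>:7077
    spark.eventLog.enabled  true
    spark.executor.memory   512m
    spark.serializer        org.apache.spark.serializer.KryoSerializer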

...

Issue: java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode

Cause: Guava library version conflict between Spark and Hadoop.  See HIVE-7387 and SPARK-2420 for details.

Resolution: Alternatives until this is fixed:

  1. Temporarily remove the Guava jars from HADOOP_HOME.
  2. Apply HIVE-7387-spark.patch to the Spark branch and build a new Spark assembly; this shades Spark's Guava.

Issue: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.io.BytesWritable

Cause: Spark serializer not set to Kryo.

Resolution: Set spark.serializer to org.apache.spark.serializer.KryoSerializer as described above.
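
For the first issue, a hedged sketch of locating the conflicting Guava jars before moving them aside (paths depend on your Hadoop layout):

    Code Block
    # Find Guava jars bundled with Hadoop; move them aside rather than deleting
    find $HADOOP_HOME -name "guava-*.jar"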

...