Hive on Spark: Getting Started

Spark Installation

Follow the instructions at https://spark.apache.org/docs/latest/spark-standalone.html.  In particular:

  1. Install Spark (either download a pre-built Spark distribution, or build the assembly from source).  Note that Spark has different distributions for different versions of Hadoop.  Make a note of the spark-assembly-*.jar location.
  2. Start the Spark cluster (master and workers), as sketched below.  Make a note of the Spark master URL, which can be found in the Spark master WebUI.
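
A minimal sketch of the pre-built path (URLs and version numbers here are illustrative; pick the build that matches your Hadoop version):

    # Download and unpack a pre-built Spark distribution (example version)
    wget https://archive.apache.org/dist/spark/spark-1.2.0/spark-1.2.0-bin-hadoop2.4.tgz
    tar -xzf spark-1.2.0-bin-hadoop2.4.tgz
    cd spark-1.2.0-bin-hadoop2.4

    # In pre-built distributions the assembly jar ships under lib/
    ls lib/spark-assembly-*.jar

    # Start a standalone master; its WebUI (default http://<master host>:8080) shows the spark:// master URL
    ./sbin/start-master.sh
    # Start workers listed in conf/slaves (see the standalone guide for per-host alternatives)
    ./sbin/start-slaves.sh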

Configuring Hive

  1. As of now, only the Hive spark branch works against Spark: https://github.com/apache/hive/tree/spark.  Build the Hive assembly from this branch as described in https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ (a build sketch follows this list).
  2. Start Hive and add the spark-assembly jar to the Hive auxpath:

    hive --auxpath /location/to/spark-assembly-spark_version-hadoop_version.jar
  3. Configure the Hive execution engine to run on Spark:

    hive> set hive.execution.engine=spark;
  4. Configure the required Spark properties; the guide is at http://spark.apache.org/docs/latest/configuration.html.  This can be done either by adding a spark-defaults.conf file to the Hive classpath (an example follows this list), or as regular Hive properties:

    hive> set spark.master=<spark master URL>;
    hive> set spark.eventLog.enabled=true;
    hive> set spark.executor.memory=512m;
    hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
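
As referenced in step 1, a minimal sketch of building Hive from the spark branch, assuming the Maven-based build described in the Hive Developer FAQ (the profile flag is an assumption; match it to your Hadoop major version):

    # Check out the spark branch and build the Hive assembly
    git clone https://github.com/apache/hive.git
    cd hive
    git checkout spark
    mvn clean install -DskipTests -Phadoop-2   # -Phadoop-2 assumed; use the profile for your Hadoop version

And the spark-defaults.conf alternative from step 4, a minimal sketch carrying the same values as the hive> set commands above (the spark:// URL format assumes a standalone master):

    spark.master             spark://<master host>:7077
    spark.eventLog.enabled   true
    spark.executor.memory    512m
    spark.serializer         org.apache.spark.serializer.KryoSerializer

Once the engine and properties are set, a query that launches a job will exercise the Spark backend; for example (table name hypothetical):

    hive> select count(*) from test_table;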

Known Issues

 

Issue: java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode
Cause: Guava library version conflict between Spark and Hadoop.  See HIVE-7387 and SPARK-2420 for details.
Resolution: Temporarily remove the Guava jars from HADOOP_HOME until the JIRAs are resolved.

Issue: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.io.BytesWritable
Cause: Spark serializer not set to Kryo.
Resolution: Set spark.serializer to org.apache.spark.serializer.KryoSerializer as described above.
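
For the Guava conflict, a sketch of locating and setting aside the offending jars (paths are illustrative; the HADOOP_HOME layout varies by distribution):

    # Find the Guava jars that Hadoop ships
    find "$HADOOP_HOME" -name 'guava-*.jar'

    # Move them aside rather than deleting, so they can be restored later
    mkdir -p /tmp/guava-backup   # hypothetical backup location
    find "$HADOOP_HOME" -name 'guava-*.jar' -exec mv {} /tmp/guava-backup/ \;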

