...

  1. Install Spark (either download a pre-built Spark distribution, or build the assembly from source).  To find out which version of Spark Hive was built/tested against, check Hive's root pom.xml.  Note that each Spark version has several distributions, corresponding to different versions of Hadoop.  Once Spark is installed, find and keep note of the spark-assembly-*.jar location.
  2. Start the Spark cluster (master and workers) and keep note of the Spark master URL, which can be found in the Spark master WebUI.  A sketch of both steps follows.
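
A minimal sketch of these two steps for a standalone cluster, assuming a pre-built Spark distribution unpacked at /opt/spark (the path and hostnames are illustrative; 7077 and 8080 are Spark's default master and WebUI ports):

Code Block
# Locate the assembly jar shipped with the pre-built distribution
# (a from-source build places it under assembly/target/scala-*/ instead).
ls /opt/spark/lib/spark-assembly-*.jar

# Start the standalone master, then the workers listed in conf/slaves.
/opt/spark/sbin/start-master.sh
/opt/spark/sbin/start-slaves.sh

# The master URL (spark://<host>:7077) is shown in the master WebUI
# at http://<host>:8080.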

...

Configuring Hive

  1. As Hive on Spark is still in development, only a Hive assembly built from the hive/spark branch works against Spark: https://github.com/apache/hive/tree/spark.  Build the Hive assembly from this branch as described in https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ (a sketch of the build commands follows this list).
  2. Start Hive and add the spark-assembly jar to the Hive auxpath:

    Code Block
    hive --auxpath /location/to/spark-assembly-spark_version-hadoop_version.jar
  3. Configure the Hive execution engine to run on Spark:

    Code Block
    hive> set hive.execution.engine=spark;
  4. Configure the required Spark properties.  See: http://spark.apache.org/docs/latest/configuration.html.  This can be done either by adding a file "spark-defaults.conf" to the Hive classpath (an example follows this list), or by setting the values as normal properties from Hive:

    Code Block
    hive> set spark.master=<spark master URL>;
    hive> set spark.eventLog.enabled=true;
    hive> set spark.executor.memory=512m;
    hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
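
A minimal sketch of the build in step 1, assuming Maven and the hadoop-2 profile (the HiveDeveloperFAQ link above is the authoritative reference; the profile may differ for your setup):

Code Block
git clone https://github.com/apache/hive.git
cd hive
git checkout spark
mvn clean install -DskipTests -Phadoop-2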

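For step 4, the same properties can instead live in a spark-defaults.conf file on the Hive classpath.  The format is whitespace-separated key/value pairs; the master URL below is illustrative:

Code Block
spark.master            spark://master-host:7077
spark.eventLog.enabled  true
spark.executor.memory   512m
spark.serializer        org.apache.spark.serializer.KryoSerializer
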
...

Common Issues


Issue: java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode
Cause: Guava library version conflict between Spark and Hadoop.  See HIVE-7387 and SPARK-2420 for details.
Resolution (alternatives):

  1. Temporarily remove the guava jars from HADOOP_HOME.
  2. Apply HIVE-7387-spark.patch to the Spark branch and build a new Spark assembly.  This shades Spark's guava.

Issue: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.io.BytesWritable
Cause: Spark serializer not set to Kryo.
Resolution: Set spark.serializer to org.apache.spark.serializer.KryoSerializer as described above.
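
For the first resolution above, a quick way to locate the conflicting jars (the layout under HADOOP_HOME varies between Hadoop distributions):

Code Block
find $HADOOP_HOME -name 'guava-*.jar'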

...