...
- As Hive on Spark is still in development, only a Hive assembly built from the Hive/Spark development branch currently supports Spark execution. The development branch is located here: https://github.com/apache/hive/tree/spark. Check out the branch and build the Hive assembly as described in https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ (a rough command sketch follows this list).
- If you download Spark, make sure you use a 1.2.x assembly: http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.1.2.jar
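As a rough sketch of the checkout, build, and download steps above (the exact Maven profiles and options are described in the Hive Developer FAQ linked above; the -Phadoop-2,dist profiles below are an assumption, not a prescription):

    # Check out the Hive/Spark development branch
    git clone https://github.com/apache/hive.git
    cd hive
    git checkout spark

    # Build the Hive assembly/distribution (profiles per the Developer FAQ; adjust as needed)
    mvn clean package -DskipTests -Phadoop-2,dist

    # Optionally download the pre-built Spark 1.2.x assembly referenced above
    wget http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.1.2.jar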
There are several ways to add the Spark dependency to Hive:
Set the property 'spark.home' to point to the Spark installation directory:

    hive> set spark.home=/location/to/spark;
Set the spark-assembly jar on the Hive auxpath:

    hive --auxpath /location/to/spark-assembly-*.jar
Add the spark-assembly jar for the current user session:

    hive> add jar /location/to/spark-assembly-*.jar;
Configure Hive's execution engine to use Spark:

    hive> set hive.execution.engine=spark;
Configure Spark application settings for Hive. See http://spark.apache.org/docs/latest/configuration.html. This can be done either by adding a file "spark-defaults.conf" with these properties to the Hive classpath, or by setting them in the Hive configuration:

    hive> set spark.master=<Spark Master URL>;
    hive> set spark.eventLog.enabled=true;
    hive> set spark.executor.memory=512m;
    hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
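If you prefer the file-based approach, a minimal sketch of what such a spark-defaults.conf might contain is shown below (the values and master URL are illustrative placeholders, not recommendations; place the file somewhere on the Hive classpath, such as the Hive conf directory):

    # spark-defaults.conf (illustrative values; tune for your cluster)
    spark.master              spark://<master-host>:7077
    spark.eventLog.enabled    true
    spark.executor.memory     512m
    spark.serializer          org.apache.spark.serializer.KryoSerializer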
Common Issues (resolved issues will be removed from this list)
Issue | Cause | Resolution
---|---|---
java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode | Guava library version conflict between Spark and Hadoop. See HIVE-7387 and SPARK-2420 for details. |
Error: Could not find or load main class org.apache.spark.deploy.SparkSubmit | Spark dependency not correctly set. | Add the Spark dependency to Hive; see Step 3 above.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 5.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.io.BytesWritable | Spark serializer not set to Kryo. | Set spark.serializer to org.apache.spark.serializer.KryoSerializer as described above.
java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:257) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:224) | Hive is included in the Spark assembly. | Either build a version of Spark without the "hive" profile, or unjar the Spark assembly, rm -rf org/apache/hive org/apache/hadoop/hive, and rejar (see the sketch after this table). The fix is in SPARK-2741; see Step 5 above.
[ERROR] Terminal initialization failed; falling back to unsupported | Hive has upgraded to JLine2 but jline 0.94 exists in the Hadoop lib. |
java.lang.SecurityException: class "javax.servlet.DispatcherType"'s | Two versions of the servlet-api are in the classpath. |
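For the HiveInputFormat NullPointerException above, the "unjar, remove the Hive classes, rejar" workaround might look roughly like the following (the paths, scratch directory, and output jar name are illustrative assumptions):

    # Unpack the Spark assembly into a scratch directory (jar name is illustrative)
    mkdir spark-assembly-tmp && cd spark-assembly-tmp
    jar -xf /location/to/spark-assembly-*.jar

    # Remove the bundled Hive classes
    rm -rf org/apache/hive org/apache/hadoop/hive

    # Repackage, then point Hive at the rebuilt jar instead of the original assembly
    jar -cf ../spark-assembly-nohive.jar .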