Spark on YARN Environment Setup

The following steps set up a test Spark on YARN environment.

1) Prerequisites: install HDFS, YARN, Java 7, and Scala 2.10.

2) Download Spark: wget "http://d3kbcqa49mib13.cloudfront.net/spark-1.5.2-bin-hadoop2.6.tgz"

3) Extract the archive to /opt/spark-1.5.2-bin-hadoop2.6 and set the environment variables:

    export SPARK_HOME=/opt/spark-1.5.2-bin-hadoop2.6
    export PATH=$PATH:$SPARK_HOME/bin
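
Steps 2 and 3 can be scripted roughly as follows. This is a minimal sketch, not a tested installer: the function names spark_home_path and install_spark are hypothetical, and it assumes wget and tar are available and that the tarball unpacks into a directory named after itself. The install function is only defined here, not called.

```shell
#!/bin/sh
# Version and Hadoop profile taken from the download URL in step 2.
SPARK_VERSION=1.5.2
HADOOP_PROFILE=hadoop2.6
SPARK_DIST="spark-${SPARK_VERSION}-bin-${HADOOP_PROFILE}"

# Print the install directory for a given prefix (defaults to /opt).
# Hypothetical helper, not part of Spark.
spark_home_path() {
    printf '%s/%s\n' "${1:-/opt}" "$SPARK_DIST"
}

# Download the tarball and unpack it under the prefix.
# Hypothetical helper; call it explicitly, e.g. `install_spark /opt`.
install_spark() {
    prefix="${1:-/opt}"
    wget "http://d3kbcqa49mib13.cloudfront.net/${SPARK_DIST}.tgz" &&
    tar -xzf "${SPARK_DIST}.tgz" -C "$prefix"
}

# Same exports as in step 3, derived from the variables above.
export SPARK_HOME="$(spark_home_path /opt)"
export PATH="$PATH:$SPARK_HOME/bin"
```

Deriving SPARK_HOME from the version variables keeps the path and the download URL in sync if a different Spark build is used.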

4) Configure Spark jobs. Here the applications' event logs are written to HDFS so that the Spark history server can read them and expose REST APIs that report application status (the history server can report the status of both running and completed applications).

/opt/spark-1.5.2-bin-hadoop2.6/conf/spark-defaults.conf:
spark.yarn.max_executor.failures 3
spark.yarn.applicationMaster.waitTries 10
spark.history.kerberos.keytab none
spark.yarn.preserve.staging.files False
spark.yarn.submit.file.replication 3
spark.history.kerberos.principal none
spark.yarn.historyServer.address <hostname>:18080
spark.yarn.scheduler.heartbeat.interval-ms 5000
spark.yarn.queue default
spark.yarn.containerLauncherMaxThreads 25
spark.yarn.driver.memoryOverhead 384
spark.history.ui.port 18080
spark.yarn.services org.apache.spark.deploy.yarn.history.YarnHistoryService
spark.yarn.max.executor.failures 3
spark.history.provider org.apache.spark.deploy.yarn.history.YarnHistoryProvider
spark.yarn.executor.memoryOverhead 384
spark.eventLog.enabled true
spark.eventLog.dir hdfs://<hostname>:8020/directory
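
Before starting any services, it can help to confirm that event logging is actually switched on in the file above. The following is a minimal sketch; spark_conf_get and check_event_log are hypothetical helper names, and the parsing assumes the whitespace-separated "key value" layout shown above (it does not handle the key=value form).

```shell
#!/bin/sh
# Print the value of property $2 from the properties file $1.
# If the key appears more than once, the last occurrence wins.
spark_conf_get() {
    awk -v key="$2" '$1 == key { val = $2 } END { if (val != "") print val }' "$1"
}

# Succeed and print the event log directory if event logging is enabled
# and a directory is configured; otherwise print a message and fail.
check_event_log() {
    conf="$1"
    if [ "$(spark_conf_get "$conf" spark.eventLog.enabled)" != "true" ]; then
        echo "spark.eventLog.enabled is not true in $conf" >&2
        return 1
    fi
    dir="$(spark_conf_get "$conf" spark.eventLog.dir)"
    if [ -z "$dir" ]; then
        echo "spark.eventLog.dir is not set in $conf" >&2
        return 1
    fi
    echo "$dir"
}
```

For example, `check_event_log "$SPARK_HOME/conf/spark-defaults.conf"` prints the configured spark.eventLog.dir value, or returns a non-zero status if event logging is disabled.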

5) Set the history server configuration in /opt/spark-1.5.2-bin-hadoop2.6/bin/load-spark-env.sh, pointing it at the same HDFS directory as spark.eventLog.dir:

export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://<hostname>:8020/directory"
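
The history server reads from the same directory that applications write to, so it may be convenient to build the option from a single variable and to create the directory up front. This is a sketch under assumptions: history_opts_for and ensure_event_log_dir are hypothetical helpers, the hdfs command-line client is assumed to be on the PATH, and the <hostname> placeholder must be replaced before use. The directory helper is only defined here, not called.

```shell
#!/bin/sh
# Build the SPARK_HISTORY_OPTS value for a given event log directory URI.
history_opts_for() {
    printf '%s%s\n' '-Dspark.history.fs.logDirectory=' "$1"
}

# Create the event log directory on HDFS if it does not exist yet.
# Hypothetical helper; requires the hdfs CLI and a reachable NameNode.
ensure_event_log_dir() {
    hdfs dfs -mkdir -p "$1"
}

# Placeholder URI from the steps above; replace <hostname> before use.
EVENT_LOG_DIR='hdfs://<hostname>:8020/directory'
export SPARK_HISTORY_OPTS="$(history_opts_for "$EVENT_LOG_DIR")"
```

Keeping the URI in one variable makes it harder for spark.eventLog.dir and spark.history.fs.logDirectory to drift apart.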

6) Start the master, a worker, and the history server (run from $SPARK_HOME):

    ./sbin/start-master.sh hdfs://druid-test-host1-556191.slc01.dev.ebayc3.com:8020

    ./sbin/start-slave.sh spark://localhost:7077

    ./sbin/start-history-server.sh hdfs://<hostname>:8020

Spark REST API for Monitoring

The following are some of the Spark REST APIs exposed by the history server:

List Spark applications: http://<hostname>:18080/api/v1/applications
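
The endpoint above can be queried from the command line. The sketch below assumes curl is installed and that the history server listens on port 18080 as configured earlier; spark_api_base, list_applications, list_jobs, and app_ids are hypothetical helper names. The per-application jobs path follows the Spark monitoring REST API layout (/applications/[app-id]/jobs); for applications run in YARN cluster mode an attempt ID may also be needed in the path. The ID extraction uses grep and sed rather than a JSON parser, so treat it as a convenience only.

```shell
#!/bin/sh
# Print the REST API base URL for a history server host and optional port.
spark_api_base() {
    printf 'http://%s:%s/api/v1\n' "$1" "${2:-18080}"
}

# Fetch the JSON list of applications. $1 is the base URL.
list_applications() {
    curl -s "$1/applications"
}

# Fetch the JSON list of jobs for one application.
# $1 is the base URL, $2 is the application ID.
list_jobs() {
    curl -s "$1/applications/$2/jobs"
}

# Read the applications JSON on stdin and print one application ID per line.
# Matches only the exact "id" key, so "attemptId" fields are ignored.
app_ids() {
    grep -o '"id" *: *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}
```

A typical session would be `base="$(spark_api_base <hostname>)"`, then `list_applications "$base" | app_ids` to get the IDs, and `list_jobs "$base" <app-id>` for the jobs of one application.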

...