...
The Eagle running job spout collects the following data. The flow chart below shows how the running job spout works:
1) Running/Completed Job List
...
(Gliffy Diagram: running job spout work flow)
Following are some interfaces:
Code Block
public interface ResourceFetcher {
    List<Object> getResource(JobConstants.ResourceType resourceType, Object... parameter) throws Exception;
}
Code Block
public interface ServiceURLBuilder {
    String build(String... parameters);
}
...
Code Block
public interface HAURLSelector {
    boolean checkUrl(String url);
    void reSelectUrl() throws IOException;
    String getSelectedUrl();
}
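As a rough illustration of how a ServiceURLBuilder implementation might look, the following sketch builds the YARN ResourceManager "cluster apps" URL. The class name and the parameter convention (base URL first, state filter second) are assumptions for illustration only, not Eagle's actual code; `/ws/v1/cluster/apps` is the standard YARN RM REST endpoint.

```java
// Sketch only: JobListServiceURLBuilder and its parameter convention are
// hypothetical; Eagle's real builders may differ.
public class JobListServiceURLBuilderSketch {

    public interface ServiceURLBuilder {
        String build(String... parameters);
    }

    /** parameters[0] = RM base URL, parameters[1] = application state filter. */
    public static class JobListServiceURLBuilder implements ServiceURLBuilder {
        @Override
        public String build(String... parameters) {
            String base = parameters[0];
            if (base.endsWith("/")) {
                base = base.substring(0, base.length() - 1);
            }
            // /ws/v1/cluster/apps is the YARN ResourceManager REST endpoint
            // for listing applications, optionally filtered by state.
            return base + "/ws/v1/cluster/apps?state=" + parameters[1];
        }
    }

    public static void main(String[] args) {
        String url = new JobListServiceURLBuilder()
                .build("http://sandbox:8088/", "RUNNING");
        System.out.println(url);
    }
}
```

A ResourceFetcher implementation would then pass the built URL through an HAURLSelector before issuing the HTTP request, so a standby ResourceManager can be skipped.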
Support Spark Job Monitoring
The Eagle running job spout picks up MR job monitoring as its first case, and we consider supporting Spark job monitoring as well.
Spark on Yarn Environment Setup
Following are the steps to set up a test Spark-on-YARN environment:
1) prepare: install HDFS, YARN, Java 7, and Scala 2.10
2) download spark: wget "http://d3kbcqa49mib13.cloudfront.net/spark-1.5.2-bin-hadoop2.6.tgz"
3) unzip it and put it in /opt/spark-1.5.2-bin-hadoop2.6, then set:
export SPARK_HOME=/opt/spark-1.5.2-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
4) set the config for Spark jobs. Here we forward Spark applications' logs to HDFS, so the Spark history server can read the logs and expose RESTful APIs reporting application status (the history server reports both running and completed applications):
Code Block
spark.yarn.max_executor.failures 3
spark.yarn.applicationMaster.waitTries 10
spark.history.kerberos.keytab none
spark.yarn.preserve.staging.files False
spark.yarn.submit.file.replication 3
spark.history.kerberos.principal none
spark.yarn.historyServer.address <hostname>:18080
spark.yarn.scheduler.heartbeat.interval-ms 5000
spark.yarn.queue default
spark.yarn.containerLauncherMaxThreads 25
spark.yarn.driver.memoryOverhead 384
spark.history.ui.port 18080
spark.yarn.services org.apache.spark.deploy.yarn.history.YarnHistoryService
spark.yarn.max.executor.failures 3
spark.history.provider org.apache.spark.deploy.yarn.history.YarnHistoryProvider
spark.yarn.executor.memoryOverhead 384
spark.eventLog.enabled true
spark.eventLog.dir hdfs://<hostname>:8020/directory
5) set history server config in /opt/spark-1.5.2-bin-hadoop2.6/bin/load-spark-env.sh
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://<hostname>:8020/directory"
6) ./sbin/start-master.sh hdfs://<hostname>:8020
./sbin/start-slave.sh spark://localhost:7077
./sbin/start-history-server.sh hdfs://<hostname>:8020
Spark RESTful APIs for Monitoring
Following are some Spark RESTful APIs:
List spark applications: http://<hostname>:18080/api/v1/applications
Code Block
[
  {
    "id": "application_1452593058395_0008",
    "name": "PySparkShell",
    "attempts": [
      {
        "startTime": "2016-01-13T09:55:43.701GMT",
        "endTime": "2016-01-13T09:57:52.658GMT",
        "sparkUser": "root",
        "completed": true
      }
    ]
  },
  {
    "id": "application_1452593058395_0007",
    "name": "PySparkShell",
    "attempts": [
      {
        "startTime": "2016-01-13T08:22:12.346GMT",
        "endTime": "2016-01-13T09:48:25.615GMT",
        "sparkUser": "root",
        "completed": true
      }
    ]
  },
  {
    "id": "application_1452593058395_0006",
    "name": "PySparkShell",
    "attempts": [
      {
        "startTime": "2016-01-12T15:27:49.038GMT",
        "endTime": "2016-01-12T18:05:48.678GMT",
        "sparkUser": "root",
        "completed": false
      }
    ]
  }
]
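A fetcher consuming this endpoint mainly needs the application ids to drive the per-application stage/job queries. The following sketch extracts the ids with a plain regex so it stays self-contained; a real implementation would use a JSON library, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ApplicationListParserSketch {

    // Pull every "id" value out of an /api/v1/applications response.
    // Regex-based sketch only; a production fetcher should parse real JSON.
    public static List<String> extractAppIds(String json) {
        List<String> ids = new ArrayList<>();
        Matcher m = Pattern.compile("\"id\"\\s*:\\s*\"([^\"]+)\"").matcher(json);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        String sample = "[{\"id\": \"application_1452593058395_0008\", \"name\": \"PySparkShell\"},"
                      + " {\"id\": \"application_1452593058395_0007\", \"name\": \"PySparkShell\"}]";
        System.out.println(extractAppIds(sample));
    }
}
```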
Return the stage info of a specific application: http://<hostname>:18080/api/v1/applications/application_1452593058395_0008/stages
Code Block
[
  {
    "status": "COMPLETE",
    "stageId": 0,
    "attemptId": 0,
    "numActiveTasks": 0,
    "numCompleteTasks": 2,
    "numFailedTasks": 0,
    "executorRunTime": 2256,
    "inputBytes": 383,
    "inputRecords": 16,
    "outputBytes": 0,
    "outputRecords": 0,
    "shuffleReadBytes": 0,
    "shuffleReadRecords": 0,
    "shuffleWriteBytes": 0,
    "shuffleWriteRecords": 0,
    "memoryBytesSpilled": 0,
    "diskBytesSpilled": 0,
    "name": "count at <stdin>:1",
    "details": "",
    "schedulingPool": "default",
    "accumulatorUpdates": []
  },
  {
    "status": "FAILED",
    "stageId": 1,
    "attemptId": 0,
    "numActiveTasks": 1,
    "numCompleteTasks": 0,
    "numFailedTasks": 7,
    "executorRunTime": 497,
    "inputBytes": 1149,
    "inputRecords": 55,
    "outputBytes": 0,
    "outputRecords": 0,
    "shuffleReadBytes": 0,
    "shuffleReadRecords": 0,
    "shuffleWriteBytes": 0,
    "shuffleWriteRecords": 0,
    "memoryBytesSpilled": 0,
    "diskBytesSpilled": 0,
    "name": "sum at <stdin>:1",
    "details": "",
    "schedulingPool": "default",
    "accumulatorUpdates": []
  }
]
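For monitoring, the spout mostly cares about whether any stage of an attempt failed or shed tasks. The sketch below models just the two fields needed for that check; the field names mirror the /stages response above, but the classes are illustrative, not part of Eagle or Spark.

```java
public class StageHealthSketch {

    // Minimal stage model holding only the fields this sketch needs.
    static class StageInfo {
        final String status;
        final int numFailedTasks;

        StageInfo(String status, int numFailedTasks) {
            this.status = status;
            this.numFailedTasks = numFailedTasks;
        }
    }

    /** Flag an application attempt when any stage failed or lost tasks. */
    static boolean hasStageFailure(StageInfo[] stages) {
        for (StageInfo s : stages) {
            if ("FAILED".equals(s.status) || s.numFailedTasks > 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        StageInfo[] stages = {
            new StageInfo("COMPLETE", 0),  // like stage 0 above
            new StageInfo("FAILED", 7)     // like stage 1 above
        };
        System.out.println(hasStageFailure(stages));
    }
}
```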
Return the job info of a specific application: http://<hostname>:18080/api/v1/applications/application_1452593058395_0008/jobs
Code Block
[
  {
    "jobId": 1,
    "name": "sum at <stdin>:1",
    "submissionTime": "2016-01-13T09:56:43.335GMT",
    "completionTime": "2016-01-13T09:56:43.710GMT",
    "stageIds": [1],
    "status": "FAILED",
    "numTasks": 2,
    "numActiveTasks": 1,
    "numCompletedTasks": 0,
    "numSkippedTasks": 0,
    "numFailedTasks": 7,
    "numActiveStages": 0,
    "numCompletedStages": 0,
    "numSkippedStages": 0,
    "numFailedStages": 1
  },
  {
    "jobId": 0,
    "name": "count at <stdin>:1",
    "submissionTime": "2016-01-13T09:56:07.496GMT",
    "completionTime": "2016-01-13T09:56:09.299GMT",
    "stageIds": [0],
    "status": "SUCCEEDED",
    "numTasks": 2,
    "numActiveTasks": 0,
    "numCompletedTasks": 2,
    "numSkippedTasks": 2,
    "numFailedTasks": 0,
    "numActiveStages": 0,
    "numCompletedStages": 1,
    "numSkippedStages": 0,
    "numFailedStages": 0
  }
]
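The task counters in the /jobs response can be turned into a coarse progress metric. This hypothetical helper treats completed and skipped tasks as done and caps the result at 100% (skipped tasks can make the raw ratio exceed the task count, as in job 0 above); the formula is an assumption for illustration, not Eagle's actual metric.

```java
public class JobProgressSketch {

    // Coarse progress percentage from /jobs task counters: completed plus
    // skipped tasks over total, capped at 100 because skipped tasks can
    // push the raw ratio above the nominal task count.
    static double progress(int numTasks, int numCompletedTasks, int numSkippedTasks) {
        if (numTasks <= 0) {
            return 0.0;
        }
        double pct = 100.0 * (numCompletedTasks + numSkippedTasks) / numTasks;
        return Math.min(100.0, pct);
    }

    public static void main(String[] args) {
        System.out.println(progress(4, 1, 1));
    }
}
```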
Notes
The Spark History Server relies on logs written by Spark applications to report application status.
But sometimes the logs are not correctly updated by Spark jobs. For example, the following job has actually completed, but its log on HDFS still shows it as in progress (not completed), which causes the Spark history server to report the wrong status:
ID | User | Name | Application Type | Queue | StartTime | FinishTime | State | FinalStatus | Progress | Tracking UI
---|---|---|---|---|---|---|---|---|---|---
application_1452593058395_0006 | root | PySparkShell | SPARK | default | Tue, 12 Jan 2016 15:27:54 GMT | Tue, 12 Jan 2016 18:05:49 GMT | FINISHED | SUCCEEDED | | History
hdfs dfs -ls /directory/
Found 4 items
-rwxrwx--- 3 root supergroup 13227 2016-01-12 15:27 /directory/application_1452593058395_0005
-rwxrwx--- 3 root supergroup 13227 2016-01-12 18:05 /directory/application_1452593058395_0006.inprogress
-rwxrwx--- 3 root supergroup 51025 2016-01-13 09:48 /directory/application_1452593058395_0007
-rwxrwx--- 3 root supergroup 67994 2016-01-13 09:57 /directory/application_1452593058395_0008
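One way the spout could defend against this inconsistency is to cross-check YARN's state against the event-log file name: the history server treats a `<appId>.inprogress` file as a still-running application, so a `.inprogress` file for an application YARN already reports as finished is likely stale. This helper is a hypothetical sketch of that check, not Eagle code.

```java
public class InProgressLogCheckSketch {

    // A log file like "application_..._0006.inprogress" paired with a
    // FINISHED state in YARN suggests the event log was never finalized,
    // so the history server's status for that application is unreliable.
    static boolean looksStale(String logFileName, boolean finishedInYarn) {
        return finishedInYarn && logFileName.endsWith(".inprogress");
    }

    public static void main(String[] args) {
        System.out.println(looksStale("application_1452593058395_0006.inprogress", true));
    }
}
```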