...
```json
[ {
  "jobId" : 1,
  "name" : "sum at <stdin>:1",
  "submissionTime" : "2016-01-13T09:56:43.335GMT",
  "completionTime" : "2016-01-13T09:56:43.710GMT",
  "stageIds" : [ 1 ],
  "status" : "FAILED",
  "numTasks" : 2,
  "numActiveTasks" : 1,
  "numCompletedTasks" : 0,
  "numSkippedTasks" : 0,
  "numFailedTasks" : 7,
  "numActiveStages" : 0,
  "numCompletedStages" : 0,
  "numSkippedStages" : 0,
  "numFailedStages" : 1
}, {
  "jobId" : 0,
  "name" : "count at <stdin>:1",
  "submissionTime" : "2016-01-13T09:56:07.496GMT",
  "completionTime" : "2016-01-13T09:56:09.299GMT",
  "stageIds" : [ 0 ],
  "status" : "SUCCEEDED",
  "numTasks" : 2,
  "numActiveTasks" : 0,
  "numCompletedTasks" : 2,
  "numSkippedTasks" : 2,
  "numFailedTasks" : 0,
  "numActiveStages" : 0,
  "numCompletedStages" : 1,
  "numSkippedStages" : 0,
  "numFailedStages" : 0
} ]
```
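A response like the one above can be post-processed to flag problem jobs. A minimal sketch, assuming the JSON has already been fetched from the History Server's `/api/v1/applications/<app-id>/jobs` endpoint (the sample below abbreviates the full payload to the fields used):

```python
import json

# Abbreviated sample of the /jobs response shown above.
jobs_json = """
[ {"jobId": 1, "name": "sum at <stdin>:1", "status": "FAILED",
   "numTasks": 2, "numFailedTasks": 7},
  {"jobId": 0, "name": "count at <stdin>:1", "status": "SUCCEEDED",
   "numTasks": 2, "numFailedTasks": 0} ]
"""

jobs = json.loads(jobs_json)

# Collect jobs whose status is FAILED, with their failed-task counts.
failed = [j for j in jobs if j["status"] == "FAILED"]
for j in failed:
    print(f'job {j["jobId"]} ({j["name"]}): {j["numFailedTasks"]} failed tasks')
# prints: job 1 (sum at <stdin>:1): 7 failed tasks
```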
Notes
The Spark History Server relies on event logs written by Spark applications to report application status.
Sometimes, however, these logs are not correctly finalized by Spark jobs. For example, the following application actually completed, but its event log on HDFS still carries the `.inprogress` suffix, which causes the History Server to report the wrong status:
ID | User | Name | Application Type | Queue | StartTime | FinishTime | State | FinalStatus | Progress | Tracking UI |
---|---|---|---|---|---|---|---|---|---|---|
application_1452593058395_0006 | root | PySparkShell | SPARK | default | Tue, 12 Jan 2016 15:27:54 GMT | Tue, 12 Jan 2016 18:05:49 GMT | FINISHED | SUCCEEDED | | History |
```
hdfs dfs -ls /directory/
Found 4 items
-rwxrwx--- 3 root supergroup 13227 2016-01-12 15:27 /directory/application_1452593058395_0005
-rwxrwx--- 3 root supergroup 13227 2016-01-12 18:05 /directory/application_1452593058395_0006.inprogress
-rwxrwx--- 3 root supergroup 51025 2016-01-13 09:48 /directory/application_1452593058395_0007
-rwxrwx--- 3 root supergroup 67994 2016-01-13 09:57 /directory/application_1452593058395_0008
```
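One way to detect such stale logs is to cross-check `.inprogress` files in the event-log directory against applications that YARN already reports as finished. A sketch under the assumption that both lists have been collected beforehand (the finished-application set below is illustrative; in practice it would come from the YARN ResourceManager):

```python
# Event-log file names found in the HDFS log directory (as in the
# `hdfs dfs -ls` listing above).
log_files = [
    "application_1452593058395_0005",
    "application_1452593058395_0006.inprogress",
    "application_1452593058395_0007",
    "application_1452593058395_0008",
]

# Applications YARN reports as FINISHED (illustrative values).
finished_apps = {"application_1452593058395_0005",
                 "application_1452593058395_0006"}

SUFFIX = ".inprogress"

# A log is stale if it is still marked in-progress although the
# corresponding application has finished according to YARN.
stale = [f for f in log_files
         if f.endswith(SUFFIX) and f[: -len(SUFFIX)] in finished_apps]
print(stale)  # ['application_1452593058395_0006.inprogress']
```

A stale log found this way can be renamed to drop the suffix (e.g. `hdfs dfs -mv /directory/app.inprogress /directory/app`) so the History Server treats the application as complete; this matches the default event-log naming scheme but should be verified against the Spark version in use.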