...
Need an endpoint summary table here that anchors to the other sections.
The OODT PCS Pedigree service
JSON output format
Consider a set of product files prod1, prod2, prod3, prod4, prod5, and prod6, where prod1 produced output files prod6, prod3, and prod2, and prod3 in turn produced output files prod5 and prod4. The pedigree report for this set has the following form:
No Format |
---|
{
"pedigree":{
"upstream":"prod1",
"downstream":{
"prod1":[
"prod6",
{
"prod3":[
"prod5",
"prod4"
]
},
"prod2"
]
}
}
}
|
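The nested downstream structure above mixes plain strings (leaf products) with single-key objects (products that have outputs of their own). As a minimal sketch of how a client might walk it, the following Python flattens the tree into a depth-first list of product names. The function name flatten_downstream is hypothetical and not part of OODT; the input is the sample report from this page.

```python
def flatten_downstream(node):
    """Return every product name under a 'downstream' node, depth-first."""
    names = []
    if isinstance(node, str):
        # A plain string is a leaf product with no recorded outputs.
        names.append(node)
    elif isinstance(node, list):
        for item in node:
            names.extend(flatten_downstream(item))
    elif isinstance(node, dict):
        # An object maps a product to the list of products it produced.
        for parent, children in node.items():
            names.append(parent)
            names.extend(flatten_downstream(children))
    return names


# Sample report copied from the JSON output format above.
report = {
    "pedigree": {
        "upstream": "prod1",
        "downstream": {
            "prod1": ["prod6", {"prod3": ["prod5", "prod4"]}, "prod2"]
        },
    }
}

products = flatten_downstream(report["pedigree"]["downstream"])
print(products)
# ['prod1', 'prod6', 'prod3', 'prod5', 'prod4', 'prod2']
```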
The following REST endpoints generate full and partial pedigree reports.
Full Report
To see the full Pedigree report for the file myfile.txt, access:
No Format |
---|
http://host/pcs-services/services/pedigree/report/myfile.txt
|
Just the Upstream lineage
To see just the upstream lineage for the file myfile.txt, access:
No Format |
---|
http://host/pcs-services/services/pedigree/report/myfile.txt/upstream
|
Just the Downstream lineage
To see just the downstream lineage for the file myfile.txt, access:
No Format |
---|
http://host/pcs-services/services/pedigree/report/myfile.txt/downstream
|
The OODT PCS Health Monitor service
JSON output format
All calls to the Health Monitor REST service provide the following JSON output:
No Format |
---|
{
"report": {
"crawlerStatus": [
{
"crawlerName": "Crawler1",
"crawlerPort": "9020",
"status": "UP",
"url": "localhost"
}
],
"daemonStatus": {
"stubs": [
{
"daemon": "batch stub",
"status": "UP",
"url": "http://localhost:2001"
}
],
"fm": {
"daemon": "File Manager",
"status": "UP",
"url": "http://localhost:9000"
},
"rm": {
"daemon": "Resource Manager",
"status": "UP",
"url": "http://localhost:9002"
},
"wm": {
"daemon": "Workflow Manager",
"status": "UP",
"url": "http://localhost:9001"
}
},
"generated": "2011-02-15T06:57:07.591-0800",
"ingestHealth": [
{
"avgCrawlTime": 132.78640211640212,
"crawler": "Crawler1",
"numCrawls": 189
}
],
"jobHealth": [
{
"numJobs": 0,
"state": "QUEUED"
},
{
"numJobs": 0,
"state": "RSUBMIT"
},
{
"numJobs": 0,
"state": "BUILDING CONFIG FILE"
},
{
"numJobs": 0,
"state": "PGE EXEC"
},
{
"numJobs": 0,
"state": "CRAWLING"
},
{
"numJobs": 0,
"state": "STAGING INPUT"
},
{
"numJobs": 7,
"state": "FINISHED"
},
{
"numJobs": 0,
"state": "STARTED"
},
{
"numJobs": 0,
"state": "PAUSED"
}
],
"latestFiles": {
"files": [
{
"filepath": "/Users/mattmann/files/foo.bar/foo.bar",
"receivedTime": "2011-01-22T15:19:21.126-08:00"
},
{
"filepath": "/Users/mattmann/files/foo.bar/foo.bar",
"receivedTime": "2011-01-22T15:08:10.198-08:00"
},
{
"filepath": "/Users/mattmann/files/foo.bar/foo.bar",
"receivedTime": "2011-01-22T15:06:03.659-08:00"
},
{
"filepath": "/Users/mattmann/files/blah.txt/blah.txt",
"receivedTime": "2011-01-21T21:56:03.922-08:00"
}
],
"topN": 20
}
}
}
|
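As a rough illustration of consuming this output, the Python sketch below lists every daemon or crawler whose status is not "UP". The helper name down_components is hypothetical, and the "DOWN" value in the sample is an assumption made for the example: this page only shows "UP". The sample is a trimmed copy of the report above.

```python
def down_components(report):
    """List (name, url) pairs for daemons and crawlers not reporting 'UP'."""
    body = report.get("report", {})
    entries = []
    for value in body.get("daemonStatus", {}).values():
        # 'stubs' holds a list of daemons; 'fm', 'rm', 'wm' are single objects.
        entries.extend(value if isinstance(value, list) else [value])
    down = [(e["daemon"], e["url"]) for e in entries if e["status"] != "UP"]
    for crawler in body.get("crawlerStatus", []):
        if crawler["status"] != "UP":
            down.append((crawler["crawlerName"], crawler["url"]))
    return down


# Trimmed sample; the "DOWN" status is an assumed value for illustration.
sample = {
    "report": {
        "crawlerStatus": [
            {"crawlerName": "Crawler1", "crawlerPort": "9020",
             "status": "UP", "url": "localhost"}
        ],
        "daemonStatus": {
            "stubs": [{"daemon": "batch stub", "status": "UP",
                       "url": "http://localhost:2001"}],
            "fm": {"daemon": "File Manager", "status": "UP",
                   "url": "http://localhost:9000"},
            "rm": {"daemon": "Resource Manager", "status": "DOWN",
                   "url": "http://localhost:9002"},
        },
        "generated": "2011-02-15T06:57:07.591-0800",
    }
}

down = down_components(sample)
print(down)
```

Because the lookups use get with defaults, the same helper also works on partial reports that omit daemonStatus or crawlerStatus.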
The RESTful service descriptions below show how to slice out different parts of the report. Every report has at least the generated attribute, plus some combination of daemonStatus and/or crawlerStatus, latestFiles (if the file manager is running), jobHealth (if the workflow manager is running), and ingestHealth (if the crawlers are running).
Full Report
To see the full Health Monitor report, access:
No Format |
---|
http://host/pcs-services/services/health/report
|
...
To see the Health Monitor report, focused on just the PCS daemons (including batch stubs), access:
No Format |
---|
http://host/pcs-services/services/health/report/daemon
|
...
To see just the file manager status:
No Format |
---|
http://host/pcs-services/services/health/report/daemon/fm
|
To see just the workflow manager status:
No Format |
---|
http://host/pcs-services/services/health/report/daemon/wm
|
To see just the resource manager status:
No Format |
---|
http://host/pcs-services/services/health/report/daemon/rm
|
To see just the batch stub status:
No Format |
---|
http://host/pcs-services/services/health/report/daemon/stubs
|
...
To see the Health Monitor report, focused on just the PCS ingest crawlers, access:
No Format |
---|
http://host/pcs-services/services/health/report/crawlers
|
...
To slice out the status of the crawler named Crawler1, access:
No Format |
---|
http://host/pcs-services/services/health/report/crawlers/Crawler1
|
...
To see the Health Monitor report, focused on just the PCS workflow (job) processing status, broken down by state, access:
No Format |
---|
http://host/pcs-services/services/health/report/jobs
|
...
The following will display the number of jobs that are in the QUEUED state:
No Format |
---|
http://host/pcs-services/services/health/report/jobs/QUEUED
|
...
To see the Health Monitor report, focused on just the PCS ingest crawler health (with information like number of crawls and average crawl time), broken down by Crawler, access:
No Format |
---|
http://host/pcs-services/services/health/report/ingest
|
...
To see the ingest health status of the Crawler with the name Crawler1, access:
No Format |
---|
http://host/pcs-services/services/health/report/ingest/Crawler1
|
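The Health Monitor endpoints listed above all share the prefix /pcs-services/services/health/report followed by zero or more path segments. A small Python sketch of a URL builder for them follows. The function name health_report_url is hypothetical, and percent-encoding of segments (for example, a job state that contains a space) is an assumption about what the service accepts rather than documented behavior.

```python
from urllib.parse import quote


def health_report_url(host, *segments):
    """Build a Health Monitor report URL from optional path segments."""
    base = f"http://{host}/pcs-services/services/health/report"
    # Percent-encode each segment so names with spaces stay in one segment.
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"{base}/{path}" if path else base


print(health_report_url("host"))
print(health_report_url("host", "daemon", "fm"))
print(health_report_url("host", "crawlers", "Crawler1"))
print(health_report_url("host", "jobs", "QUEUED"))
```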
Slice job processing status by state
...
Endpoint:
No Format |
---|
http://host/pcs-services/services/health/report/jobs/{state}
|
...
Status: 200 OK
JSON
No Format |
---|
{
"report":{
"generated":"2011-02-15T08:09:24.225-0800",
"jobHealth":[
{
"state":"QUEUED",
"numJobs":0
}
]
}
}
|
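A client will often want the job counts keyed by state rather than as a list. The following Python sketch parses the response body shown above into a dictionary. The helper name job_counts is hypothetical; the response text is copied from this page.

```python
import json

# Response body copied from the JSON example above.
response_text = """
{
  "report": {
    "generated": "2011-02-15T08:09:24.225-0800",
    "jobHealth": [
      {"state": "QUEUED", "numJobs": 0}
    ]
  }
}
"""


def job_counts(text):
    """Map each job state in a Health Monitor response to its job count."""
    report = json.loads(text)["report"]
    # jobHealth may be absent from partial reports, so default to empty.
    return {entry["state"]: entry["numJobs"]
            for entry in report.get("jobHealth", [])}


counts = job_counts(response_text)
print(counts)
# {'QUEUED': 0}
```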
...
To display the number of jobs that are in the QUEUED state:
No Format |
---|
http://host/pcs-services/services/health/report/jobs/QUEUED
|