MapReduce Streaming Job — POST mapreduce/streaming
Description
Create and queue a Hadoop streaming MapReduce job.
...
Name | Description | Required? | Default |
---|---|---|---|
input | Location of the input data in Hadoop. | Required | None |
output | Location in which to store the output data. If not specified, WebHCat will store the output in a location that can be discovered using the queue resource. | Optional | See description |
mapper | Location of the mapper program in Hadoop. | Required | None |
reducer | Location of the reducer program in Hadoop. | Required | None |
file | Add an HDFS file to the distributed cache. | Optional | None |
define | Set a Hadoop configuration variable using the syntax | Optional | None |
cmdenv | Set an environment variable using the syntax | Optional | None |
arg | Set a program argument. | Optional | None |
statusdir | A directory where WebHCat will write the status of the MapReduce job. If provided, it is the caller's responsibility to remove this directory when done. | Optional | None |
enablelog | If statusdir is set and enablelog is "true", collect Hadoop job configuration and logs into a directory named
This parameter was introduced in Hive 0.12.0. (See HIVE-4531.) | Optional in Hive 0.12.0+ | None |
callback | Define a URL to be called upon job completion. You may embed a specific job ID into this URL using | Optional | None |
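
The required and optional parameters in the table can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of WebHCat: the function name `build_streaming_fields` and its keyword arguments are hypothetical, and it assumes that `arg` and `file` may be repeated as separate form fields.

```python
# Parameters marked "Required" in the table above.
REQUIRED = ("input", "mapper", "reducer")


def build_streaming_fields(user, input_path, mapper, reducer,
                           output=None, statusdir=None, args=(), files=()):
    """Return (name, value) pairs for a POST to mapreduce/streaming."""
    values = {"input": input_path, "mapper": mapper, "reducer": reducer}
    missing = [name for name in REQUIRED if not values[name]]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))

    fields = [("user.name", user)]
    fields += [(name, values[name]) for name in REQUIRED]
    # Optional parameters are sent only when the caller supplies them.
    if output is not None:
        fields.append(("output", output))
    if statusdir is not None:
        fields.append(("statusdir", statusdir))
    # Assumption: repeated parameters travel as repeated form fields.
    fields += [("arg", value) for value in args]
    fields += [("file", value) for value in files]
    return fields


fields = build_streaming_fields("ctdean", "mydata", "/bin/cat",
                                "/usr/bin/wc -w", output="mycounts")
for name, value in fields:
    print(name, "=", value)
```

Returning a list of pairs rather than a dictionary keeps the field order stable and leaves room for repeated names.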
...
Code and Data Setup
```
% cat mydata/file01 mydata/file02
Hello World Bye World
Hello Hadoop Goodbye Hadoop
% hadoop fs -put mydata/ .
% hadoop fs -ls mydata
Found 2 items
-rw-r--r-- 1 ctdean supergroup 23 2011-11-11 13:29 /user/ctdean/mydata/file01
-rw-r--r-- 1 ctdean supergroup 28 2011-11-11 13:29 /user/ctdean/mydata/file02
```
Curl Command
```
% curl -s -d user.name=ctdean \
       -d input=mydata \
       -d output=mycounts \
       -d mapper=/bin/cat \
       -d reducer="/usr/bin/wc -w" \
       'http://localhost:50111/templeton/v1/mapreduce/streaming'
```
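
The same request can be prepared from a script. The sketch below uses only the Python standard library and reuses the URL and field values from the curl command. It builds the request without sending it, so it runs without a WebHCat server. Unlike `curl -d`, `urlencode` percent-encodes the values; this assumes the server decodes form fields in the usual way.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = "http://localhost:50111/templeton/v1/mapreduce/streaming"

# Same fields, in the same order, as the curl command above.
form = [
    ("user.name", "ctdean"),
    ("input", "mydata"),
    ("output", "mycounts"),
    ("mapper", "/bin/cat"),
    ("reducer", "/usr/bin/wc -w"),
]

# Supplying a body makes urllib issue a POST instead of a GET.
request = Request(URL, data=urlencode(form).encode("ascii"))
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))

# To submit the job against a running server:
#   from urllib.request import urlopen
#   reply = urlopen(request).read()
```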
JSON Output
```
{
 "id": "job_201111111311_0008",
 "info": {
  "stdout": "packageJobJar: [] [/Users/ctdean/var/hadoop/hadoop-0.20.205.0/share/hadoop/contrib/streaming/hadoop-streaming-0.20.205.0.jar...
templeton-job-id:job_201111111311_0008
",
  "stderr": "11/11/11 13:26:43 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments
11/11/11 13:26:43 INFO mapred.FileInputFormat: Total input paths to process : 2
",
  "exitcode": 0
 }
}
```
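
A client usually needs only the job ID and the exit code from this response. The following Python sketch extracts both. The `sample` text is an abbreviated copy of the output above, and the helper name `job_summary` is hypothetical. Because the listing shows line breaks inside the string values, the sketch parses with `strict=False`, which accepts control characters inside JSON strings. Whether the server escapes those line breaks on the wire is not stated here, so treat that as an assumption.

```python
import json

# Abbreviated copy of the response shown above (stdout and stderr shortened).
sample = """{
 "id": "job_201111111311_0008",
 "info": {
  "stdout": "templeton-job-id:job_201111111311_0008
",
  "stderr": "11/11/11 13:26:43 INFO mapred.FileInputFormat: Total input paths to process : 2
",
  "exitcode": 0
 }
}"""


def job_summary(text):
    """Return (job_id, exitcode) from a mapreduce/streaming response."""
    # strict=False tolerates raw newlines inside string values.
    reply = json.loads(text, strict=False)
    return reply["id"], reply["info"]["exitcode"]


job_id, exitcode = job_summary(sample)
print(job_id, exitcode)
```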
Example Results
```
% hadoop fs -ls mycounts
Found 3 items
-rw-r--r-- 1 ctdean supergroup 0 2011-11-11 13:27 /user/ctdean/mycounts/_SUCCESS
drwxr-xr-x - ctdean supergroup 0 2011-11-11 13:26 /user/ctdean/mycounts/_logs
-rw-r--r-- 1 ctdean supergroup 10 2011-11-11 13:27 /user/ctdean/mycounts/part-00000
% hadoop fs -cat mycounts/part-00000
8
```
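
The result of 8 can be checked by hand: each of the two input lines contains four words. The sketch below reproduces that arithmetic locally in Python. It is a simplified stand-in for the job, with `mapper` playing the role of `/bin/cat` and `reducer` the role of `wc -w`, and it ignores how Hadoop streaming splits records into keys and values.

```python
# The two input records from mydata/file01 and mydata/file02.
records = [
    "Hello World Bye World",
    "Hello Hadoop Goodbye Hadoop",
]


def mapper(lines):
    """Stand-in for /bin/cat: pass every record through unchanged."""
    return list(lines)


def reducer(lines):
    """Stand-in for wc -w: count whitespace-separated words."""
    return sum(len(line.split()) for line in lines)


total = reducer(mapper(records))
print(total)  # prints 8
```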