...
- Download the Pig tutorial file to your local directory (pigtutorial.tar.gz)
- Unzip the Pig tutorial file (the files are stored in a newly created directory, pigtmp).
Code Block $ tar -xzf pigtutorial.tar.gz
- Move to the pigtmp directory.
- Review the contents of the Pig tutorial file.
- Copy the pig.jar file to the appropriate directory on your system. For example: /home/me/pig.
- Create an environment variable, PIGDIR, and point it to your directory. For example: export PIGDIR=/home/me/pig (bash, sh) or setenv PIGDIR /home/me/pig (tcsh, csh).
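The setup steps above can be sketched as a single shell function. This is only an illustration: the function name `setup_pig_tutorial` and its two arguments are assumptions made for this example, not part of the tutorial, and `/home/me/pig` is just the example directory used above.

```shell
#!/bin/sh
# Sketch of the setup steps. The function name and arguments are hypothetical.
# Usage: setup_pig_tutorial /path/to/pigtutorial.tar.gz /home/me/pig
setup_pig_tutorial() {
    tarball=$1
    pig_home=$2

    # Unpack the archive; per the tutorial this creates ./pigtmp.
    tar -xzf "$tarball" || return 1

    # Copy pig.jar to the chosen directory, creating it if needed.
    mkdir -p "$pig_home" || return 1
    cp pigtmp/pig.jar "$pig_home/" || return 1

    # Point PIGDIR at that directory (bash/sh syntax).
    PIGDIR=$pig_home
    export PIGDIR
}
```

The function has to be defined and called in the current shell (for example by sourcing the file that contains it); otherwise the exported PIGDIR is lost when the child shell exits.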
...
Pig Scripts: Local Mode
...
To run the Pig scripts in local mode, do the following:
- Move to the pigtmp directory.
- Review Pig Script 1 and Pig Script 2.
- Execute the following command (using either script1-local.pig or script2-local.pig).
Code Block $ java -cp $PIGDIR/pig.jar org.apache.pig.Main -x local script1-local.pig
- Review the result file (either script1-local-results.txt or script2-local-results.txt):
Code Block $ ls -l script1-local-results.txt
$ cat script1-local-results.txt
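For repeated runs, the local-mode steps can be wrapped in a small helper. The sketch below is an assumption-laden convenience, not part of the tutorial: `run_pig_local` and the `JAVA_BIN` override are hypothetical names introduced here, and the function expects to be called from the pigtmp directory.

```shell
#!/bin/sh
# Sketch: run one tutorial script in local mode, then show its result file.
# Usage (from the pigtmp directory): run_pig_local script1   (or script2)
run_pig_local() {
    name=${1:-script1}

    # PIGDIR must point at the directory that holds pig.jar.
    if [ -z "$PIGDIR" ]; then
        echo "PIGDIR is not set" >&2
        return 1
    fi

    # JAVA_BIN is a hypothetical override (handy for dry runs); defaults to java.
    "${JAVA_BIN:-java}" -cp "$PIGDIR/pig.jar" org.apache.pig.Main \
        -x local "${name}-local.pig" || return 1

    # Review the result file, as in the last step above.
    ls -l "${name}-local-results.txt" && cat "${name}-local-results.txt"
}
```

Checking PIGDIR first avoids a confusing class-not-found error from Java when the variable is empty.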
Pig Scripts: Hadoop Mode
To run the Pig scripts in hadoop (mapreduce) mode, do the following:
- Move to the pigtmp directory.
- Review Pig Script 1 and Pig Script 2.
- Copy the excite.log.bz2 file from the pigtmp directory to the HDFS directory.
Code Block $ hadoop fs -copyFromLocal excite.log.bz2 .
- Set the HADOOPSITEPATH environment variable to the location of your hadoop-site.xml file.
- Execute the following command (using either script1-hadoop.pig or script2-hadoop.pig):
Code Block $ java -cp $PIGDIR/pig.jar:$HADOOPSITEPATH org.apache.pig.Main script1-hadoop.pig
- Review the result files (located in either the script1-hadoop-results or script2-hadoop-results HDFS directory):
Code Block $ hadoop fs -ls script1-hadoop-results
$ hadoop fs -cat 'script1-hadoop-results/*' | less
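The Hadoop-mode run and review steps can be sketched the same way. This is a rough illustration only: `run_pig_hadoop`, `JAVA_BIN`, and `HADOOP_BIN` are hypothetical names introduced here, and the function assumes excite.log.bz2 has already been copied to HDFS as described above.

```shell
#!/bin/sh
# Sketch: run one tutorial script on the cluster, then list and print results.
# Usage (from the pigtmp directory): run_pig_hadoop script1   (or script2)
run_pig_hadoop() {
    name=${1:-script1}

    # Both variables are set in the earlier steps.
    if [ -z "$PIGDIR" ] || [ -z "$HADOOPSITEPATH" ]; then
        echo "PIGDIR and HADOOPSITEPATH must both be set" >&2
        return 1
    fi

    # JAVA_BIN and HADOOP_BIN are hypothetical overrides; they default to
    # the java and hadoop commands found on PATH.
    "${JAVA_BIN:-java}" -cp "$PIGDIR/pig.jar:$HADOOPSITEPATH" \
        org.apache.pig.Main "${name}-hadoop.pig" || return 1

    # Review the result files in the HDFS results directory.
    "${HADOOP_BIN:-hadoop}" fs -ls "${name}-hadoop-results" || return 1
    "${HADOOP_BIN:-hadoop}" fs -cat "${name}-hadoop-results/*"
}
```

The output is not piped through `less` here so the function also works non-interactively; append `| less` at the call site when paging is wanted.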
Pig Tutorial File
The contents of the Pig tutorial file (pigtutorial.tar.gz) are described here.
File | Description |
---|---|
pig.jar | Pig JAR file |
tutorial.jar | User-defined functions (UDFs) and Java classes |
script1-local.pig | Pig Script 1, Query Phrase Popularity (local mode) |
script1-hadoop.pig | Pig Script 1, Query Phrase Popularity (Hadoop cluster) |
script2-local.pig | Pig Script 2, Temporal Query Phrase Popularity (local mode) |
script2-hadoop.pig | Pig Script 2, Temporal Query Phrase Popularity (Hadoop cluster) |
excite-small.log | Log file, Excite search engine (local mode) |
excite.log.bz2 | Log file, Excite search engine (Hadoop cluster) |
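One way to review the unpacked archive against the table above is a quick presence check. The sketch below is illustrative: `check_tutorial_files` is a hypothetical helper name, and the file list is simply copied from the table.

```shell
#!/bin/sh
# Sketch: report any file from the table that is missing from a directory.
# Usage: check_tutorial_files pigtmp
check_tutorial_files() {
    dir=${1:-pigtmp}
    missing=0
    for f in pig.jar tutorial.jar \
             script1-local.pig script1-hadoop.pig \
             script2-local.pig script2-hadoop.pig \
             excite-small.log excite.log.bz2; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # Exit status 0 when every file is present, 1 otherwise.
    return "$missing"
}
```

The function prints one line per missing file and stays silent when the directory is complete, so it can be used directly in an `if` test.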
A better-documented version of script1-local.pig can be found at https://cwiki.apache.org/confluence/download/attachments/27822259/script1-local-with-added-documentation.pig . It includes comments showing samples from each intermediate relation.
The user-defined functions (UDFs) are described here.
...