...
- A deployed CAS-Workflow instance. See the Workflow Basic User Guide for instructions on setting up this component.
- A deployed CAS-Filemanager instance. See the File Manager Basic User Guide for instructions on setting up this component. Also see the OODT Filemgr User Guide.
- A deployed CAS-Crawler. See the Crawler User Guide for instructions on setting up this component. Also see OODT Crawler Help.
- Maven 2
- Environment variables
OODT_HOME=/path/to/oodt
PGE_ROOT=/path/to/oodt/pge
FILEMGR_HOME=/path/to/oodt/filemgr
WORKFLOW_HOME=/path/to/oodt/workflow
RESMGR_HOME=/path/to/oodt/resmgr
CRAWLER_HOME=/path/to/oodt/crawler
FILEMGR_URL=http://localhost:9000
WORKFLOW_URL=http://localhost:9001
- Note: the Workflow Manager may listen on port 9200 by default; check which port it uses before setting this variable.
RESMGR_URL=http://localhost:9002
- Note: the Resource Manager may listen on port 9300 by default; check which port it uses before setting this variable.
- PGE_ROOT is the directory in which the PGE scripts and configuration files will reside.
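Before continuing, it can help to confirm that every variable above is actually set in your shell, and to find out which port each manager really answers on, given the 9001/9200 and 9002/9300 ambiguity noted above. The following is a minimal POSIX sh sketch; check_oodt_env and probe_url are hypothetical helper names (not part of OODT), and the port probe assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are installed.

```shell
# Hypothetical helpers (not part of OODT): sanity-check the environment
# before deploying the File Concatenator PGE.

# Print each required variable that is unset or empty; return 1 if any are missing.
check_oodt_env() {
  missing=0
  for name in OODT_HOME PGE_ROOT FILEMGR_HOME WORKFLOW_HOME RESMGR_HOME \
              CRAWLER_HOME FILEMGR_URL WORKFLOW_URL RESMGR_URL; do
    # Indirect variable lookup that also works in plain POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Return 0 if something accepts TCP connections at the host:port of a URL
# such as http://localhost:9001 (assumes the URL includes an explicit port).
# Relies on bash's /dev/tcp pseudo-device and the coreutils timeout command.
probe_url() {
  hostport=${1#*://}        # strip the scheme -> localhost:9001/...
  hostport=${hostport%%/*}  # strip any path   -> localhost:9001
  host=${hostport%%:*}
  port=${hostport##*:}
  timeout 2 bash -c ": > /dev/tcp/$host/$port" 2>/dev/null
}

# Example:
#   check_oodt_env
#   probe_url "$WORKFLOW_URL" || echo "nothing listening at $WORKFLOW_URL; try port 9200"
```

A failed probe only means nothing accepted a TCP connection on that port; it does not confirm which OODT service is listening.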
- Ensure both $WORKFLOW_HOME/lib and $RESMGR_HOME/lib folders contain at least the following:
- cas-crawler-<VERSION>.jar
- cas-filemgr-<VERSION>.jar
- cas-pge-<VERSION>.jar
- Note: at the time of writing (OODT 0.6), the jar is cas-pge-0.3.jar. You can download it from http://mvnrepository.com/artifact/org.apache.oodt/cas-pge/0.3. Alternatively, section "4. Build and deploy FileConcatenatorPGE" builds the sample fileconcatenator-pge project with mvn, which automatically downloads cas-pge-0.3.jar into your local Maven repository (by default under ~/.m2/repository). Note where it was downloaded, then copy it to $WORKFLOW_HOME/lib, $RESMGR_HOME/lib, and $FILEMGR_HOME/lib.
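The copy step and a check that the required jars are present can be scripted roughly as follows. This is a sketch: check_lib_jars, install_pge_jar, and the PGE_JAR override are hypothetical names, and the default jar path assumes Maven's default local repository location and the 0.3 version mentioned above.

```shell
# Hypothetical helpers: verify and populate the lib directories listed above.

# Return 1 (and report) if any given directory lacks one of the required CAS jars.
check_lib_jars() {
  status=0
  for dir in "$@"; do
    for prefix in cas-crawler cas-filemgr cas-pge; do
      found=0
      for jar in "$dir"/"$prefix"-*.jar; do
        # An unmatched glob stays literal, so check the file really exists.
        [ -e "$jar" ] && found=1
      done
      if [ "$found" -eq 0 ]; then
        echo "$dir: no $prefix-<VERSION>.jar" >&2
        status=1
      fi
    done
  done
  return "$status"
}

# Copy a cas-pge jar into each given lib directory. Defaults to the jar Maven
# downloads into the default local repository; set PGE_JAR to use another file.
install_pge_jar() {
  jar=${PGE_JAR:-$HOME/.m2/repository/org/apache/oodt/cas-pge/0.3/cas-pge-0.3.jar}
  [ -f "$jar" ] || { echo "not found: $jar" >&2; return 1; }
  for dir in "$@"; do
    cp "$jar" "$dir"/ || return 1
  done
}

# Example:
#   install_pge_jar "$WORKFLOW_HOME/lib" "$RESMGR_HOME/lib" "$FILEMGR_HOME/lib"
#   check_lib_jars "$WORKFLOW_HOME/lib" "$RESMGR_HOME/lib"
```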
...
Navigate to your deployed CAS-Workflow’s policy directory
```
cd $WORKFLOW_HOME/policy
```
Modify events.xml
Add the following entry to events.xml:

```xml
<event name="fileconcatenator-pge">
  <workflow id="urn:oodt:FileConcatenatorWorkflow"/>
</event>
```
Create a new policy file named fileconcatenator-pge.workflow.xml.
Add the following entries to fileconcatenator-pge.workflow.xml:

```xml
<cas:workflow xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas"
              name="FileConcatenatorWorkflow" id="urn:oodt:FileConcatenatorWorkflow">
  <tasks>
    <task id="urn:oodt:FileConcatenator"/>
  </tasks>
</cas:workflow>
```
Modify tasks.xml
Add the following entries to tasks.xml:

```xml
<task id="urn:oodt:FileConcatenator" name="FileConcatenator"
      class="org.apache.oodt.pge.examples.fileconcatenator.FileConcatenatorPGETask">
  <conditions/>
  <configuration>
    <property name="PGETask_Name" value="FileConcatenator"/>
    <property name="PGETask_ConfigFilePath" value="[PGE_ROOT]/file_concatenator/pge-configs/PGEConfig.xml" envReplace="true"/>
    <property name="PGETask_DumpMetadata" value="true"/>
    <property name="PCS_WorkflowManagerUrl" value="[WORKFLOW_URL]" envReplace="true"/>
    <property name="PCS_FileManagerUrl" value="[FILEMGR_URL]" envReplace="true"/>
    <property name="PCS_MetFileExtension" value="met"/>
    <property name="PCS_ClientTransferServiceFactory" value="org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory"/>
    <property name="PCS_ActionRepoFile" value="file:[CRAWLER_HOME]/policy/crawler-config.xml" envReplace="true"/>
  </configuration>
  <requiredMetFields>
    <metfield name="RunID"/>
  </requiredMetFields>
</task>
```
Note: the [PGE_ROOT] placeholder does not work here. Replace [PGE_ROOT] with the absolute path of your PGE root directory in both tasks.xml and PGEConfig.xml.
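One way to make that replacement is a sed substitution driven by the PGE_ROOT environment variable, sketched below as a hypothetical helper, replace_pge_root. It assumes PGE_ROOT holds an absolute path containing no | or & characters (both are special in this sed expression), and it keeps a .bak copy of each file it edits.

```shell
# Hypothetical helper: replace the literal [PGE_ROOT] placeholder with the
# absolute path held in $PGE_ROOT, editing each file in place (a .bak copy is kept).
replace_pge_root() {
  case "$PGE_ROOT" in
    /*) ;;  # absolute path: OK
    *) echo "PGE_ROOT must be an absolute path, got '$PGE_ROOT'" >&2; return 1 ;;
  esac
  for f in "$@"; do
    # -i.bak is accepted by both GNU and BSD sed; \[ and \] match literal brackets.
    sed -i.bak "s|\[PGE_ROOT\]|$PGE_ROOT|g" "$f" || return 1
  done
}

# Example:
#   replace_pge_root "$WORKFLOW_HOME/policy/tasks.xml" \
#     "$PGE_ROOT/file_concatenator/pge-configs/PGEConfig.xml"
```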
Modify workflow-lifecycles.xml
Add the following entries to workflow-lifecycles.xml (if not already present):

```xml
<stage name="pge_setup_build_config_file">
  <status>BUILDING CONFIG FILE</status>
</stage>
<stage name="pge_staging_input">
  <status>STAGING INPUT</status>
</stage>
<stage name="pge_exec">
  <status>PGE EXEC</status>
</stage>
<stage name="pcs_crawl">
  <status>CRAWLING</status>
</stage>
```
Modify workflow-instance-met.xml
Add the following entry to workflow-instance-met.xml:

```xml
<workflow id="urn:oodt:FileConcatenatorWorkflow">
  <field name="RunID"/>
</workflow>
```
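Since an unclosed tag in any of these policy files is likely to cause errors when the Workflow Manager loads them, a well-formedness check before restarting can save a round trip. The sketch below assumes the xmllint tool from libxml2 is installed; check_policy_xml is a hypothetical name, and the file list is simply the five policy files touched in this section.

```shell
# Hypothetical helper: check that the policy files edited above are well-formed XML.
# Requires xmllint (shipped with libxml2).
check_policy_xml() {
  policy_dir=$1
  status=0
  for name in events.xml fileconcatenator-pge.workflow.xml tasks.xml \
              workflow-lifecycles.xml workflow-instance-met.xml; do
    # --noout: parse only; a non-zero exit means the file is missing or malformed.
    if ! xmllint --noout "$policy_dir/$name"; then
      echo "problem in $policy_dir/$name" >&2
      status=1
    fi
  done
  return "$status"
}

# Example:
#   check_policy_xml "$WORKFLOW_HOME/policy" && (cd "$WORKFLOW_HOME/bin" && ./wmgr restart)
```

This only checks XML syntax; it does not validate OODT-specific content such as matching workflow and task ids.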
Restart CAS-Workflow
```
cd $WORKFLOW_HOME/bin
./wmgr restart
```
...
Navigate to the CAS-Workflow bin directory:

```
cd $WORKFLOW_HOME/bin
```

Invoke the File Concatenator PGE by running the wmgr-client command-line tool. If your Workflow Manager listens on a port other than 9001, adjust --url to match your WORKFLOW_URL.

```
./wmgr-client --url http://localhost:9001 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1
```
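Because the Workflow Manager may not be on port 9001, the invocation can be wrapped so that it reads the URL from WORKFLOW_URL and, if no RunID is given, generates one from the current time. This sketch uses only the wmgr-client options shown in the command above; send_fileconcat_event and the WMGR_CLIENT override (handy for a dry run with echo) are hypothetical.

```shell
# Hypothetical wrapper around the wmgr-client call shown above.
# Usage: send_fileconcat_event [RunID]   (run from $WORKFLOW_HOME/bin)
send_fileconcat_event() {
  run_id=${1:-run-$(date +%Y%m%d%H%M%S)}
  if [ -z "${WORKFLOW_URL:-}" ]; then
    echo "WORKFLOW_URL is not set" >&2
    return 1
  fi
  # Set WMGR_CLIENT=echo to print the command instead of running it.
  "${WMGR_CLIENT:-./wmgr-client}" --url "$WORKFLOW_URL" \
    --operation --sendEvent --eventName fileconcatenator-pge \
    --metaData --key RunID "$run_id"
}

# Example:
#   cd "$WORKFLOW_HOME/bin" && send_fileconcat_event testNumber1
```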
Note: before running ./wmgr-client, make sure the following servers are up and running.
Note: if you encounter errors related to the PGEConfig.xml file path, check the value set for PGETask_ConfigFilePath under http://localhost:8080/opsui -> WorkFlow Monitor -> FileConcatenator.
If this value begins with null, make sure you have replaced [PGE_ROOT] with its absolute path in both tasks.xml and PGEConfig.xml.
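A quick local check for this problem is to search tasks.xml and PGEConfig.xml for any [PGE_ROOT] text that was not replaced. find_pge_root_placeholders is a hypothetical helper; it only inspects the files on disk and cannot tell you what value the running Workflow Manager actually resolved.

```shell
# Hypothetical helper: print lines that still contain the literal [PGE_ROOT]
# placeholder. Returns 0 when none remain, 1 if any are found, 2 on grep errors.
find_pge_root_placeholders() {
  # -F matches the brackets literally; -n prints line numbers for each hit.
  grep -nF '[PGE_ROOT]' "$@"
  case $? in
    0) return 1 ;;   # placeholder still present
    1) return 0 ;;   # no match: everything was replaced
    *) return 2 ;;   # grep error, e.g. a missing file
  esac
}

# Example:
#   find_pge_root_placeholders "$WORKFLOW_HOME/policy/tasks.xml" \
#     "$PGE_ROOT/file_concatenator/pge-configs/PGEConfig.xml"
```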
7. Verify output of PGE execution
...
```
INFO: Successfully ingested product: [/usr/local/pge/file_concatenator/output/jobs/job-2011-08-05T23:42:51.178Z/concatenatedOutputFile-2011-08-05T23:42:51.178Z.txt]: product id: a2d6d5ff-bfbc-11e0-8531-dff90856f73a
```
Additionally, you should see the following two files in the generated job directory:
Generated product file: $PGE_ROOT/file_concatenator/output/jobs/job-2011-08-05T23:42:51.178Z/concatenatedOutputFile-2011-08-05.txt
Generated met file: $PGE_ROOT/file_concatenator/output/jobs/job-2011-08-05T23:42:51.178Z/concatenatedOutputFile-2011-08-05.txt.met
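To check this from the shell, the sketch below picks the most recently modified job-* directory and confirms that each concatenatedOutputFile-*.txt in it has a matching .met file. check_latest_job is a hypothetical helper; its default $PGE_ROOT/file_concatenator/output/jobs location is taken from the paths above.

```shell
# Hypothetical helper: verify that the newest job directory holds a product
# file together with its .met metadata file.
check_latest_job() {
  jobs_dir=${1:-$PGE_ROOT/file_concatenator/output/jobs}
  # ls -td sorts by modification time, newest first.
  latest=$(ls -td "$jobs_dir"/job-*/ 2>/dev/null | head -n 1)
  if [ -z "$latest" ]; then
    echo "no job-* directory under $jobs_dir" >&2
    return 1
  fi
  latest=${latest%/}
  found=0
  for product in "$latest"/concatenatedOutputFile-*.txt; do
    [ -e "$product" ] || continue   # glob matched nothing
    found=1
    if [ ! -f "$product.met" ]; then
      echo "missing met file for $product" >&2
      return 1
    fi
    echo "ok: $product (+ .met)"
  done
  [ "$found" -eq 1 ] || { echo "no product file in $latest" >&2; return 1; }
}

# Example:
#   check_latest_job
```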