...
- A deployed CAS-Workflow instance. See the Workflow Basic User Guide for instructions on how to set this component up.
- A deployed CAS-Filemanager instance. See the File Manager Basic User Guide for instructions on how to set this component up. Also see the OODT Filemgr User Guide.
- A deployed CAS-Crawler. See the Crawler User Guide for instructions on how to set this component up. Also see OODT Crawler Help.
- Maven 2
- Environment variables
OODT_HOME=/path/to/oodt
PGE_ROOT=/path/to/oodt/pge
FILEMGR_HOME=/path/to/oodt/filemgr
WORKFLOW_HOME=/path/to/oodt/workflow
RESMGR_HOME=/path/to/oodt/resmgr
CRAWLER_HOME=/path/to/oodt/crawler
FILEMGR_URL=http://localhost:9000
WORKFLOW_URL=http://localhost:9001
- Note: the workflow manager sometimes listens on port 9200 by default; confirm the port before setting this environment variable.
RESMGR_URL=http://localhost:9002
- Note: the resource manager (resmgr) sometimes listens on port 9300 by default; confirm the port before setting this environment variable.
- The directory in which the PGE scripts and configuration files will reside.
- Ensure both $WORKFLOW_HOME/lib and $RESMGR_HOME/lib folders contain at least the following:
- cas-crawler-<VERSION>.jar
- cas-filemgr-<VERSION>.jar
- cas-pge-<VERSION>.jar
- Note: at the time of writing (OODT 0.6), the jar is cas-pge-0.3.jar. One way to get it is from http://mvnrepository.com/artifact/org.apache.oodt/cas-pge/0.3. Alternatively, as section "4. Build and deploy FileConcatenatorPGE" shows, building the sample project fileconcatenator-pge with mvn automatically downloads the jar onto your system. Note the location where it is downloaded, and copy the jar to $WORKFLOW_HOME/lib, $RESMGR_HOME/lib, and $FILEMGR_HOME/lib.
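The prerequisites above can be sketched as a small shell script that exports the environment variables and reports which required jars are absent from the two lib folders. This is a minimal sketch, not part of OODT: the helper name `missing_jars` is hypothetical, and the paths and ports are the placeholder values from the list above, so adjust them to your installation.

```shell
#!/bin/sh
# Placeholder values copied from the prerequisites list; adjust to your install.
export OODT_HOME=/path/to/oodt
export PGE_ROOT="$OODT_HOME/pge"
export FILEMGR_HOME="$OODT_HOME/filemgr"
export WORKFLOW_HOME="$OODT_HOME/workflow"
export RESMGR_HOME="$OODT_HOME/resmgr"
export CRAWLER_HOME="$OODT_HOME/crawler"
export FILEMGR_URL=http://localhost:9000
export WORKFLOW_URL=http://localhost:9001   # some installs listen on 9200
export RESMGR_URL=http://localhost:9002     # some installs listen on 9300

# missing_jars DIR (hypothetical helper): print each required jar prefix
# that has no matching <prefix>-<VERSION>.jar file in DIR.
missing_jars() {
    for prefix in cas-crawler cas-filemgr cas-pge; do
        found=no
        for f in "$1"/"$prefix"-*.jar; do
            [ -e "$f" ] && found=yes
        done
        [ "$found" = yes ] || echo "$prefix"
    done
}

# Report gaps in the two lib folders named in the prerequisites.
for dir in "$WORKFLOW_HOME/lib" "$RESMGR_HOME/lib"; do
    for jar in $(missing_jars "$dir"); do
        echo "missing $jar-<VERSION>.jar in $dir"
    done
done
```

The script only reports; it does not copy anything, so it is safe to run repeatedly while you sort out the jars.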
...
- One quick solution is to change the OODT version in pom.xml to 0.3 and then compile. The result is compatible with OODT 0.7.
The following snippet is copied from the pom.xml of fileconcatenator-pge. The version before the change was "0.3 snapshot"; change it to 0.3.
<dependencies>
<dependency>
<groupId>org.apache.oodt</groupId>
<artifactId>cas-pge</artifactId>
<version>0.3</version>
<scope>compile</scope>
</dependency>
</dependencies>
- Another solution is to change the OODT version in pom.xml to 0.7 and follow the steps below:
...
Navigate to your deployed CAS-Workflow’s policy directory
cd $WORKFLOW_HOME/policy
Modify events.xml
Add the following entry to this file:
<event name="fileconcatenator-pge">
  <workflow id="urn:oodt:FileConcatenatorWorkflow"/>
</event>
Create a new policy file titled: fileconcatenator-pge.workflow.xml.
Add the following entries to this file:
<cas:workflow xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas"
    name="FileConcatenatorWorkflow"
    id="urn:oodt:FileConcatenatorWorkflow">
  <tasks>
    <task id="urn:oodt:FileConcatenator"/>
  </tasks>
</cas:workflow>
Modify tasks.xml
Add the following entries to this file:
<task id="urn:oodt:FileConcatenator" name="FileConcatenator"
    class="org.apache.oodt.pge.examples.fileconcatenator.FileConcatenatorPGETask">
  <conditions/>
  <configuration>
    <property name="PGETask_Name" value="FileConcatenator"/>
    <property name="PGETask_ConfigFilePath" value="[PGE_ROOT]/file_concatenator/pge-configs/PGEConfig.xml" envReplace="true"/>
    <property name="PGETask_DumpMetadata" value="true"/>
    <property name="PCS_WorkflowManagerUrl" value="[WORKFLOW_URL]" envReplace="true"/>
    <property name="PCS_FileManagerUrl" value="[FILEMGR_URL]" envReplace="true"/>
    <property name="PCS_MetFileExtension" value="met"/>
    <property name="PCS_ClientTransferServiceFactory" value="org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory"/>
    <property name="PCS_ActionRepoFile" value="file:[CRAWLER_HOME]/policy/crawler-config.xml" envReplace="true"/>
  </configuration>
  <requiredMetFields>
    <metfield name="RunID"/>
  </requiredMetFields>
</task>
Note: [PGE_ROOT] does not work here. Replace [PGE_ROOT] with the absolute path to the PGE root in both tasks.xml and PGEConfig.xml.
Modify workflow-lifecycles.xml
Add the following entries to this file (if not already present):
<stage name="pge_setup_build_config_file">
  <status>BUILDING CONFIG FILE</status>
</stage>
<stage name="pge_staging_input">
  <status>STAGING INPUT</status>
</stage>
<stage name="pge_exec">
  <status>PGE EXEC</status>
</stage>
<stage name="pcs_crawl">
  <status>CRAWLING</status>
</stage>
Modify workflow-instance-met.xml
Add the following entry to this file:
<workflow id="urn:oodt:FileConcatenatorWorkflow">
  <field name="RunID"/>
</workflow>
Restart CAS-Workflow
cd $WORKFLOW_HOME/bin
./wmgr restart
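A quick way to confirm that the policy edits in this section landed in the right files is to search each file for a string the edit adds. The following is a minimal sketch: `check_policy` is a hypothetical helper, not an OODT tool, and it checks only that the text is present, not that the XML is valid.

```shell
#!/bin/sh
# check_policy DIR (hypothetical helper): for each policy file edited in this
# section, check that it contains a string the edit adds. Prints one line per
# file and returns non-zero if any file is absent or lacks its string.
check_policy() {
    dir=$1
    status=0
    while read -r file needle; do
        if grep -qF -- "$needle" "$dir/$file" 2>/dev/null; then
            echo "ok      $file"
        else
            echo "MISSING $file ($needle)"
            status=1
        fi
    done <<EOF
events.xml fileconcatenator-pge
fileconcatenator-pge.workflow.xml urn:oodt:FileConcatenatorWorkflow
tasks.xml FileConcatenatorPGETask
workflow-lifecycles.xml pge_exec
workflow-instance-met.xml urn:oodt:FileConcatenatorWorkflow
EOF
    return $status
}

# Falls back to ./policy when WORKFLOW_HOME is not set.
check_policy "${WORKFLOW_HOME:-.}/policy" || echo "fix the files marked MISSING, then restart CAS-Workflow"
```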
...
Navigate to CAS-Workflow home binary directory
cd $WORKFLOW_HOME/bin
Invoke the File Concatenator PGE by running the wmgr-client command-line
./wmgr-client --url http://localhost:9001 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1
Note, before running the ./wmgr-client command, please ensure the following servers are up and running.
Note: If you encounter errors related to the PGEConfig.xml file path, check the value set for PGETask_ConfigFilePath at http://localhost:8080/opsui -> WorkFlow Monitor -> FileConcatenator.
If this value begins with null, make sure you have replaced [PGE_ROOT] with its absolute path in tasks.xml and PGEConfig.xml.
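The manual replacement described in the note can be scripted. The sketch below defines a hypothetical helper, `expand_pge_root`, that rewrites the literal token `[PGE_ROOT]` in the files you pass to it. It assumes the absolute path contains no `|` or `&` characters, because those are special in the sed expression; back up the files before running it.

```shell
#!/bin/sh
# expand_pge_root ROOT FILE... (hypothetical helper): replace every literal
# [PGE_ROOT] token in each FILE with ROOT, rewriting the file in place via a
# temporary copy. Assumes ROOT contains no '|' or '&' characters.
expand_pge_root() {
    root=$1
    shift
    for file in "$@"; do
        sed "s|\[PGE_ROOT\]|$root|g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    done
}

# Example call, using the file locations named in this guide:
#   expand_pge_root "$PGE_ROOT" \
#       "$WORKFLOW_HOME/policy/tasks.xml" \
#       "$PGE_ROOT/file_concatenator/pge-configs/PGEConfig.xml"
```

Other bracketed tokens such as `[WORKFLOW_URL]` are left untouched, since the expression matches only `[PGE_ROOT]`.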
7. Verify output of PGE execution
...
INFO: Successfully ingested product: [/usr/local/pge/file_concatenator/output/jobs/job-2011-08-05T23:42:51.178Z/concatenatedOutputFile-2011-08-05T23:42:51.178Z.txt]: product id: a2d6d5ff-bfbc-11e0-8531-dff90856f73a
Additionally, you should see the following two files in the generated job directory:
Generated product file: $PGE_ROOT/file_concatenator/output/jobs/job-2011-08-05T23\:42\:51.178Z/concatenatedOutputFile-2011-08-05.txt
Generated met file: $PGE_ROOT/file_concatenator/output/jobs/job-2011-08-05T23\:42\:51.178Z/concatenatedOutputFile-2011-08-05.txt.met
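The file check can also be done from the shell. The following is a minimal sketch with a hypothetical helper, `check_job_output`, that looks in a job directory for a `concatenatedOutputFile-*.txt` product with a `.met` file beside it. It assumes the output layout shown in this guide and falls back to `/usr/local/pge` (the path in the sample log line) when `PGE_ROOT` is not set.

```shell
#!/bin/sh
# check_job_output JOB_DIR (hypothetical helper): succeed only if JOB_DIR
# holds at least one concatenatedOutputFile-*.txt product that has a
# matching .met file next to it.
check_job_output() {
    result=1
    for product in "$1"/concatenatedOutputFile-*.txt; do
        [ -f "$product" ] || continue
        if [ -f "$product.met" ]; then
            echo "product: $product"
            echo "met:     $product.met"
            result=0
        else
            echo "no met file for $product"
        fi
    done
    return $result
}

# Check every job directory under the output area used in this guide.
for job in "${PGE_ROOT:-/usr/local/pge}"/file_concatenator/output/jobs/job-*; do
    [ -d "$job" ] || continue
    check_job_output "$job" || echo "incomplete: $job"
done
```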