The content of this page is reproduced from the tutorial written by Rishi Verma for OODT-217. This wiki is meant as a temporary holding place until the documentation makes its way into the official OODT source.

The following guide serves as a hands-on learning exercise for explaining the basics of how a CAS-PGE project can be set up and used. For this exercise, it will be necessary to download a sample project. Please obtain this sample project from the following link: fileconcatenator-pge.tar.

Example Overview

The example project detailed in this exercise is called FileConcatenatorPGE. This CAS-PGE project performs two functions. First, it takes two input files and concatenates them into a single output file. Second, it generates metadata from that output product and ingests the metadata into a CAS-Filemanager instance.

Requirements

  • A deployed CAS-Workflow instance. See Workflow Basic User Guide for instructions on how to set this component up
  • A deployed CAS-Filemanager instance. See File Manager Basic User Guide for instructions on how to set this component up. Also see OODT Filemgr User Guide
  • A deployed CAS-Crawler. See Crawler User Guide for instructions on how to set this component up. Also see OODT Crawler Help
  • Maven 2
  • Environment variables
    • OODT_HOME=/path/to/oodt

    • PGE_ROOT=/path/to/oodt/pge

    • FILEMGR_HOME=/path/to/oodt/filemgr

    • WORKFLOW_HOME=/path/to/oodt/workflow

    • RESMGR_HOME=/path/to/oodt/resmgr

    • CRAWLER_HOME=/path/to/oodt/crawler

    • FILEMGR_URL=http://localhost:9000

    • WORKFLOW_URL=http://localhost:9001

      • Note: the workflow manager sometimes listens on port 9200 by default; confirm the port before setting this variable.
    • RESMGR_URL=http://localhost:9002

      • Note: the resource manager sometimes listens on port 9300 by default; confirm the port before setting this variable.

  • The directory in which the PGE scripts and configuration files will reside.
  • Ensure both the $WORKFLOW_HOME/lib and $RESMGR_HOME/lib folders contain at least the following:
    • cas-crawler-<VERSION>.jar
    • cas-filemgr-<VERSION>.jar
    • cas-pge-<VERSION>.jar
      • Note: at the time of writing, this jar is cas-pge-0.3.jar. It can be downloaded from http://mvnrepository.com/artifact/org.apache.oodt/cas-pge/0.3. Alternatively, building the sample fileconcatenator-pge project with mvn (see section "4. Build and deploy FileConcatenatorPGE") downloads the jar automatically. Note where it is saved, and copy it to $WORKFLOW_HOME/lib, $RESMGR_HOME/lib, and $FILEMGR_HOME/lib.
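
The environment variables listed above can be checked before continuing. The following is a minimal sketch, not part of the OODT distribution; the function name missing_env_vars is hypothetical, and the variable list simply mirrors the requirements above.

```shell
#!/bin/sh
# Print the name of every required variable that is unset or empty.
# The list mirrors the Requirements section; adjust it to your deployment.
missing_env_vars() {
    for name in OODT_HOME PGE_ROOT FILEMGR_HOME WORKFLOW_HOME RESMGR_HOME \
                CRAWLER_HOME FILEMGR_URL WORKFLOW_URL RESMGR_URL; do
        # Indirect lookup: read the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
        fi
    done
}
```

Calling missing_env_vars in the shell that will start the OODT components prints nothing when every variable is set.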

1. Setting up CAS-PGE relevant directories.

There are a number of components associated with typical CAS-PGE deployments. These can include CAS-PGE configuration files, external scripts, input files, and output files. The steps below guide you through setting up a configuration directory for the FileConcatenator PGE project, as well as a deployment directory for running your PGE. Note that the deployment directory can be located anywhere; in a production project, this directory could be shared among multiple PGE services.

  1. Create CAS-PGE configuration directory

    cd /usr/local
    mkdir -p $PGE_ROOT/file_concatenator/pge-configs
    
  2. Create CAS-PGE deployment directory and make sure you have access to write in it

    mkdir -p $PGE_ROOT/file_concatenator/output/jobs
    ls -l $PGE_ROOT/file_concatenator/output
    sudo chmod 0755 $PGE_ROOT/file_concatenator/output/jobs

    • First command creates deployment directory for output files 

    • Second command yields permission status of the directory

    • Third command changes permission status of the directory to 755 (drwxr-xr-x)


  3. Create CAS-PGE input files directory

    mkdir -p $PGE_ROOT/file_concatenator/files
    
  4. Create CAS-PGE extractors directory

    mkdir -p $PGE_ROOT/file_concatenator/extractors/metlistwriter
    

2. Download the FileConcatenatorPGE project

The FileConcatenatorPGE project is a Java project that uses the Maven build system for producing a run-time CAS-PGE library. Please follow the below instructions to download and extract the project.

  1. Download FileConcatenatorPGE
  2. Extract project

    tar xf fileconcatenator-pge.tar -C /usr/local/src
    

3. Customize and deploy the CAS-PGE configuration file

The CAS-PGE configuration file, which identifies the steps involved in executing the PGE, is located at fileconcatenator-pge/src/main/resources/config/PGEConfig.xml.

The PGEConfig.xml file performs the following functions:

  1. Describes how to run the PGE (i.e., what external programs to call and in which order)
  2. Defines custom metadata used within the execution of the CAS-PGE
  3. Describes how to build metadata files generated as a result of the execution of the CAS-PGE and what to do with these files

Below is the sample PGEConfig.xml file used within the fileconcatenator-pge project:

PGEConfig.xml
<?xml version="1.0" encoding="UTF-8"?>
<pgeConfig>

  <!-- How to run the PGE -->
  <exe dir="[JobDir]" shell="/bin/bash">
    <!-- cd to PGE root -->
    <cmd>cd [PGE_ROOT]/file_concatenator</cmd>
    <cmd>cp [InputFile1] [OutputFile]</cmd>
    <cmd>cat [InputFile2] >> [OutputFile]</cmd>
  </exe>

  <!-- Files to ingest -->
  <output>
    <!-- one or more of these -->
    <dir path="[JobDir]" createBeforeExe="false">

      <!-- one or more of these ** regExp or name can be used-->
      <files regExp=".*\.txt" metFileWriterClass="org.apache.oodt.pge.examples.fileconcatenator.writers.ConcactenatingFilenameExtractorWriter"  args="[PGE_ROOT]/file_concatenator/extractors/concatenatingfilename.extractor.config.xml"/>
      <files regExp=".*\.txt" metFileWriterClass="org.apache.oodt.cas.pge.writers.metlist.MetadataListPcsMetFileWriter" args="[PGE_ROOT]/file_concatenator/extractors/metlistwriter/metout.xml"/>
    </dir>
  </output>

  <!-- Custom metadata to add to output files -->
  <customMetadata>
    <!-- helpful keys -->
    <metadata key="LessThan" val="&#x3C;"/>
    <metadata key="LessThanOrEqualTo" val="[LessThan]="/>
    <metadata key="GreaterThan" val="&#x3E;"/>
    <metadata key="GreaterThanOrEqualTo" val="[GreaterThan]="/>
    <metadata key="Exclamation" val="&#33;"/>
    <metadata key="Ampersand" val="&#38;"/>
    <metadata key="NotEqualTo" val="[Exclamation]="/>
    <metadata key="LogicalAnd" val="[Ampersand][Ampersand]"/>
    <metadata key="CshPipeToStdOutAndError" val="[GreaterThan][Ampersand][Exclamation]"/>

    <metadata key="ProductionDateTime" val="[DATE.UTC]"/>
    <metadata key="JobDir" val="[PGE_ROOT]/file_concatenator/output/jobs/job-[ProductionDateTime]"/>
    <metadata key="InputFile1" val="[PGE_ROOT]/file_concatenator/files/concatenatingInputFile1.txt"/>
    <metadata key="InputFile2" val="[PGE_ROOT]/file_concatenator/files/concatenatingInputFile2.txt"/>
    <metadata key="OutputFile" val="[JobDir]/concatenatedOutputFile-[ProductionDateTime].txt"/>
  </customMetadata>

</pgeConfig>
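
To check the input files independently of CAS-PGE, the commands in the exe block can be reproduced by hand. The sketch below is only a manual approximation: CAS-PGE performs the [Key] substitution and job directory handling itself, the function name concat_dry_run is hypothetical, and the timestamp format is an assumption that may differ from what [DATE.UTC] produces.

```shell
#!/bin/sh
# Manually reproduce the cp/cat steps of the <exe> block for a given PGE root.
# Usage: concat_dry_run /path/to/oodt/pge
concat_dry_run() {
    pge_dir="$1/file_concatenator"
    # Assumption: a UTC timestamp similar in shape to [DATE.UTC].
    stamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    job_dir="$pge_dir/output/jobs/job-$stamp"
    out_file="$job_dir/concatenatedOutputFile-$stamp.txt"

    # Stand-in for the job directory that CAS-PGE would set up.
    mkdir -p "$job_dir" || return 1
    # Same two steps as the <cmd> entries: copy file 1, then append file 2.
    cp "$pge_dir/files/concatenatingInputFile1.txt" "$out_file" || return 1
    cat "$pge_dir/files/concatenatingInputFile2.txt" >> "$out_file" || return 1
    echo "$out_file"
}
```

The function prints the path of the file it wrote, so the result can be inspected with cat and removed afterwards.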

4. Build and deploy FileConcatenatorPGE

Deploy the fileconcatenator-pge JAR package. 

cd /usr/local/src/fileconcatenator-pge
mvn package
cp target/fileconcatenator-pge-*.jar $WORKFLOW_HOME/lib
cp target/fileconcatenator-pge-*.jar $RESMGR_HOME/lib
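
After copying, it may help to confirm that each lib folder holds the jars this guide relies on. The following is a sketch under the assumption that jar names follow the <name>-<VERSION>.jar pattern shown in the requirements; the function name missing_jars is hypothetical.

```shell
#!/bin/sh
# Report which required jar prefixes have no matching file in a lib directory.
# Usage: missing_jars "$WORKFLOW_HOME/lib"
missing_jars() {
    lib_dir="$1"
    for prefix in cas-crawler cas-filemgr cas-pge fileconcatenator-pge; do
        # ls fails when the glob matches nothing, which marks the jar as missing.
        if ! ls "$lib_dir"/"$prefix"-*.jar >/dev/null 2>&1; then
            echo "$prefix"
        fi
    done
}
```

Run it once for $WORKFLOW_HOME/lib and once for $RESMGR_HOME/lib; empty output means all four jars were found.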

Note: If you are using OODT 0.7, you need to make some changes to the source before building. The OODT version declared in pom.xml is a 0.3 snapshot, which Maven cannot find. There are two ways to fix this:

  • One quick solution is to change the OODT version in pom.xml to 0.3 and then compile. The resulting build is compatible with OODT 0.7.
    The following snippet is copied from the pom.xml of fileconcatenator-pge after the change; the version was previously "0.3 snapshot".

    <dependencies>
        <dependency>
            <groupId>org.apache.oodt</groupId>
            <artifactId>cas-pge</artifactId>
            <version>0.3</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>

  • Another solution is to change the OODT version in pom.xml to 0.7 and follow the steps below:
    1. Add a PgeTaskMetadataKeys.java file to the fileconcatenator source directory: /usr/local/src/fileconcatenator-pge/src/main/java/org/apache/oodt/pge/examples/fileconcatenator.
      FileConcatenatorPGETask.java, in the same directory, imports PgeTaskMetadataKeys; without it, FileConcatenatorPGETask.java fails to compile. PgeTaskMetadataKeys.java can be found at:
      http://grepcode.com/file/repo1.maven.org/maven2/org.apache.oodt/cas-pge/0.3/org/apache/oodt/cas/pge/metadata/PgeTaskMetadataKeys.java
    2. Open FileConcatenatorPGETask.java and add "throws Exception" to the overridden method updateStatus(String status).
    3. Save the changes and build again with mvn package or mvn compile.

Deploy fileconcatenator-pge resources

  1. PGEConfig.xml

    cp /usr/local/src/fileconcatenator-pge/src/main/resources/config/PGEConfig.xml $PGE_ROOT/file_concatenator/pge-configs
    
  2. Sample files

    cp /usr/local/src/fileconcatenator-pge/src/main/resources/files/concatenatingInputFile*.txt $PGE_ROOT/file_concatenator/files
    
  3. Extractor configuration file

    cp /usr/local/src/fileconcatenator-pge/src/main/resources/extractors/concatenatingfilename.extractor.config.xml $PGE_ROOT/file_concatenator/extractors
    
  4. Met-list writer configuration file

    cp /usr/local/src/fileconcatenator-pge/src/main/resources/extractors/metlistwriter/metout.xml $PGE_ROOT/file_concatenator/extractors/metlistwriter
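
Once the four copies are done, the deployed layout can be checked in one pass. This is a minimal sketch; the function name missing_resources is hypothetical, and the file list is taken from the cp commands above and the paths referenced in PGEConfig.xml.

```shell
#!/bin/sh
# List every expected resource file that is missing under a PGE root.
# Usage: missing_resources "$PGE_ROOT"
missing_resources() {
    base="$1/file_concatenator"
    for rel in pge-configs/PGEConfig.xml \
               files/concatenatingInputFile1.txt \
               files/concatenatingInputFile2.txt \
               extractors/concatenatingfilename.extractor.config.xml \
               extractors/metlistwriter/metout.xml; do
        # Print the full path only when the file is absent.
        [ -f "$base/$rel" ] || echo "$base/$rel"
    done
}
```

Empty output means every file referenced by the configuration is in place.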
    

5. Configure deployed CAS-Workflow for running FileConcatenatorPGE

  1. Navigate to your deployed CAS-Workflow’s policy directory

    cd $WORKFLOW_HOME/policy
    
  2. Modify events.xml
    Add the following entry to this file:

    events.xml
        <event name="fileconcatenator-pge">
        	<workflow id="urn:oodt:FileConcatenatorWorkflow"/>
        </event>
    
  3. Create a new policy file titled: fileconcatenator-pge.workflow.xml.
    Add the following entries to this file:

    fileconcatenator-pge.workflow.xml
        <cas:workflow xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas"
          name="FileConcatenatorWorkflow"
          id="urn:oodt:FileConcatenatorWorkflow">
    
          <tasks>
          	<task id="urn:oodt:FileConcatenator"/>
          </tasks>
        </cas:workflow>
    
  4. Modify tasks.xml
    Add the following entries to this file:

    tasks.xml
       <task id="urn:oodt:FileConcatenator" name="FileConcatenator"
         class="org.apache.oodt.pge.examples.fileconcatenator.FileConcatenatorPGETask">
    
         <conditions/>
    
         <configuration>
            <property name="PGETask_Name" value="FileConcatenator"/>
            <property name="PGETask_ConfigFilePath" value="[PGE_ROOT]/file_concatenator/pge-configs/PGEConfig.xml" envReplace="true"/>
            <property name="PGETask_DumpMetadata" value="true"/>
            <property name="PCS_WorkflowManagerUrl" value="[WORKFLOW_URL]" envReplace="true" />
            <property name="PCS_FileManagerUrl"     value="[FILEMGR_URL]" envReplace="true"/>
            <property name="PCS_MetFileExtension" value="met"/>
            <property name="PCS_ClientTransferServiceFactory" value="org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory"/>
            <property name="PCS_ActionRepoFile" value="file:[CRAWLER_HOME]/policy/crawler-config.xml" envReplace="true"/>
         </configuration>
    
         <requiredMetFields>
            <metfield name="RunID"/>
         </requiredMetFields>
    
       </task>
    
  5. Modify workflow-lifecycles.xml
    Add the following entries to this file (if not already present):

    workflow-lifecycles.xml
      <stage name="pge_setup_build_config_file">
        <status>BUILDING CONFIG FILE</status>
      </stage>
      <stage name="pge_staging_input">
        <status>STAGING INPUT</status>
      </stage>
      <stage name="pge_exec">
        <status>PGE EXEC</status>
      </stage>
      <stage name="pcs_crawl">
        <status>CRAWLING</status>
      </stage>
    
  6. Modify workflow-instance-met.xml
    Add the following entry to this file:

    workflow-instance-met.xml
    <workflow id="urn:oodt:FileConcatenatorWorkflow">
      <field name="RunID"/>
    </workflow>
    
  7. Restart CAS-Workflow

    cd $WORKFLOW_HOME/bin
    ./wmgr restart
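
The policy edits above only work if the event, workflow, and task identifiers agree across files. The sketch below checks for them with grep; it is a rough line-based check rather than real XML validation, the function name check_policy_ids is hypothetical, and it assumes each attribute sits on the same line as in the snippets above.

```shell
#!/bin/sh
# Succeed only if the identifiers used in this guide appear in the policy files.
# Usage: check_policy_ids "$WORKFLOW_HOME/policy"
check_policy_ids() {
    policy_dir="$1"
    status=0
    grep -q 'event name="fileconcatenator-pge"' "$policy_dir/events.xml" 2>/dev/null \
        || { echo "event missing from events.xml"; status=1; }
    grep -q 'id="urn:oodt:FileConcatenatorWorkflow"' \
        "$policy_dir/fileconcatenator-pge.workflow.xml" 2>/dev/null \
        || { echo "workflow missing from fileconcatenator-pge.workflow.xml"; status=1; }
    grep -q 'task id="urn:oodt:FileConcatenator"' "$policy_dir/tasks.xml" 2>/dev/null \
        || { echo "task missing from tasks.xml"; status=1; }
    grep -q 'workflow id="urn:oodt:FileConcatenatorWorkflow"' \
        "$policy_dir/workflow-instance-met.xml" 2>/dev/null \
        || { echo "workflow missing from workflow-instance-met.xml"; status=1; }
    return $status
}
```

A non-zero exit status, together with the printed messages, points at the file that still needs editing.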
    

6. Run File Concatenator PGE

  1. Navigate to CAS-Workflow home binary directory

    cd $WORKFLOW_HOME/bin
    
  2. Invoke the File Concatenator PGE by running the wmgr-client command-line

    ./wmgr-client --url http://localhost:9001 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1
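
Because the workflow manager port can differ between installations, the URL can be taken from WORKFLOW_URL instead of being hard-coded. The following sketch only prints the command so it can be reviewed first; the function name send_event_command is hypothetical, and the flags are exactly those used in the command above.

```shell
#!/bin/sh
# Print the wmgr-client invocation for a given run identifier, taking the URL
# from WORKFLOW_URL and falling back to the port used in this guide.
# Usage: send_event_command testNumber2
send_event_command() {
    run_id="$1"
    url="${WORKFLOW_URL:-http://localhost:9001}"
    echo "./wmgr-client --url $url --operation --sendEvent" \
         "--eventName fileconcatenator-pge --metaData --key RunID $run_id"
}
```

From $WORKFLOW_HOME/bin, the printed line can be pasted into the shell, or run directly with sh -c "$(send_event_command testNumber2)".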
    

7. Verify output of PGE execution

After invoking the wmgr-client script as directed above, you should see an entry like the following:

INFO: Successfully ingested product: [/usr/local/pge/file_concatenator/output/jobs/job-2011-08-05T23:42:51.178Z/concatenatedOutputFile-2011-08-05T23:42:51.178Z.txt]: product id: a2d6d5ff-bfbc-11e0-8531-dff90856f73a

Additionally, you should see the following two files in the generated job directory:

  • Generated product file: $PGE_ROOT/file_concatenator/output/jobs/job-2011-08-05T23\:42\:51.178Z/concatenatedOutputFile-2011-08-05T23\:42\:51.178Z.txt
  • Generated met file: $PGE_ROOT/file_concatenator/output/jobs/job-2011-08-05T23\:42\:51.178Z/concatenatedOutputFile-2011-08-05T23\:42\:51.178Z.txt.met
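
The same check can be scripted. The sketch below looks at the most recent job directory and confirms that a product file and its met file are both present; the function name check_latest_job is hypothetical, and it assumes the job-<timestamp> naming defined by the JobDir key in PGEConfig.xml.

```shell
#!/bin/sh
# Check that the newest job directory holds a product file and its met file.
# Usage: check_latest_job "$PGE_ROOT"
check_latest_job() {
    jobs_dir="$1/file_concatenator/output/jobs"
    # Job directory names embed a UTC timestamp, so a name sort is chronological.
    latest=$(ls -d "$jobs_dir"/job-* 2>/dev/null | sort | tail -n 1)
    [ -n "$latest" ] || { echo "no job directory found"; return 1; }
    for product in "$latest"/concatenatedOutputFile-*.txt; do
        # An unmatched glob stays literal, so the -f test also covers "no match".
        [ -f "$product" ] || { echo "no product file in $latest"; return 1; }
        [ -f "$product.met" ] || { echo "missing met file for $product"; return 1; }
        echo "ok: $product"
    done
}
```

A line starting with "ok:" indicates that both files exist for the latest run.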