The content of this page is reproduced from the tutorial written by Rishi Verma for OODT-217. This wiki is meant as a temporary holding place until the documentation makes its way into the official OODT source.

The following guide is a hands-on exercise that explains the basics of setting up and using a CAS-PGE project. The exercise requires a sample project, which can be obtained from the following link: fileconcatenator-pge.tar.

Example Overview

The example project detailed in this exercise is called FileConcatenatorPGE. This CAS-PGE project performs two functions. First, it takes two input files and concatenates them into a single output file. Second, it generates metadata for that output product and ingests this metadata into a cas-filemanager instance.

Requirements

  • A deployed CAS-Workflow instance. See Workflow Basic User Guide for instructions on how to set this component up
  • A deployed CAS-Filemanager instance. See File Manager Basic User Guide for instructions on how to set this component up. Also see OODT Filemgr User Guide
  • A deployed CAS-Crawler. See Crawler User Guide for instructions on how to set this component up. Also see OODT Crawler Help
  • Maven 2
  • Environment variables
    • CRAWLER_HOME
    • WORKFLOW_HOME
    • WORKFLOW_URL
    • FILEMGR_URL
    • PGE_ROOT = /usr/local/pge (the directory in which the PGE scripts and configuration files will reside)
  • Ensure $WORKFLOW_HOME/lib contains at least the following:
    • cas-crawler-<VERSION>.jar
    • cas-filemgr-<VERSION>.jar
    • cas-pge-<VERSION>.jar
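
The requirements above can be checked before going further. The following is a minimal sketch, not part of the original tutorial: the function names check_pge_env and check_pge_jars are hypothetical, and the JAR check matches on the artifact prefix only because the installed version is not known here.

```shell
#!/bin/sh
# Report which of the environment variables required by this guide are unset.
# Prints one line per missing variable and returns non-zero if any are missing.
# (check_pge_env is a hypothetical helper name, not an OODT command.)
check_pge_env() {
    missing=0
    for name in CRAWLER_HOME WORKFLOW_HOME WORKFLOW_URL FILEMGR_URL PGE_ROOT; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Report which of the three required JAR families are absent from a lib directory.
# (check_pge_jars is a hypothetical helper name as well.)
check_pge_jars() {
    lib_dir=$1
    missing=0
    for prefix in cas-crawler cas-filemgr cas-pge; do
        # The version is unknown, so only the artifact prefix is matched.
        if ! ls "$lib_dir/$prefix"-*.jar >/dev/null 2>&1; then
            echo "missing: $prefix-<VERSION>.jar in $lib_dir"
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call would be: check_pge_env && check_pge_jars "$WORKFLOW_HOME/lib"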

1. Setting up CAS-PGE relevant directories.

A typical CAS-PGE deployment involves several components, which can include CAS-PGE configuration files, external scripts, input files, and output files. The steps below set up a configuration directory for the FileConcatenator PGE project and a deployment directory for running the PGE. Note that the deployment directory can be located anywhere; in a production project, it could be shared among multiple PGE services.

  1. Create CAS-PGE configuration directory
    cd /usr/local
    mkdir -p $PGE_ROOT/file_concatenator/pge-configs
    
  2. Create CAS-PGE deployment directory
    mkdir -p $PGE_ROOT/file_concatenator/output/jobs
    
  3. Create CAS-PGE input files directory
    mkdir -p $PGE_ROOT/file_concatenator/files
    
  4. Create CAS-PGE extractors directory
    mkdir -p $PGE_ROOT/file_concatenator/extractors/metlistwriter
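
The four directories above can also be created in a single pass. The sketch below assumes PGE_ROOT is already exported; the /tmp/pge-sketch fallback is an assumption added only so the snippet can be tried without writing to /usr/local.

```shell
#!/bin/sh
# Create the four FileConcatenator directories in one pass.
# PGE_ROOT should already be exported; the /tmp fallback is an assumption
# made for illustration and is not part of the original tutorial.
PGE_ROOT=${PGE_ROOT:-/tmp/pge-sketch}
base="$PGE_ROOT/file_concatenator"

for sub in pge-configs output/jobs files extractors/metlistwriter; do
    # -p creates missing parent directories and does nothing if the
    # directory already exists, so the loop is safe to re-run.
    mkdir -p "$base/$sub"
done

# List the resulting tree for a quick visual check.
find "$base" -type d | sort
```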
    

2. Download the FileConcatenatorPGE project

The FileConcatenatorPGE project is a Java project that uses the Maven build system for producing a run-time CAS-PGE library. Please follow the below instructions to download and extract the project.

  1. Download FileConcatenatorPGE
  2. Extract project
    tar xf fileconcatenator-pge.tar -C /usr/local/src
    

3. Customize and deploy the CAS-PGE configuration file

The CAS-PGE configuration file, which identifies the steps involved in executing the PGE, is located at fileconcatenator-pge/src/main/resources/config/PGEConfig.xml.

The PGEConfig.xml file performs the following functions:

  1. Describes how to run the PGE (i.e., what external programs to call and in which order)
  2. Defines custom metadata used within the execution of the CAS-PGE
  3. Describes how to build metadata files generated as a result of the execution of the CAS-PGE and what to do with these files

Below is the sample PGEConfig.xml file used within the fileconcatenator-pge project:

PGEConfig.xml
<?xml version="1.0" encoding="UTF-8"?>
<pgeConfig>

  <!-- How to run the PGE -->
  <exe dir="[JobDir]" shell="/bin/bash">
    <!-- cd to PGE root -->
    <cmd>cd [PGE_ROOT]/file_concatenator</cmd>
    <cmd>cp [InputFile1] [OutputFile]</cmd>
    <cmd>cat [InputFile2] >> [OutputFile]</cmd>
  </exe>

  <!-- Files to ingest -->
  <output>
    <!-- one or more of these -->
    <dir path="[JobDir]" createBeforeExe="false">

      <!-- one or more of these ** regExp or name can be used-->
      <files regExp=".*\.txt" metFileWriterClass="org.apache.oodt.pge.examples.fileconcatenator.writers.ConcactenatingFilenameExtractorWriter"  args="[PGE_ROOT]/file_concatenator/extractors/concatenatingfilename.extractor.config.xml"/>
      <files regExp=".*\.txt" metFileWriterClass="org.apache.oodt.cas.pge.writers.metlist.MetadataListPcsMetFileWriter" args="[PGE_ROOT]/file_concatenator/extractors/metlistwriter/metout.xml"/>
    </dir>
  </output>

  <!-- Custom metadata to add to output files -->
  <customMetadata>
    <!-- helpful keys -->
    <metadata key="LessThan" val="&#x3C;"/>
    <metadata key="LessThanOrEqualTo" val="[LessThan]="/>
    <metadata key="GreaterThan" val="&#x3E;"/>
    <metadata key="GreaterThanOrEqualTo" val="[GreaterThan]="/>
    <metadata key="Exclamation" val="&#33;"/>
    <metadata key="Ampersand" val="&#38;"/>
    <metadata key="NotEqualTo" val="[Exclamation]="/>
    <metadata key="LogicalAnd" val="[Ampersand][Ampersand]"/>
    <metadata key="CshPipeToStdOutAndError" val="[GreaterThan][Ampersand][Exclamation]"/>

    <metadata key="ProductionDateTime" val="[DATE.UTC]"/>
    <metadata key="JobDir" val="[PGE_ROOT]/file_concatenator/output/jobs/job-[ProductionDateTime]"/>
    <metadata key="InputFile1" val="[PGE_ROOT]/file_concatenator/files/concatenatingInputFile1.txt"/>
    <metadata key="InputFile2" val="[PGE_ROOT]/file_concatenator/files/concatenatingInputFile2.txt"/>
    <metadata key="OutputFile" val="[JobDir]/concatenatedOutputFile-[ProductionDateTime].txt"/>
  </customMetadata>

</pgeConfig>
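
To make the key substitution in the configuration above concrete, the following sketch shows a rough shell equivalent of what the exe block does once the bracketed keys are replaced by their values. It illustrates the effect only and is not how CAS-PGE performs substitution internally. The scratch PGE_ROOT, the sample file contents, and the date format used as a stand-in for [DATE.UTC] are all assumptions.

```shell
#!/bin/sh
# Scratch area and sample inputs; both are assumptions for illustration.
PGE_ROOT=$(mktemp -d)
mkdir -p "$PGE_ROOT/file_concatenator/files"
printf 'first\n'  > "$PGE_ROOT/file_concatenator/files/concatenatingInputFile1.txt"
printf 'second\n' > "$PGE_ROOT/file_concatenator/files/concatenatingInputFile2.txt"

# Shell variables mirroring the customMetadata keys of the same names.
# The date call only approximates [DATE.UTC] (no milliseconds).
ProductionDateTime=$(date -u +%Y-%m-%dT%H:%M:%SZ)
JobDir="$PGE_ROOT/file_concatenator/output/jobs/job-$ProductionDateTime"
InputFile1="$PGE_ROOT/file_concatenator/files/concatenatingInputFile1.txt"
InputFile2="$PGE_ROOT/file_concatenator/files/concatenatingInputFile2.txt"
OutputFile="$JobDir/concatenatedOutputFile-$ProductionDateTime.txt"

# The job directory is created by hand here because this sketch runs
# outside CAS-PGE.
mkdir -p "$JobDir"

# The three cmd entries from the exe block, with keys substituted.
cd "$PGE_ROOT/file_concatenator"
cp "$InputFile1" "$OutputFile"
cat "$InputFile2" >> "$OutputFile"

# Show the concatenated result.
cat "$OutputFile"
```

Because JobDir and OutputFile are themselves built from ProductionDateTime, every run writes into a fresh, timestamped job directory.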

4. Build and deploy FileConcatenatorPGE

Deploy the fileconcatenator-pge JAR package

cd /usr/local/src/fileconcatenator-pge
mvn package
mv target/fileconcatenator-pge-*.jar $WORKFLOW_HOME/lib

Deploy fileconcatenator-pge resources

  1. PGEConfig.xml
    cp /usr/local/src/fileconcatenator-pge/src/main/resources/config/PGEConfig.xml $PGE_ROOT/file_concatenator/pge-configs
    
  2. Sample files
    cp /usr/local/src/fileconcatenator-pge/src/main/resources/files/concatenatingInputFile*.txt $PGE_ROOT/file_concatenator/files
    
  3. Extractor configuration file
    cp /usr/local/src/fileconcatenator-pge/src/main/resources/extractors/concatenatingfilename.extractor.config.xml $PGE_ROOT/file_concatenator/extractors
    
  4. Met-list writer configuration file
    cp /usr/local/src/fileconcatenator-pge/src/main/resources/extractors/metlistwriter/metout.xml $PGE_ROOT/file_concatenator/extractors/metlistwriter
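
Once the resources are copied, a quick check can confirm that each file landed where PGEConfig.xml expects it. This is a minimal sketch; check_pge_resources is a hypothetical helper name, and the list of paths is taken from the copy commands above, assuming the sample project ships exactly two input files named concatenatingInputFile1.txt and concatenatingInputFile2.txt as referenced in PGEConfig.xml.

```shell
#!/bin/sh
# Verify that the deployed resources exist under a given PGE root.
# Prints one line per missing file and returns non-zero if any are missing.
# (check_pge_resources is a hypothetical helper, not an OODT command.)
check_pge_resources() {
    root=$1/file_concatenator
    missing=0
    for path in \
        pge-configs/PGEConfig.xml \
        files/concatenatingInputFile1.txt \
        files/concatenatingInputFile2.txt \
        extractors/concatenatingfilename.extractor.config.xml \
        extractors/metlistwriter/metout.xml
    do
        if [ ! -f "$root/$path" ]; then
            echo "missing: $root/$path"
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call would be: check_pge_resources "$PGE_ROOT"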
    

5. Configure deployed CAS-Workflow for running FileConcatenatorPGE

  1. Navigate to your deployed CAS-Workflow’s policy directory
    cd $WORKFLOW_HOME/policy
    
  2. Modify events.xml
    Add the following entry to this file:
    events.xml
        <event name="fileconcatenator-pge">
        	<workflow id="urn:oodt:FileConcatenatorWorkflow"/>
        </event>
    
  3. Create a new policy file titled: fileconcatenator-pge.workflow.xml.
    Add the following entries to this file:
    fileconcatenator-pge.workflow.xml
        <cas:workflow xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas"
          name="FileConcatenatorWorkflow"
          id="urn:oodt:FileConcatenatorWorkflow">
    
          <tasks>
          	<task id="urn:oodt:FileConcatenator"/>
          </tasks>
        </cas:workflow>
    
  4. Modify tasks.xml
    Add the following entries to this file:
    tasks.xml
       <task id="urn:oodt:FileConcatenator" name="FileConcatenator"
         class="org.apache.oodt.pge.examples.fileconcatenator.FileConcatenatorPGETask">
    
         <conditions/>
    
         <configuration>
            <property name="PGETask_Name" value="FileConcatenator"/>
            <property name="PGETask_ConfigFilePath" value="[PGE_ROOT]/file_concatenator/pge-configs/PGEConfig.xml" envReplace="true"/>
            <property name="PGETask_DumpMetadata" value="true"/>
            <property name="PCS_WorkflowManagerUrl" value="[WORKFLOW_URL]" envReplace="true" />
            <property name="PCS_FileManagerUrl"     value="[FILEMGR_URL]" envReplace="true"/>
            <property name="PCS_MetFileExtension" value="met"/>
            <property name="PCS_ClientTransferServiceFactory" value="org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory"/>
            <property name="PCS_ActionRepoFile" value="file:[CRAWLER_HOME]/policy/crawler-config.xml" envReplace="true"/>
         </configuration>
    
         <requiredMetFields>
            <metfield name="RunID"/>
         </requiredMetFields>
    
       </task>
    
  5. Modify workflow-lifecycles.xml
    Add the following entries to this file (if not already present):
    workflow-lifecycles.xml
      <stage name="pge_setup_build_config_file">
        <status>BUILDING CONFIG FILE</status>
      </stage>
      <stage name="pge_staging_input">
        <status>STAGING INPUT</status>
      </stage>
      <stage name="pge_exec">
        <status>PGE EXEC</status>
      </stage>
      <stage name="pcs_crawl">
        <status>CRAWLING</status>
      </stage>
    
  6. Modify workflow-instance-met.xml
    Add the following entry to this file:
    workflow-instance-met.xml
    <workflow id="urn:oodt:FileConcatenatorWorkflow">
      <field name="RunID"/>
    </workflow>
    
  7. Restart CAS-Workflow
    cd $WORKFLOW_HOME/bin
    ./wmgr restart
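
Since the workflow and task identifiers must agree across four policy files, a crude textual check can catch a missed edit before an event is sent. The sketch below is an assumption-laden illustration: check_pge_policy is a hypothetical helper, and it looks for the literal strings from the snippets above rather than parsing XML, so differences in whitespace or attribute order would produce false alarms.

```shell
#!/bin/sh
# Look for the expected workflow and task identifiers in each policy file.
# Prints one line per file where the string is absent; returns non-zero then.
# (check_pge_policy is a hypothetical helper, not an OODT command.)
check_pge_policy() {
    policy_dir=$1
    status=0
    # Each input line is "file|literal string expected in that file".
    while IFS='|' read -r file needle; do
        if ! grep -F -q -- "$needle" "$policy_dir/$file" 2>/dev/null; then
            echo "not found in $file: $needle"
            status=1
        fi
    done <<'EOF'
events.xml|<workflow id="urn:oodt:FileConcatenatorWorkflow"/>
fileconcatenator-pge.workflow.xml|<task id="urn:oodt:FileConcatenator"/>
tasks.xml|<task id="urn:oodt:FileConcatenator"
workflow-instance-met.xml|<workflow id="urn:oodt:FileConcatenatorWorkflow">
EOF
    return "$status"
}
```

A typical call would be: check_pge_policy "$WORKFLOW_HOME/policy"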
    

6. Run File Concatenator PGE

  1. Navigate to CAS-Workflow home binary directory
    cd $WORKFLOW_HOME/bin
    
  2. Invoke the File Concatenator PGE by running the wmgr-client command-line tool
    ./wmgr-client --url http://localhost:9001 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1
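
The invocation above hard-codes the workflow manager URL. A small wrapper can take it from WORKFLOW_URL instead and accept the RunID as an argument. This is a sketch only: send_fileconcatenator_event and the WMGR_CLIENT override are hypothetical names introduced here, while the wmgr-client flags are copied unchanged from the command above.

```shell
#!/bin/sh
# Send the fileconcatenator-pge event with a caller-supplied RunID.
# WORKFLOW_URL is used when set; otherwise the URL from the tutorial applies.
# WMGR_CLIENT is a hypothetical override (handy for a dry run with echo).
send_fileconcatenator_event() {
    run_id=$1
    url=${WORKFLOW_URL:-http://localhost:9001}
    ${WMGR_CLIENT:-./wmgr-client} --url "$url" --operation --sendEvent \
        --eventName fileconcatenator-pge --metaData --key RunID "$run_id"
}
```

Run from $WORKFLOW_HOME/bin, a call such as send_fileconcatenator_event testNumber2 starts a new run; setting WMGR_CLIENT=echo prints the arguments instead of contacting the workflow manager.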
    

7. Verify output of PGE execution

After invoking the wmgr-client script as directed above, you should see an entry like the following:

INFO: Successfully ingested product: [/usr/local/pge/file_concatenator/output/jobs/job-2011-08-05T23:42:51.178Z/concatenatedOutputFile-2011-08-05T23:42:51.178Z.txt]: product id: a2d6d5ff-bfbc-11e0-8531-dff90856f73a

Additionally, you should see the following two files in the generated job directory:

  • Generated product file: $PGE_ROOT/file_concatenator/output/jobs/job-2011-08-05T23\:42\:51.178Z/concatenatedOutputFile-2011-08-05T23\:42\:51.178Z.txt
  • Generated met file: $PGE_ROOT/file_concatenator/output/jobs/job-2011-08-05T23\:42\:51.178Z/concatenatedOutputFile-2011-08-05T23\:42\:51.178Z.txt.met
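
The check for the two generated files can be scripted. The sketch below assumes the job directory layout and file naming shown in PGEConfig.xml (job-<timestamp> directories containing concatenatedOutputFile-<timestamp>.txt); check_latest_job is a hypothetical helper name.

```shell
#!/bin/sh
# Find the most recent job directory under a PGE root and confirm that it
# holds both the product file and its .met file.
# (check_latest_job is a hypothetical helper, not an OODT command.)
check_latest_job() {
    jobs_dir=$1/file_concatenator/output/jobs
    # Directory names embed a UTC timestamp, so the lexically last name
    # is also the newest job.
    latest=$(ls -1 "$jobs_dir" 2>/dev/null | grep '^job-' | sort | tail -n 1)
    if [ -z "$latest" ]; then
        echo "no job directories under $jobs_dir"
        return 1
    fi
    job="$jobs_dir/$latest"
    # The pattern ends in .txt, so it does not match the .txt.met file.
    product=$(ls "$job"/concatenatedOutputFile-*.txt 2>/dev/null | head -n 1)
    if [ -n "$product" ] && [ -f "$product.met" ]; then
        echo "ok: $product"
        return 0
    fi
    echo "incomplete: $job"
    return 1
}
```

A typical call would be: check_latest_job "$PGE_ROOT"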