...

The following guide is a hands-on exercise that explains the basics of setting up and using a CAS-PGE project. The exercise requires a sample project, which you can download from the following link: fileconcatenator-pge.tar.

Example Overview

The example project detailed in this exercise is called FileConcatenatorPGE. This CAS-PGE project performs two functions. First, it reads two input files and concatenates them into a single output file. Second, it generates metadata for the resulting product and ingests the product and its metadata into a cas-filemanager instance.

...

There are a number of components associated with typical CAS-PGE deployments. These can include CAS-PGE configuration files, external scripts, input files, output files, etc. The steps below guide you through setting up a configuration directory for the FileConcatenator PGE project, as well as a deployment directory for running your PGE. Note that the deployment directory can be located anywhere; for a production project, it is assumed that this directory may be shared among multiple PGE services.

  1. Create CAS-PGE configuration directory

    No Format
    
    cd /usr/local
    mkdir -p $PGE_ROOT/file_concatenator/pge-configs
    
  2. Create CAS-PGE deployment directory

    No Format
    
    mkdir -p $PGE_ROOT/file_concatenator/output/jobs
    
  3. Create CAS-PGE input files directory

    No Format
    
    mkdir -p $PGE_ROOT/file_concatenator/files
    
  4. Create CAS-PGE extractors directory

    No Format
    
    mkdir -p $PGE_ROOT/file_concatenator/extractors/metlistwriter
    
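The four directories above can also be created in a single call. The following is a minimal sketch, assuming a POSIX shell; `create_pge_dirs` is a hypothetical helper, not part of CAS-PGE. It takes the PGE root as an argument instead of reading `$PGE_ROOT` directly and refuses an empty argument, so an unset variable cannot cause directories to be created under `/`.

```shell
# Hypothetical helper (not part of CAS-PGE): create the FileConcatenator
# PGE directory layout under the root directory given as $1.
# The body is a subshell, so its variables do not leak into the caller.
create_pge_dirs() (
  if [ -z "$1" ]; then
    echo "usage: create_pge_dirs PGE_ROOT" >&2
    exit 1
  fi
  base="$1/file_concatenator"
  # Same directories as steps 1-4; -p creates missing parents (such as
  # extractors/) and succeeds if a directory already exists.
  mkdir -p \
    "$base/pge-configs" \
    "$base/output/jobs" \
    "$base/files" \
    "$base/extractors/metlistwriter"
)
```

Usage, once the function is defined in your shell: `create_pge_dirs "$PGE_ROOT"`. Because `mkdir -p` succeeds when a directory already exists, the call is safe to repeat.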

...

  1. Download FileConcatenatorPGE
  2. Extract project

    No Format
    
    tar xf fileconcatenator-pge.tar -C /usr/local/src
    

...

Code Block
title: PGEConfig.xml

<?xml version="1.0" encoding="UTF-8"?>
<pgeConfig>

  <!-- How to run the PGE -->
  <exe dir="[JobDir]" shell="/bin/bash">
    <!-- cd to PGE root -->
    <cmd>cd [PGE_ROOT]/file_concatenator</cmd>
    <cmd>cp [InputFile1] [OutputFile]</cmd>
    <cmd>cat [InputFile2] >> [OutputFile]</cmd>
  </exe>

  <!-- Files to ingest -->
  <output>
    <!-- one or more of these -->
    <dir path="[JobDir]" createBeforeExe="false">

      <!-- one or more of these ** regExp or name can be used-->
      <files regExp=".*\.txt" metFileWriterClass="org.apache.oodt.pge.examples.fileconcatenator.writers.ConcactenatingFilenameExtractorWriter"  args="[PGE_ROOT]/file_concatenator/extractors/concatenatingfilename.extractor.config.xml"/>
      <files regExp=".*\.txt" metFileWriterClass="org.apache.oodt.cas.pge.writers.metlist.MetadataListPcsMetFileWriter" args="[PGE_ROOT]/file_concatenator/extractors/metlistwriter/metout.xml"/>
    </dir>
  </output>

  <!-- Custom metadata to add to output files -->
  <customMetadata>
    <!-- helpful keys -->
    <metadata key="LessThan" val="&#x3C;"/>
    <metadata key="LessThanOrEqualTo" val="[LessThan]="/>
    <metadata key="GreaterThan" val="&#x3E;"/>
    <metadata key="GreaterThanOrEqualTo" val="[GreaterThan]="/>
    <metadata key="Exclamation" val="&#33;"/>
    <metadata key="Ampersand" val="&#38;"/>
    <metadata key="NotEqualTo" val="[Exclamation]="/>
    <metadata key="LogicalAnd" val="[Ampersand][Ampersand]"/>
    <metadata key="CshPipeToStdOutAndError" val="[GreaterThan][Ampersand][Exclamation]"/>

    <metadata key="ProductionDateTime" val="[DATE.UTC]"/>
    <metadata key="JobDir" val="[PGE_ROOT]/file_concatenator/output/jobs/job-[ProductionDateTime]"/>
    <metadata key="InputFile1" val="[PGE_ROOT]/file_concatenator/files/concatenatingInputFile1.txt"/>
    <metadata key="InputFile2" val="[PGE_ROOT]/file_concatenator/files/concatenatingInputFile2.txt"/>
    <metadata key="OutputFile" val="[JobDir]/concatenatedOutputFile-[ProductionDateTime].txt"/>
  </customMetadata>

</pgeConfig>
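After CAS-PGE replaces each `[Key]` reference with its metadata value, the `<exe>` block above amounts to three bash commands run in the job directory. The sketch below reproduces them by hand so the concatenation can be checked outside CAS-PGE. It is an illustration under stated assumptions: `concat_like_pge` is a hypothetical function, it creates the job directory itself, and its timestamp from `date -u` only approximates `[DATE.UTC]`, which (judging by the sample log entry later in this guide) also carries milliseconds.

```shell
# Hypothetical manual equivalent of the <exe> block in PGEConfig.xml.
# $1 is the PGE root and must be an absolute path, because of the cd below.
# The variables mirror the customMetadata keys of the same names.
concat_like_pge() (
  if [ -z "$1" ]; then
    echo "usage: concat_like_pge PGE_ROOT" >&2
    exit 1
  fi
  # ProductionDateTime: approximated with date -u (whole seconds only).
  production_date_time=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  job_dir="$1/file_concatenator/output/jobs/job-$production_date_time"
  input_file1="$1/file_concatenator/files/concatenatingInputFile1.txt"
  input_file2="$1/file_concatenator/files/concatenatingInputFile2.txt"
  output_file="$job_dir/concatenatedOutputFile-$production_date_time.txt"
  # Assumption of this sketch: create the job directory before running.
  mkdir -p "$job_dir" || exit 1
  # The three <cmd> elements, in order. The function body is a subshell,
  # so the cd does not change the caller's working directory.
  cd "$1/file_concatenator" || exit 1
  cp "$input_file1" "$output_file" || exit 1
  cat "$input_file2" >> "$output_file" || exit 1
  # Print the output path so the caller can inspect the result.
  echo "$output_file"
)
```

For example, `concat_like_pge "$PGE_ROOT"` prints the path of the new output file, whose contents should be the first input file followed by the second.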

...

Deploy the fileconcatenator-pge JAR package.

No Format
cd /usr/local/src/fileconcatenator-pge
mvn package
mv target/fileconcatenator-pge-*.jar $WORKFLOW_HOME/lib

Note: If you are using OODT 0.7, you need to make some changes to the source before building it. The OODT version in pom.xml is 0.3-SNAPSHOT, which Maven cannot resolve. There are two ways to fix this:

  • The quickest solution is to change the OODT version in pom.xml to 0.3 and then build. The resulting JAR works with OODT 0.7.
  • Alternatively, change the OODT version in pom.xml to 0.7 and follow the steps below:
    1. Add a PgeTaskMetadataKeys.java file to the fileconcatenator source directory: /usr/local/src/fileconcatenator-pge/src/main/java/org/apache/oodt/pge/examples/fileconcatenator.
      FileConcatenatorPGETask.java, in the same folder, imports PgeTaskMetadataKeys; without that file, the compiler fails on FileConcatenatorPGETask.java. PgeTaskMetadataKeys.java can be found at:
      http://grepcode.com/file/repo1.maven.org/maven2/org.apache.oodt/cas-pge/0.3/org/apache/oodt/cas/pge/metadata/PgeTaskMetadataKeys.java
    2. Open FileConcatenatorPGETask.java and add a "throws Exception" clause to the overridden method updateStatus(String status).
    3. Save the changes and rebuild with mvn package so that a new JAR is produced.
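The quick fix can be scripted. The following is a minimal sketch; `pin_oodt_version` is a hypothetical helper, and it assumes that every occurrence of `0.3-SNAPSHOT` in pom.xml refers to OODT. It prints the matching lines first so you can confirm that assumption, keeps the original as pom.xml.bak, and avoids `sed -i` because its syntax differs between GNU and BSD sed.

```shell
# Hypothetical helper: replace 0.3-SNAPSHOT with 0.3 in the project's pom.xml.
# $1 = extracted project directory (defaults to the path used in this guide).
pin_oodt_version() (
  pom="${1:-/usr/local/src/fileconcatenator-pge}/pom.xml"
  # Show every line that will change; fail if there is nothing to change.
  grep -n '0\.3-SNAPSHOT' "$pom" || exit 1
  # Keep a backup, then write the edited copy back to pom.xml.
  cp "$pom" "$pom.bak" || exit 1
  sed 's/0\.3-SNAPSHOT/0.3/g' "$pom.bak" > "$pom"
)
```

Usage: `pin_oodt_version /usr/local/src/fileconcatenator-pge`, followed by `mvn package` in the same directory.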

Deploy fileconcatenator-pge resources

  1. PGEConfig.xml

    No Format
    
    cp /usr/local/src/fileconcatenator-pge/src/main/resources/config/PGEConfig.xml $PGE_ROOT/file_concatenator/pge-configs
    
  2. Sample files

    No Format
    
    cp /usr/local/src/fileconcatenator-pge/src/main/resources/files/concatenatingInputFile*.txt $PGE_ROOT/file_concatenator/files
    
  3. Extractor configuration file

    No Format
    
    cp /usr/local/src/fileconcatenator-pge/src/main/resources/extractors/concatenatingfilename.extractor.config.xml $PGE_ROOT/file_concatenator/extractors
    
  4. Met-list writer configuration file

    No Format
    
    cp /usr/local/src/fileconcatenator-pge/src/main/resources/extractors/metlistwriter/metout.xml $PGE_ROOT/file_concatenator/extractors/metlistwriter
    
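The four copies can be combined into one step. This is a minimal sketch assuming the directory layout created earlier in this guide; `deploy_pge_resources` is a hypothetical helper that stops at the first failed copy, such as a missing source file or destination directory.

```shell
# Hypothetical helper: copy the fileconcatenator-pge resources into the
# deployment layout. $1 = extracted project directory, $2 = PGE root.
deploy_pge_resources() (
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: deploy_pge_resources PROJECT_DIR PGE_ROOT" >&2
    exit 1
  fi
  src="$1/src/main/resources"
  dst="$2/file_concatenator"
  # Each cp mirrors one of steps 1-4; stop at the first failure.
  cp "$src/config/PGEConfig.xml" "$dst/pge-configs/" || exit 1
  cp "$src"/files/concatenatingInputFile*.txt "$dst/files/" || exit 1
  cp "$src/extractors/concatenatingfilename.extractor.config.xml" "$dst/extractors/" || exit 1
  cp "$src/extractors/metlistwriter/metout.xml" "$dst/extractors/metlistwriter/" || exit 1
)
```

Usage: `deploy_pge_resources /usr/local/src/fileconcatenator-pge "$PGE_ROOT"`.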

5. Configure deployed CAS-Workflow for running FileConcatenatorPGE

  1. Navigate to your deployed CAS-Workflow’s policy directory

    No Format
    
    cd $WORKFLOW_HOME/policy
    
  2. Modify events.xml
    Add the following entry to this file:

    Code Block
    title: events.xml
    
        <event name="fileconcatenator-pge">
        	<workflow id="urn:oodt:FileConcatenatorWorkflow"/>
        </event>
    
  3. Create a new policy file named fileconcatenator-pge.workflow.xml.
    Add the following entries to this file:

    Code Block
    title: fileconcatenator-pge.workflow.xml
    
        <cas:workflow xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas"
          name="FileConcatenatorWorkflow"
          id="urn:oodt:FileConcatenatorWorkflow">
    
          <tasks>
          	<task id="urn:oodt:FileConcatenator"/>
          </tasks>
        </cas:workflow>
    
  4. Modify tasks.xml
    Add the following entries to this file:

    Code Block
    title: tasks.xml
    
       <task id="urn:oodt:FileConcatenator" name="FileConcatenator"
         class="org.apache.oodt.pge.examples.fileconcatenator.FileConcatenatorPGETask">
    
         <conditions/>
    
         <configuration>
            <property name="PGETask_Name" value="FileConcatenator"/>
            <property name="PGETask_ConfigFilePath" value="[PGE_ROOT]/file_concatenator/pge-configs/PGEConfig.xml" envReplace="true"/>
            <property name="PGETask_DumpMetadata" value="true"/>
            <property name="PCS_WorkflowManagerUrl" value="[WORKFLOW_URL]" envReplace="true" />
            <property name="PCS_FileManagerUrl"     value="[FILEMGR_URL]" envReplace="true"/>
            <property name="PCS_MetFileExtension" value="met"/>
            <property name="PCS_ClientTransferServiceFactory" value="org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory"/>
            <property name="PCS_ActionRepoFile" value="file:[CRAWLER_HOME]/policy/crawler-config.xml" envReplace="true"/>
         </configuration>
    
         <requiredMetFields>
            <metfield name="RunID"/>
         </requiredMetFields>
    
       </task>
    
  5. Modify workflow-lifecycles.xml
    Add the following entries to this file (if not already present):

    Code Block
    title: workflow-lifecycles.xml
    
      <stage name="pge_setup_build_config_file">
        <status>BUILDING CONFIG FILE</status>
      </stage>
      <stage name="pge_staging_input">
        <status>STAGING INPUT</status>
      </stage>
      <stage name="pge_exec">
        <status>PGE EXEC</status>
      </stage>
      <stage name="pcs_crawl">
        <status>CRAWLING</status>
      </stage>
    
  6. Modify workflow-instance-met.xml
    Add the following entry to this file:

    Code Block
    title: workflow-instance-met.xml
    
    <workflow id="urn:oodt:FileConcatenatorWorkflow">
      <field name="RunID"/>
    </workflow>
    
  7. Restart CAS-Workflow

    No Format
    
    cd $WORKFLOW_HOME/bin
    ./wmgr restart
    
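Before restarting CAS-Workflow (step 7), a quick text check can catch a forgotten edit. The sketch below is a hypothetical helper, `check_pge_policy`, that only looks for one identifying string per file with `grep -F`; it does not validate the XML, so a malformed file can still pass.

```shell
# Hypothetical sanity check: confirm each policy file mentions the entries
# added in steps 2-6. $1 = CAS-Workflow policy directory.
check_pge_policy() (
  if [ -z "$1" ]; then
    echo "usage: check_pge_policy POLICY_DIR" >&2
    exit 1
  fi
  status=0
  # Each line below: policy file, then a literal string expected in it.
  while read -r file pattern; do
    if ! grep -qF "$pattern" "$1/$file" 2>/dev/null; then
      echo "missing '$pattern' in $file" >&2
      status=1
    fi
  done <<EOF
events.xml fileconcatenator-pge
fileconcatenator-pge.workflow.xml urn:oodt:FileConcatenatorWorkflow
tasks.xml FileConcatenatorPGETask
workflow-lifecycles.xml pge_exec
workflow-instance-met.xml urn:oodt:FileConcatenatorWorkflow
EOF
  exit "$status"
)
```

Usage: `check_pge_policy "$WORKFLOW_HOME/policy"`. It reports each missing string on standard error and returns a non-zero status if any is missing.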

6. Run File Concatenator PGE

  1. Navigate to the CAS-Workflow bin directory

    No Format
    
    cd $WORKFLOW_HOME/bin
    
  2. Invoke the File Concatenator PGE by running the wmgr-client command-line tool

    No Format
    
    ./wmgr-client --url http://localhost:9001 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1
    
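Each run must supply RunID, which tasks.xml lists as a required metadata field. The following is a minimal sketch wrapping the command above; `run_file_concatenator` is a hypothetical helper that defaults RunID to a UTC timestamp so repeated runs carry distinct values. It uses the `WORKFLOW_URL` environment variable (the one tasks.xml references as `[WORKFLOW_URL]`) when set, falling back to the http://localhost:9001 URL used in this guide.

```shell
# Hypothetical wrapper around the wmgr-client call shown above.
# $1 = RunID (defaults to a UTC timestamp); WORKFLOW_URL, if set,
# overrides the default URL.
run_file_concatenator() (
  if [ -z "$WORKFLOW_HOME" ]; then
    echo "WORKFLOW_HOME is not set" >&2
    exit 1
  fi
  run_id="${1:-run-$(date -u +%Y%m%dT%H%M%SZ)}"
  url="${WORKFLOW_URL:-http://localhost:9001}"
  # Subshell body: the cd does not affect the caller.
  cd "$WORKFLOW_HOME/bin" || exit 1
  ./wmgr-client --url "$url" --operation --sendEvent \
    --eventName fileconcatenator-pge \
    --metaData --key RunID "$run_id"
)
```

For example, `run_file_concatenator testNumber1` sends the same event as the command above.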

...

After invoking the wmgr-client script as directed above, you should see an entry like the following:

No Format

INFO: Successfully ingested product: [/usr/local/pge/file_concatenator/output/jobs/job-2011-08-05T23:42:51.178Z/concatenatedOutputFile-2011-08-05T23:42:51.178Z.txt]: product id: a2d6d5ff-bfbc-11e0-8531-dff90856f73a
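To pick the product ID out of such an entry, for example to look the product up in cas-filemanager afterwards, a filter like the following can be used. It is a sketch that assumes the message format shown above; `extract_product_ids` is a hypothetical helper that reads log text on standard input and prints one ID per matching line.

```shell
# Hypothetical helper: print the product ID from each
# "Successfully ingested product" line read on standard input.
# Lines that do not match are ignored (sed -n with the p flag).
extract_product_ids() {
  sed -n 's/.*Successfully ingested product: .*product id: \([^[:space:]]*\).*/\1/p'
}
```

Pipe whatever log output contains this entry in your deployment into `extract_product_ids`; for the sample entry above it prints `a2d6d5ff-bfbc-11e0-8531-dff90856f73a`.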

...