...

  • A deployed CAS-Workflow instance. See Workflow Basic User Guide for instructions on how to set this component up
  • A deployed CAS-Filemanager instance. See File Manager Basic User Guide for instructions on how to set this component up. Also see OODT Filemgr User Guide
  • A deployed CAS-Crawler. See Crawler User Guide for instructions on how to set this component up. Also see OODT Crawler Help
  • Maven 2
  • Environment variables
    • OODT_HOME=/path/to/oodt

    • PGE_ROOT=/path/to/oodt/pge

    • FILEMGR_HOME=/path/to/oodt/filemgr

    • WORKFLOW_HOME=/path/to/oodt/workflow

    • RESMGR_HOME=/path/to/oodt/resmgr

    • CRAWLER_HOME=/path/to/oodt/crawler

    • FILEMGR_URL=http://localhost:9000

    • WORKFLOW_URL=http://localhost:9001

      • Note: the workflow manager sometimes listens on port 9200 by default; confirm the port before setting this variable.
    • RESMGR_URL=http://localhost:9002

      • Note: the resource manager (resmgr) sometimes listens on port 9300 by default; confirm the port before setting this variable.

  • The directory in which the PGE scripts and configuration files will reside.
  • Ensure both $WORKFLOW_HOME/lib and  $RESMGR_HOME/lib folders contain at least the following:
    • cas-crawler-<VERSION>.jar
    • cas-filemgr-<VERSION>.jar
    • cas-pge-<VERSION>.jar


          • Note: at the time of writing (OODT 0.6), the PGE JAR is cas-pge-0.3.jar. One way to get it is to download it from http://mvnrepository.com/artifact/org.apache.oodt/cas-pge/0.3. Alternatively, as described in the section "4. Build and deploy FileConcatenatorPGE", building the sample fileconcatenator-pge project with mvn downloads cas-pge-0.3.jar automatically. Note where the JAR is downloaded and copy it to $WORKFLOW_HOME/lib, $RESMGR_HOME/lib, and $FILEMGR_HOME/lib.
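The prerequisites above can be sketched as a short shell setup. This is a minimal sketch, not an official OODT script: /path/to/oodt is the placeholder from the list, the ports are the defaults quoted above, and missing_jars is a hypothetical helper introduced only for illustration.

```shell
# Placeholder install root taken from the list above; replace with your own.
export OODT_HOME=/path/to/oodt
export PGE_ROOT="$OODT_HOME/pge"
export FILEMGR_HOME="$OODT_HOME/filemgr"
export WORKFLOW_HOME="$OODT_HOME/workflow"
export RESMGR_HOME="$OODT_HOME/resmgr"
export CRAWLER_HOME="$OODT_HOME/crawler"
export FILEMGR_URL=http://localhost:9000
export WORKFLOW_URL=http://localhost:9001   # may be 9200 in some deployments
export RESMGR_URL=http://localhost:9002     # may be 9300 in some deployments

# Hypothetical helper: print each required JAR prefix that has no matching
# file in the given lib directory.
missing_jars() {
  for prefix in cas-crawler cas-filemgr cas-pge; do
    ls "$1/$prefix"-*.jar > /dev/null 2>&1 || echo "$prefix"
  done
}

# With the placeholder path this reports all three prefixes as missing.
missing_jars "$WORKFLOW_HOME/lib"
```

Running missing_jars against "$RESMGR_HOME/lib" as well covers both directories named in the list. An empty result means every required JAR prefix was found.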

      1. Setting up CAS-PGE relevant directories.

      A typical CAS-PGE deployment involves several components: CAS-PGE configuration files, external scripts, input files, output files, and so on. The steps below set up a configuration directory for the FileConcatenator PGE project and a deployment directory for running your PGE. The deployment directory can be located anywhere; in a production project it could be shared among multiple PGE services.

      1. Create CAS-PGE configuration directory


        No Format
        cd /usr/local
        mkdir -p $PGE_ROOT/file_concatenator/pge-configs
        
      2. Create CAS-PGE deployment directory and make sure you have write access to it

        No Format
        mkdir -p $PGE_ROOT/file_concatenator/output/jobs
        ls -l $PGE_ROOT/file_concatenator/output
        sudo chmod 0755 $PGE_ROOT/file_concatenator/output/jobs

        • The first command creates the deployment directory for output files.

        • The second command shows the current permissions of the directory.

        • The third command sets the directory permissions to 755 (drwxr-xr-x).

         


      3. Create CAS-PGE input files directory

        No Format
        mkdir -p $PGE_ROOT/file_concatenator/files
        
      4. Create CAS-PGE extractors directory

        No Format
        mkdir -p $PGE_ROOT/file_concatenator/extractors/metlistwriter
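The four directory steps above can also be run as one loop. This is a minimal sketch: the fallback to a temporary directory when PGE_ROOT is unset is an assumption made only so the snippet can be tried safely outside a real deployment.

```shell
# Assumption for the demo: use a temporary directory when PGE_ROOT is unset.
PGE_ROOT=${PGE_ROOT:-$(mktemp -d)}
base="$PGE_ROOT/file_concatenator"

# Same four directories as in steps 1 to 4 above.
for sub in pge-configs output/jobs files extractors/metlistwriter; do
  mkdir -p "$base/$sub"
  # Report any directory the current user cannot write to.
  [ -w "$base/$sub" ] || echo "not writable: $base/$sub"
done
```

No output means all four directories exist and are writable by the current user.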
        

      ...

      Code Block
      titlePGEConfig.xml
      <?xml version="1.0" encoding="UTF-8"?>
      <pgeConfig>
      
        <!-- How to run the PGE -->
        <exe dir="[JobDir]" shell="/bin/bash">
          <!-- cd to PGE root -->
          <cmd>cd [PGE_ROOT]/file_concatenator</cmd>
          <cmd>cp [InputFile1] [OutputFile]</cmd>
          <cmd>cat [InputFile2] >> [OutputFile]</cmd>
        </exe>
      
        <!-- Files to ingest -->
        <output>
          <!-- one or more of these -->
          <dir path="[JobDir]" createBeforeExe="false">
      
            <!-- one or more of these ** regExp or name can be used-->
            <files regExp=".*\.txt" metFileWriterClass="org.apache.oodt.pge.examples.fileconcatenator.writers.ConcactenatingFilenameExtractorWriter"  args="[PGE_ROOT]/file_concatenator/extractors/concatenatingfilename.extractor.config.xml"/>
            <files regExp=".*\.txt" metFileWriterClass="org.apache.oodt.cas.pge.writers.metlist.MetadataListPcsMetFileWriter" args="[PGE_ROOT]/file_concatenator/extractors/metlistwriter/metout.xml"/>
          </dir>
        </output>
      
        <!-- Custom metadata to add to output files -->
        <customMetadata>
          <!-- helpful keys -->
          <metadata key="LessThan" val="&#x3C;"/>
          <metadata key="LessThanOrEqualTo" val="[LessThan]="/>
          <metadata key="GreaterThan" val="&#x3E;"/>
          <metadata key="GreaterThanOrEqualTo" val="[GreaterThan]="/>
          <metadata key="Exclamation" val="&#33;"/>
          <metadata key="Ampersand" val="&#38;"/>
          <metadata key="NotEqualTo" val="[Exclamation]="/>
          <metadata key="LogicalAnd" val="[Ampersand][Ampersand]"/>
          <metadata key="CshPipeToStdOutAndError" val="[GreaterThan][Ampersand][Exclamation]"/>
      
          <metadata key="ProductionDateTime" val="[DATE.UTC]"/>
          <metadata key="JobDir" val="[PGE_ROOT]/file_concatenator/output/jobs/job-[ProductionDateTime]"/>
          <metadata key="InputFile1" val="[PGE_ROOT]/file_concatenator/files/concatenatingInputFile1.txt"/>
          <metadata key="InputFile2" val="[PGE_ROOT]/file_concatenator/files/concatenatingInputFile2.txt"/>
          <metadata key="OutputFile" val="[JobDir]/concatenatedOutputFile-[ProductionDateTime].txt"/>
        </customMetadata>
      
      </pgeConfig>
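
To see what the exe commands in PGEConfig.xml amount to once the bracketed keys are substituted, the copy and append steps can be tried by hand. This is a minimal sketch on throwaway files: the temporary directory, the "example" names, and the file contents are made up for the demo, and the initial cd command is left out because the sketch uses absolute paths.

```shell
# Throwaway stand-ins for [InputFile1], [InputFile2] and [JobDir].
work=$(mktemp -d)
input1="$work/concatenatingInputFile1.txt"
input2="$work/concatenatingInputFile2.txt"
job_dir="$work/job-example"
output="$job_dir/concatenatedOutputFile-example.txt"

# Made-up input contents.
printf 'first file\n'  > "$input1"
printf 'second file\n' > "$input2"
mkdir -p "$job_dir"

# The two file commands from the exe section:
cp "$input1" "$output"        # start the output with InputFile1
cat "$input2" >> "$output"    # append InputFile2

cat "$output"
```

The output file holds the contents of the first input followed by the contents of the second, which is the product that the files entries in the output section then match with the .*\.txt pattern.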
      

      4. Build and deploy FileConcatenatorPGE

      Deploy the fileconcatenator-pge JAR package. 

      No Format
      cd /usr/local/src/fileconcatenator-pge
      mvn package
      cp target/fileconcatenator-pge-*.jar $WORKFLOW_HOME/lib
      cp target/fileconcatenator-pge-*.jar $RESMGR_HOME/lib

      Note: If you are using OODT 0.7, you need to change the source before building it, because Maven cannot find the OODT version declared in pom.xml. There are two ways to fix this:

      • One quick solution is to change the OODT version in pom.xml to 0.3 and then compile. The resulting build is compatible with OODT 0.7.
        The following snippet is from the pom.xml of fileconcatenator-pge. The version before the change was "0.3 snapshot"; change it to 0.3.

        <dependencies>
            <dependency>
                <groupId>org.apache.oodt</groupId>
                <artifactId>cas-pge</artifactId>
                <version>0.3</version>
                <scope>compile</scope>
            </dependency>
        </dependencies>

      • Another solution is to change the OODT version in pom.xml to 0.7 and follow the steps below:

      ...

      1. Navigate to your deployed CAS-Workflow’s policy directory

        No Format
        cd $WORKFLOW_HOME/policy
        
      2. Modify events.xml
        Add the following entry to this file:

        Code Block
        titleevents.xml
            <event name="fileconcatenator-pge">
            	<workflow id="urn:oodt:FileConcatenatorWorkflow"/>
            </event>
        
      3. Create a new policy file titled: fileconcatenator-pge.workflow.xml.
        Add the following entries to this file:

        Code Block
        titlefileconcatenator-pge.workflow.xml
            <cas:workflow xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas"
              name="FileConcatenatorWorkflow"
              id="urn:oodt:FileConcatenatorWorkflow">
        
              <tasks>
              	<task id="urn:oodt:FileConcatenator"/>
              </tasks>
            </cas:workflow>
        
      4. Modify tasks.xml
        Add the following entries to this file:

        Code Block
        titletasks.xml
           <task id="urn:oodt:FileConcatenator" name="FileConcatenator"
             class="org.apache.oodt.pge.examples.fileconcatenator.FileConcatenatorPGETask">
        
             <conditions/>
        
             <configuration>
                <property name="PGETask_Name" value="FileConcatenator"/>
                <property name="PGETask_ConfigFilePath" value="[PGE_ROOT]/file_concatenator/pge-configs/PGEConfig.xml" envReplace="true"/>
                <property name="PGETask_DumpMetadata" value="true"/>
                <property name="PCS_WorkflowManagerUrl" value="[WORKFLOW_URL]" envReplace="true" />
                <property name="PCS_FileManagerUrl"     value="[FILEMGR_URL]" envReplace="true"/>
                <property name="PCS_MetFileExtension" value="met"/>
                <property name="PCS_ClientTransferServiceFactory" value="org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory"/>
                <property name="PCS_ActionRepoFile" value="file:[CRAWLER_HOME]/policy/crawler-config.xml" envReplace="true"/>
             </configuration>
        
             <requiredMetFields>
                <metfield name="RunID"/>
             </requiredMetFields>
        
           </task>
        

        Note: [PGE_ROOT] does not work here. Replace [PGE_ROOT] with the absolute path to the PGE root in tasks.xml and PGEConfig.xml.

      5. Modify workflow-lifecycles.xml
        Add the following entries to this file (if not already present):

        Code Block
        titleworkflow-lifecycles.xml
          <stage name="pge_setup_build_config_file">
            <status>BUILDING CONFIG FILE</status>
          </stage>
          <stage name="pge_staging_input">
            <status>STAGING INPUT</status>
          </stage>
          <stage name="pge_exec">
            <status>PGE EXEC</status>
          </stage>
          <stage name="pcs_crawl">
            <status>CRAWLING</status>
          </stage>
        
      6. Modify workflow-instance-met.xml
        Add the following entry to this file:

        Code Block
        titleworkflow-instance-met.xml
        <workflow id="urn:oodt:FileConcatenatorWorkflow">
          <field name="RunID"/>
        </workflow>
        
      7. Restart CAS-Workflow

        No Format
        cd $WORKFLOW_HOME/bin
        ./wmgr restart
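
Because the workflow ID has to match across several policy files, a quick consistency check before restarting can save a debugging round. This is a minimal sketch: files_missing_id is a hypothetical helper, and the demo runs on a throwaway directory holding two of the fragments shown above rather than on a real policy directory.

```shell
# Hypothetical helper: print each named policy file that does not mention the ID
# (or does not exist). Usage: files_missing_id POLICY_DIR ID FILE...
files_missing_id() {
  policy_dir=$1
  id=$2
  shift 2
  for f in "$@"; do
    grep -qF "$id" "$policy_dir/$f" 2> /dev/null || echo "$f"
  done
}

# Demo directory with only two of the three files created.
demo=$(mktemp -d)
printf '%s\n' '<event name="fileconcatenator-pge">' \
  '  <workflow id="urn:oodt:FileConcatenatorWorkflow"/>' \
  '</event>' > "$demo/events.xml"
printf '%s\n' '<workflow id="urn:oodt:FileConcatenatorWorkflow">' \
  '  <field name="RunID"/>' \
  '</workflow>' > "$demo/workflow-instance-met.xml"

# Prints fileconcatenator-pge.workflow.xml, which the demo left out.
files_missing_id "$demo" "urn:oodt:FileConcatenatorWorkflow" \
  events.xml fileconcatenator-pge.workflow.xml workflow-instance-met.xml
```

Against a real deployment, the first argument would be $WORKFLOW_HOME/policy. The same helper can check urn:oodt:FileConcatenator in tasks.xml and fileconcatenator-pge.workflow.xml.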
        

      ...

      1. Navigate to CAS-Workflow home binary directory

        No Format
        cd $WORKFLOW_HOME/bin
        
      2. Invoke the File Concatenator PGE by running the wmgr-client command-line

        No Format
        ./wmgr-client --url http://localhost:9001 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1
        

        Note: before running the ./wmgr-client command, ensure the following servers are up and running.

        [Image: servers that must be running]
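
Since the workflow manager port can differ between deployments, the invocation in step 2 can take its URL from WORKFLOW_URL instead of a hard-coded port. This is a minimal sketch: wmgr_cmd is a hypothetical helper that only prints the command line (a dry run), using the flags from step 2.

```shell
# Hypothetical helper: print the wmgr-client command for a given RunID.
# Falls back to the port used in step 2 when WORKFLOW_URL is unset.
wmgr_cmd() {
  echo ./wmgr-client --url "${WORKFLOW_URL:-http://localhost:9001}" \
    --operation --sendEvent --eventName fileconcatenator-pge \
    --metaData --key RunID "$1"
}

# Dry run with the RunID from the example above.
wmgr_cmd testNumber1
```

Once the printed line looks right, run it from $WORKFLOW_HOME/bin.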

      Note: If you encounter errors related to the PGEConfig.xml file path, check the value set for PGETask_ConfigFilePath at http://localhost:8080/opsui -> WorkFlow Monitor -> FileConcatenator.

      If this value begins with null, make sure you have replaced [PGE_ROOT] with its absolute path in tasks.xml and PGEConfig.xml.
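
The replacement can be done with sed rather than by hand. This is a minimal sketch on a throwaway file: /usr/local/pge is only an example PGE root, and the demo file stands in for tasks.xml or PGEConfig.xml. The path must not contain the | or & characters, which have special meaning in this sed expression.

```shell
# Example PGE root and a stand-in file for the demo.
pge_root=/usr/local/pge
demo_file=$(mktemp)
echo '<property name="PGETask_ConfigFilePath" value="[PGE_ROOT]/file_concatenator/pge-configs/PGEConfig.xml" envReplace="true"/>' > "$demo_file"

# Replace every literal [PGE_ROOT] with the absolute path, writing to a new
# file so the original stays untouched.
sed "s|\[PGE_ROOT]|$pge_root|g" "$demo_file" > "$demo_file.new"

cat "$demo_file.new"
```

Writing to a separate file first makes it easy to review the result with diff before replacing the real policy file.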

      7. Verify output of PGE execution

      ...

      No Format
      INFO: Successfully ingested product: [/usr/local/pge/file_concatenator/output/jobs/job-2011-08-05T23:42:51.178Z/concatenatedOutputFile-2011-08-05T23:42:51.178Z.txt]: product id: a2d6d5ff-bfbc-11e0-8531-dff90856f73a
      

      Additionally, you should see the following two files in the generated job directory:

      Generated product file: $PGE_ROOT/file_concatenator/output/jobs/job-2011-08-05T23\:42\:51.178Z/concatenatedOutputFile-2011-08-05.txt

      Generated met file: $PGE_ROOT/file_concatenator/output/jobs/job-2011-08-05T23\:42\:51.178Z/concatenatedOutputFile-2011-08-05.txt.met
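
Checking that each product has its met file can be scripted. This is a minimal sketch: products_without_met is a hypothetical helper, and the demo uses a throwaway job directory with made-up file names instead of a real job directory.

```shell
# Hypothetical helper: print each product file in a job directory that has no
# .met file next to it.
products_without_met() {
  for product in "$1"/concatenatedOutputFile-*.txt; do
    [ -e "$product" ] || continue          # the glob matched nothing
    [ -e "$product.met" ] || echo "$product"
  done
}

# Demo: one product with a met file, one without.
job_dir=$(mktemp -d)
touch "$job_dir/concatenatedOutputFile-a.txt" "$job_dir/concatenatedOutputFile-a.txt.met"
touch "$job_dir/concatenatedOutputFile-b.txt"

products_without_met "$job_dir"
```

In a real run, pass the generated job directory under $PGE_ROOT/file_concatenator/output/jobs. An empty result means every product has a met file.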