1. Defining a Workflow with CAS-PGE

See CAS-PGE Learn by Example

2. Adding Metadata

Workflow Context Metadata

Every workflow instance has the following core metadata keys:

- TaskId
- WorkflowInstId
- JobId
- ProcessingNode
- WorkflowManagerUrl
- QueueName
- TaskLoad

These keys can be referenced inside a PGE configuration by enclosing the key name in square brackets. For example:

<customMetadata>
    <metadata key="JobWorkDir" val="[PGE_WORK_DIR]/[JobId]"/>
</customMetadata>
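A slightly fuller sketch that combines several of the core keys is shown here. JobLogDir and TaskLabel are hypothetical key names chosen for illustration, and PGE_WORK_DIR is assumed to be defined as in the example above.

```xml
<customMetadata>
    <!-- Hypothetical key: one log directory per workflow instance -->
    <metadata key="JobLogDir" val="[PGE_WORK_DIR]/logs/[WorkflowInstId]"/>
    <!-- Hypothetical key: records which task ran on which node -->
    <metadata key="TaskLabel" val="[TaskId] on [ProcessingNode]"/>
</customMetadata>
```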

In addition to these core keys, you can add your own metadata to the workflow with the --metaData option when sending a workflow event.

The --metaData command line option adds key-value pairs to the workflow context metadata, as shown below:

./wmgr-client --url http://localhost:9001 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1

The key can then be referenced inside a PGE, for example in the augmented metadata (<customMetadata>) section:

..
<customMetadata>

<metadata key="InputFile" val="SQL(FORMAT='$Filename') {SELECT Filename FROM GenericFile WHERE RID = '[RunID]' }" />

</customMetadata>
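Event metadata such as RunID can be combined with other keys in the same way as the core keys. The fragment below is only a sketch: RunWorkDir is a hypothetical key, and PGE_WORK_DIR is assumed to be available as in the earlier JobWorkDir example.

```xml
<pgeConfig>
  <exe dir="[RunWorkDir]" shellType="/bin/bash">
    <!-- [RunID] is replaced with the value passed via --metaData -->
    <cmd>echo "Processing run [RunID]"</cmd>
  </exe>
  <customMetadata>
    <!-- Hypothetical key: a working directory named after the run -->
    <metadata key="RunWorkDir" val="[PGE_WORK_DIR]/[RunID]"/>
  </customMetadata>
</pgeConfig>
```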

Augmenting Metadata in a PGE

This section is adapted from 'CAS-Workflow 2: A User Guide' by Brian Foster.

The element for augmenting metadata is <customMetadata>. Although this element usually appears at the end of the file, that does not mean it is the last to be loaded. <customMetadata> is actually the first element loaded from pge-config.xml; the only element loaded before it is <import>, when one is present. Any number of <metadata> elements are allowed inside <customMetadata>.

To pass metadata through all tasks in a workflow, set the attribute workflowMet='true' on the <metadata> element. For example: <metadata key='filename' val='data.dat' workflowMet='true'/>
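A sketch of how this might look across two tasks, assuming two hypothetical task configs (producer and consumer) that run in sequence in the same workflow. The producer marks filename as workflow metadata; the consumer refers to [filename] without defining it. InputPath is a hypothetical key, and PGE_WORK_DIR is assumed to be available as in the earlier examples.

```xml
<!-- producer-pge-config.xml (hypothetical): an earlier task in the workflow -->
<pgeConfig>
  <customMetadata>
    <!-- workflowMet="true" passes this key on to later tasks -->
    <metadata key="filename" val="data.dat" workflowMet="true"/>
  </customMetadata>
</pgeConfig>

<!-- consumer-pge-config.xml (hypothetical): a later task in the same workflow -->
<pgeConfig>
  <exe dir="[PGE_WORK_DIR]" shellType="/bin/bash">
    <cmd>ls -l [InputPath]</cmd>
  </exe>
  <customMetadata>
    <!-- filename is not defined here; it comes from the producer task -->
    <metadata key="InputPath" val="[PGE_WORK_DIR]/[filename]"/>
  </customMetadata>
</pgeConfig>
```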

Metadata elements defined in a separate file can be pulled into a PGE config with the <import> element. For example, suppose common-metadata.xml contains the following:

common-metadata.xml
<pgeConfig>
  <customMetadata>
     <metadata key="JobWorkDir" val="[PGE_WORK_DIR]/[JobId]"/>
     <metadata key="JavaHome" val="/usr/bin/java"/>
     <metadata key="RespJar" val="[WORKFLOW_HOME]/lib/somejarfile-0.0.jar"/>
     <!--Add similar common metadata keys-->
  </customMetadata>
</pgeConfig>

This file can then be imported into each PGE task config, as in the PgeConfig example below:

<pgeConfig>

  <import file="common-metadata.xml"/>

  <exe dir="[JobWorkDir]" shellType="/bin/bash">
    <cmd> [JavaHome] -cp [RespJar] [LoadClass] [Arguments]
    </cmd>
  </exe>
  <output>
    <dir path="[OutputDir]" createBeforeExe="true">
    </dir>
  </output>

  <customMetadata>
    <metadata key="LoadClass" val="edu.usc.chla.vpicu.vpsdb.SomeClass"/>
    <metadata key="Arguments" val="blah1 blah2"/>
  </customMetadata>
</pgeConfig>

Product-Type Metadata

Product-type metadata is the metadata attached to the files ingested during the workflow. It is defined in a metadata list file whose path is given in the args attribute of the <files> element in PgeConfig.xml:

<files name="FiletoIngest" metFileWriterClass="org.apache.oodt.cas.pge.writers.metlist.MetadataListPcsMetFileWriter" args="[PGE_CONFIG_HOME]/MetOut_FiletoIngest.xml"/>
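For orientation, here is a sketch of where this element usually sits, assuming the common CAS-PGE layout in which <files> is nested inside a <dir> of the <output> section (compare the PgeConfig example above). [OutputDir] and [PGE_CONFIG_HOME] are assumed to be defined in <customMetadata>.

```xml
<output>
  <!-- Directory the PGE writes its products to; created before the exe runs -->
  <dir path="[OutputDir]" createBeforeExe="true">
    <!-- For the file FiletoIngest, write a .met file using the metadata list named in args -->
    <files name="FiletoIngest"
           metFileWriterClass="org.apache.oodt.cas.pge.writers.metlist.MetadataListPcsMetFileWriter"
           args="[PGE_CONFIG_HOME]/MetOut_FiletoIngest.xml"/>
  </dir>
</output>
```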

A typical MetOut_FiletoIngest.xml looks like the following:

<?xml version="1.0" encoding="UTF-8"?>
  <metadataList>
      <!-- Any File -->
      <metadata key="ProductName" val="[Filename]"/>
      <metadata key="Filename"/>
      <metadata key="FileLocation"/>
      <metadata key="FileSize"/>
      <metadata key="ProductType"/>
      <!--Add any element specified in your elements.xml that you want to be written out 
          as metadata for the output file-->
  </metadataList>

The met file writer (metFileWriterClass) creates a metadata (.met) file for each output file that the file manager will ingest.
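For illustration only, a .met file produced this way is a CAS metadata XML document of roughly the following shape. The file name output.dat and the values shown are hypothetical; GenericFile is the product type used in the SQL example earlier.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical output.dat.met: one keyval per key listed in the metadata list file -->
<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <keyval type="vector">
    <key>Filename</key>
    <val>output.dat</val>
  </keyval>
  <keyval type="vector">
    <key>ProductType</key>
    <val>GenericFile</val>
  </keyval>
</cas:metadata>
```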
