Introduction

This page explains what CAS-PGE is and when a user should consider using it.

Special Reserved Metadata Keys within CAS-PGE

SUBVERSION REPO PATH

http://svn.apache.org/repos/asf/oodt/trunk/pge/src/main/java/org/apache/oodt/cas/pge/metadata/PgeTaskMetadataKeys.java

PgeTaskMetadataKeys.java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */


package org.apache.oodt.cas.pge.metadata;

/**
 *
 * @author bfoster
 * @version $Revision$
 *
 * <p>Describe your class here</p>.
 */
public interface PgeTaskMetadataKeys {

    public static final String NAME = "PGETask_Name";

    public static final String SCI_EXE_PATH = "PGETask_SciExe_Path";

    public static final String SCI_EXE_VERSION = "PGETask_SciExe_Version";

    public static final String PRODUCT_PATH = "PGETask_ProductPath";

    public static final String CONFIG_FILE_PATH = "PGETask_ConfigFilePath";

    public static final String LOG_FILE_PATTERN = "PGETask_LogFilePattern";

    public static final String PROPERTY_ADDER_CLASSPATH = "PGETask_PropertyAdderClasspath";

    public static final String PGE_RUNTIME = "PGETask_Runtime";

    /* PGE task statuses */
    public static final String STAGING_INPUT = "PGETask_Staging_Input";

    public static final String CONF_FILE_BUILD = "PGETask_Building_Config_File";

    public static final String RUNNING_PGE = "PGETask_Running";

    public static final String CRAWLING = "PGETask_Crawling";

}
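
Because these key names are reserved, it can be useful to check your own metadata for accidental collisions before handing it to a PGE task. The following is a minimal sketch under stated assumptions: `ReservedKeyCheck` and `collisions` are hypothetical names that are not part of OODT, a plain `Map` stands in for OODT's own metadata container, and the key strings are copied from the interface above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of OODT: reports which keys in a
// user-supplied map collide with the key names listed in
// PgeTaskMetadataKeys above.
class ReservedKeyCheck {

    // Key strings copied verbatim from PgeTaskMetadataKeys.java.
    static final Set<String> RESERVED = new LinkedHashSet<>(Arrays.asList(
        "PGETask_Name",
        "PGETask_SciExe_Path",
        "PGETask_SciExe_Version",
        "PGETask_ProductPath",
        "PGETask_ConfigFilePath",
        "PGETask_LogFilePattern",
        "PGETask_PropertyAdderClasspath",
        "PGETask_Runtime",
        "PGETask_Staging_Input",
        "PGETask_Building_Config_File",
        "PGETask_Running",
        "PGETask_Crawling"));

    // Returns the subset of the map's keys that are reserved, sorted.
    static Set<String> collisions(Map<String, String> userMetadata) {
        Set<String> hits = new TreeSet<>(userMetadata.keySet());
        hits.retainAll(RESERVED);
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new java.util.HashMap<>();
        metadata.put("PGETask_Name", "my-task"); // collides with a reserved key
        metadata.put("MyProject_Owner", "me");   // safe: not reserved
        System.out.println(collisions(metadata)); // prints [PGETask_Name]
    }
}
```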

Questions

List of questions I (cgoodale) have about the PGE module and how to use it.

  • Do I need to have workflow installed and running to use PGE?
  • What does PGE provide that workflow does not?
  • I think I just drop the cas-pge.jar into the workflow /lib dir, but what other configuration is needed? (XML files, maybe?)
  • How does workflow know that we are using PGE?
  • Do I need to restart the workflow manager once PGE is installed and configured?

Simple Use Case where PGE is added to Workflow

Advanced Topics

This section will capture some of the more advanced capabilities and use cases of PGE.

FAQ

Q. How do I add multiple crawler actions to the PGE?

A. The tasks.xml file in the Workflow configuration contains a property called 'PCS_ActionsIds'. To add a single action, set the property like so:

<property name="PCS_ActionsIds" value="MyCrawlerActionId"/>
  • Where MyCrawlerActionId is the crawler action ID name that you'd like to run in the PGE.
  • NOTE: make sure you also have a reference to PCS_ActionRepoFile within your tasks.xml PGE entry, which points to your crawler's config file. The crawler must support the action ID you specified.
<property name="PCS_ActionRepoFile" value="file:[YOUR_OODT_HOME]/crawler/policy/crawler-config.xml" envReplace="true"/>

To add multiple crawler actions, do the following:

  • Add a property to the tasks.xml file. The name can be anything you like; set its value to your desired crawler actions. We'll use ActionsIds as the property name:
<property name="ActionsIds" value="MyCrawlerActionId1,MyCrawlerActionId2"/>
  • Note that the specified crawler action IDs must be comma-separated, with no spaces in between.
  • In the PGE configuration file, add a PCS_ActionsIds key under the customMetadata tag and reference the property name that you just set in the tasks.xml file (ActionsIds in this case):
<customMetadata>
...
   <metadata key="PCS_ActionsIds" val="[ActionsIds]"/>
...
</customMetadata>
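
To illustrate why the action IDs must not contain spaces, here is a minimal sketch that assumes the value is split on commas without trimming. That parsing behaviour is an assumption and has not been confirmed against the CAS-PGE source; `ActionIdsSketch`, `splitStrict`, and `isWellFormed` are hypothetical names that are not part of OODT.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OODT: mimics a strict comma split of
// a PCS_ActionsIds-style value to show why stray spaces matter.
class ActionIdsSketch {

    // Splits on commas only; whitespace is deliberately NOT trimmed,
    // so "A, B" yields the IDs "A" and " B" (with a leading space).
    static List<String> splitStrict(String value) {
        List<String> ids = new ArrayList<>();
        for (String id : value.split(",")) {
            ids.add(id);
        }
        return ids;
    }

    // True when every ID is non-empty and contains no whitespace.
    static boolean isWellFormed(String value) {
        if (value == null || value.isEmpty()) {
            return false;
        }
        // Limit -1 keeps trailing empty strings, so "A," is rejected too.
        for (String id : value.split(",", -1)) {
            if (id.isEmpty() || id.chars().anyMatch(Character::isWhitespace)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // prints [MyCrawlerActionId1, MyCrawlerActionId2]
        System.out.println(splitStrict("MyCrawlerActionId1,MyCrawlerActionId2"));
        // prints false: the second ID would carry a leading space
        System.out.println(isWellFormed("MyCrawlerActionId1, MyCrawlerActionId2"));
    }
}
```

Under that assumption, an ID with a leading space would not match any action ID defined in the crawler's config file, which is one plausible reason for the no-spaces rule.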