...

[CAS_PP_HOME]/etc/push_pull_framework.properties
Code Block

#external configuration files
org.apache.oodt.cas.pushpull.config.external.properties.files=[CAS_PP_HOME]/etc/default.properties

# ingester filemgr url
org.apache.oodt.cas.filemgr.url=

#protocolfactory specification for protocol types
org.apache.oodt.cas.pushpull.config.protocolfactory.info.files=[CAS_PP_HOME]/policy/ProtocolFactoryInfo.xml

#parser to retrievalmethod map
org.apache.oodt.cas.pushpull.config.parser.info.files=[CAS_PP_HOME]/policy/ParserToRetrievalMethodMap.xml

#unique metadata element info
org.apache.oodt.cas.pushpull.config.type.detection.file=[CAS_PP_HOME]/policy/mimetypes.xml

#directory below which all data file will be downloaded to
org.apache.oodt.cas.pushpull.data.files.base.staging.area=[CAS_PP_HOME]/staging

...

Examples of full path:

Code Block

ftp://l4ftl01.larc.nasa.gov/TES/TL2CO2N.005/2004.09.20/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5
ftp://l4ftl01.larc.nasa.gov/TES/TL2CO2N.005/2005.05.21/TES-Aura_L2-CO2-Nadir_r0000002931_F06_08.he5
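
Each full path breaks down into the pieces that are configured separately: the protocol, the host, a static base directory, a dated folder and the filename. The following is a minimal Python sketch of that breakdown, not part of OODT; the function name `split_remote_path` and the dictionary keys are made up for illustration.

```python
from urllib.parse import urlparse
import posixpath


def split_remote_path(url):
    """Split a remote file URL into the parts that are configured separately."""
    parsed = urlparse(url)
    # The last path component is the filename, the one before it the dated folder.
    directory, filename = posixpath.split(parsed.path)
    base_dir, dated_dir = posixpath.split(directory)
    return {
        "scheme": parsed.scheme,    # e.g. "ftp"
        "host": parsed.hostname,    # no prefix, no trailing slash
        "base_dir": base_dir,       # static part of the path
        "dated_dir": dated_dir,     # YYYY.MM.DD folder
        "filename": filename,
    }


example = (
    "ftp://l4ftl01.larc.nasa.gov/TES/TL2CO2N.005/2004.09.20/"
    "TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5"
)
for key, value in split_remote_path(example).items():
    print(f"{key}: {value}")
```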

...

In the mimetypes.xml file we map a filename pattern (a plain glob, or a regex when isregex="true" is set) to a custom mimetype. The file below defines three mimetypes: the first two are pushpull defaults, and the third is a custom one based on the naming of the remote HDF-5 files we want.

Code Block

<mime-info>
    <mime-type type="metadata/cas_pushpull">
        <glob pattern="*.info.tmp"/>
    </mime-type>
    <mime-type type="metadata/cas_metadata">
        <glob pattern="*.cas"/>
        <glob pattern="*.met"/>
    </mime-type>
    <mime-type type="product/TESLevel2CO2">
        <_comment>Level 2 - CO2 Retrievals from TES</_comment>
        <glob pattern="TES-Aura_L2-CO2-Nadir_r\d{10}\w{2}\d{2}\w\d{2}\.he5" isregex="true"/>
    </mime-type>
</mime-info>
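
Before starting the daemon, it can help to check the custom regex against known filenames. The sketch below uses Python's `re` module rather than the Java regex engine that pushpull runs on; the two agree on the constructs used in this pattern for ASCII filenames. It assumes the pattern has to match the whole filename, and the helper name `matches_tes_product` is hypothetical.

```python
import re

# Pattern copied from the product/TESLevel2CO2 glob in mimetypes.xml.
# Note that \w also matches the underscore, so "\w{2}" covers "_F".
TES_PATTERN = re.compile(r"TES-Aura_L2-CO2-Nadir_r\d{10}\w{2}\d{2}\w\d{2}\.he5")


def matches_tes_product(filename):
    """Return True if the whole filename matches the custom mimetype pattern."""
    return TES_PATTERN.fullmatch(filename) is not None


candidates = [
    "TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5",
    "TES-Aura_L2-CO2-Nadir_r0000002931_F06_08.he5",
    "TES-Aura_L2-CO2-Nadir_r2147_F06_09.he5",  # run number too short
]
for name in candidates:
    print(name, matches_tes_product(name))
```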

...

Purpose: This file lists external data sources such as FTP servers. The alias attribute of each login element is referenced from the RemoteSpecs.xml file. The file is located in the etc/examples folder and contains several examples that you can tailor to your application. I have removed all unused sources so that I don't download files I don't want. The host attribute of source must not include the URI prefix (ftp://, http://) or a trailing slash; the login type supplies the prefix.

Code Block

<sources>
    <source host="l4ftl01.larc.nasa.gov">
        <login type="ftp" alias="TESL2CO2">
            <username>anonymous</username>
            <password>user@host.com</password>
        </login>
    </source>
</sources>
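
The host rules above (no URI prefix, no trailing slash) are easy to get wrong when adapting the examples. Here is a minimal Python sketch that parses an ExternalSources-style document and reports violations. It is a standalone check, not an OODT tool, and the function name `check_sources` is hypothetical.

```python
import xml.etree.ElementTree as ET

EXTERNAL_SOURCES = """
<sources>
    <source host="l4ftl01.larc.nasa.gov">
        <login type="ftp" alias="TESL2CO2">
            <username>anonymous</username>
            <password>user@host.com</password>
        </login>
    </source>
</sources>
"""


def check_sources(xml_text):
    """Return (aliases, problems) for an ExternalSources-style document.

    aliases maps each login alias to a (login type, host) pair.
    problems lists hosts that break the rules described above.
    """
    aliases = {}
    problems = []
    for source in ET.fromstring(xml_text).iter("source"):
        host = source.get("host", "")
        if "://" in host:
            problems.append(f"{host}: host should not include a URI prefix")
        if host.endswith("/"):
            problems.append(f"{host}: host should not end with a slash")
        for login in source.iter("login"):
            aliases[login.get("alias")] = (login.get("type"), host)
    return aliases, problems


aliases, problems = check_sources(EXTERNAL_SOURCES)
print(aliases)
print(problems or "no problems found")
```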

...

Purpose: This file first references the aliases defined in the ExternalSources.xml file from the previous section and then defines one or more daemons. Each daemon's alias must be listed in ExternalSources.xml so that the daemon knows which server to search for files. The propInfo and propFiles elements tell the daemon exactly which directories and files to retrieve. We need to create an XML file called TESL2CO2.xml and place it in the directory named by the dir attribute of propInfo. For simplicity I have used the same name (TESL2CO2) for the alias, the propFiles file and the staging area. The period attribute on the runInfo tag sets the sleep/wait time for the daemon. The default is 3 minutes, but you may want to adjust this later in production.

Code Block

<remoteSpecs>
    <aliasSpecs>
        <aliasSpec file="[CAS_PP_HOME]/etc/examples/ExternalSources/ExternalSources.xml"/>
    </aliasSpecs>

    <daemons>
        <daemon alias="TESL2CO2" active="yes">
            <runInfo firstRunDateTime="2011-12-01T00:00:00Z" period="3m" runOnReboot="yes"/>
            <propInfo dir="[CAS_PP_HOME]/etc/examples/DirStructXmlParserFiles">
                <propFiles regExp="TESL2CO2\.xml" parser="org.apache.oodt.cas.pushpull.filerestrictions.parsers.DirStructXmlParser"/>
            </propInfo>
            <dataInfo stagingArea="TESL2CO2" deleteFromServer="no"/>
        </daemon>
    </daemons>
</remoteSpecs>
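
Because every daemon alias has to exist in ExternalSources.xml, a quick cross-check between the two files can catch typos before the daemon is launched. The following Python sketch does this on abbreviated copies of the two documents; the function name `unknown_daemon_aliases` is hypothetical and the check is not something pushpull itself provides.

```python
import xml.etree.ElementTree as ET

# Abbreviated copies of the two files, kept inline so the sketch is self-contained.
EXTERNAL_SOURCES = """
<sources>
    <source host="l4ftl01.larc.nasa.gov">
        <login type="ftp" alias="TESL2CO2"/>
    </source>
</sources>
"""

REMOTE_SPECS = """
<remoteSpecs>
    <daemons>
        <daemon alias="TESL2CO2" active="yes">
            <dataInfo stagingArea="TESL2CO2" deleteFromServer="no"/>
        </daemon>
    </daemons>
</remoteSpecs>
"""


def unknown_daemon_aliases(remote_specs_xml, external_sources_xml):
    """Return daemon aliases that have no matching login alias."""
    known = {
        login.get("alias")
        for login in ET.fromstring(external_sources_xml).iter("login")
    }
    daemon_aliases = [
        daemon.get("alias")
        for daemon in ET.fromstring(remote_specs_xml).iter("daemon")
    ]
    return [alias for alias in daemon_aliases if alias not in known]


missing = unknown_daemon_aliases(REMOTE_SPECS, EXTERNAL_SOURCES)
print("unknown aliases:", missing)
```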

...

Purpose: This file tells pushpull how to parse the remote directory structure. In this example the starting_path is the same for all of our remote files, but below it are dynamic folders named in YYYY.MM.DD format. A simple regex matches those folders so that pushpull descends into each one and picks out the files whose names match a second regex.
The examples/DirStructXmlParserFiles folder contains several more examples to learn from.

Code Block

<root>
    <dirstruct starting_path="/TES/TL2CO2N.005">
        <nofiles/>
        <dir name="\d{4}\.\d{2}\.\d{2}"> <!-- regex matching '2004.09.20' -->
            <nodirs/>
            <!-- regex matching TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5 -->
            <file name="TES-Aura_L2-CO2-Nadir_r\d{10}\w{2}\d{2}\w\d{2}\.he5"/>
        </dir>
    </dirstruct>
</root>
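
To reason about which remote paths this structure selects, the Python sketch below mimics the matching by hand. It encodes my reading of the file, which is an assumption: `<nofiles/>` means no files are taken directly under starting_path, and `<nodirs/>` means the dated folders are not searched any deeper. The function name `would_download` is hypothetical, and the real parser may differ in details.

```python
import re

# Values copied from the dirstruct definition above.
STARTING_PATH = "/TES/TL2CO2N.005"
DIR_PATTERN = re.compile(r"\d{4}\.\d{2}\.\d{2}")
FILE_PATTERN = re.compile(r"TES-Aura_L2-CO2-Nadir_r\d{10}\w{2}\d{2}\w\d{2}\.he5")


def would_download(remote_path):
    """Return True if remote_path is starting_path/<dated dir>/<matching file>."""
    prefix = STARTING_PATH + "/"
    if not remote_path.startswith(prefix):
        return False
    parts = remote_path[len(prefix):].split("/")
    # Exactly two levels below starting_path: one dated folder, one file.
    if len(parts) != 2:
        return False
    dated_dir, filename = parts
    return bool(DIR_PATTERN.fullmatch(dated_dir) and FILE_PATTERN.fullmatch(filename))


paths = [
    "/TES/TL2CO2N.005/2004.09.20/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5",
    "/TES/TL2CO2N.005/README.txt",
    "/TES/TL2CO2N.005/20040920/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5",
]
for path in paths:
    print(would_download(path), path)
```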

...

  1. cd $CAS_PP_HOME/bin
  2. Choose one of the two options below:
    1. Export 2 env vars
    2. Replace the CAS_PP_RESOURCES and DAEMONLAUNCHER_PORT with static values

      Code Block
      [CAS_PP_HOME]/bin/pushpull

      ${JAVA_HOME}/bin/java \
      -cp ${LIB_DEPS} -Dcom.sun.management.jmxremote \
      -Djava.util.logging.config.file=../etc/logging.properties \
      -Djavax.net.ssl.trustStore=${CAS_PP_RESOURCES}/jssecacerts \
      org.apache.oodt.cas.pushpull.daemon.DaemonLauncher \
      --rmiRegistryPort ${DAEMONLAUNCHER_PORT} \
      --propertiesFile ${CAS_PP_RESOURCES}/push_pull_framework.properties \
      --remoteSpecsFile ${CAS_PP_RESOURCES}/examples/RemoteSpecsFiles/RemoteSpecs.xml

      # You can leave this file unchanged by merely exporting the following env vars (bash shell)

      export CAS_PP_RESOURCES=$CAS_PP_HOME/etc
      export DAEMONLAUNCHER_PORT=9012

      # Or you can always use this config and not set up env vars

      ${JAVA_HOME}/bin/java \
      -cp ${LIB_DEPS} -Dcom.sun.management.jmxremote \
      -Djava.util.logging.config.file=${CAS_PP_HOME}/etc/logging.properties \
      -Djavax.net.ssl.trustStore=${CAS_PP_HOME}/etc/jssecacerts \
      org.apache.oodt.cas.pushpull.daemon.DaemonLauncher \
      --rmiRegistryPort 9012 \
      --propertiesFile ${CAS_PP_HOME}/etc/push_pull_framework.properties \
      --remoteSpecsFile ${CAS_PP_HOME}/etc/examples/RemoteSpecsFiles/RemoteSpecs.xml

  3. ./pushpull

...

Code Block
[CAS_PP_HOME]/policy/RemoteSpecs/RemoteSpecs.xml

<dataInfo stagingArea="MOD09GA-NRT" deleteFromServer="no" queryElement="Filename"/>
</daemon>

No data files are downloaded to my staging directory after running the ./pushpull script. What should I do?

1. Make sure that files matching your patterns actually exist on the remote FTP server.

2. The problem may be caused by protocol issues in the PushPull FTP plugin in use, so try one of the other PushPull FTP plugins. For details, refer to OODT Push Pull Plugins.
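
When debugging this, it also helps to confirm what has actually arrived on disk. The Python sketch below lists files under a staging directory whose names match the product regex. The helper name `downloaded_products` and the example directory are hypothetical; pass whichever directory your daemon is configured to write to.

```python
import re
from pathlib import Path

# Same filename regex as used in the mimetypes and dirstruct files.
TES_REGEX = r"TES-Aura_L2-CO2-Nadir_r\d{10}\w{2}\d{2}\w\d{2}\.he5"


def downloaded_products(staging_dir, filename_regex):
    """Return sorted names of files under staging_dir that match the regex."""
    pattern = re.compile(filename_regex)
    root = Path(staging_dir)
    if not root.is_dir():
        # A missing staging directory simply means nothing was downloaded.
        return []
    return sorted(
        path.name
        for path in root.rglob("*")
        if path.is_file() and pattern.fullmatch(path.name)
    )


# Hypothetical location; replace with your own staging directory.
found = downloaded_products("/path/to/staging/TESL2CO2", TES_REGEX)
print(f"{len(found)} matching file(s) found")
```

An empty result together with matching files on the remote server points towards the protocol plugin rather than the patterns.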