
...

Tip

Each of the following sub-sections names the path of a file that needs to be edited, followed by a block showing the changes to make in that file.

...

[CAS_PP_HOME]/etc/push_pull_framework.properties
Code Block
#external configuration files
org.apache.oodt.cas.pushpull.config.external.properties.files=[CAS_PP_HOME]/etc/default.properties

# ingester filemgr url
org.apache.oodt.cas.filemgr.url=

#protocolfactory specification for protocol types
org.apache.oodt.cas.pushpull.config.protocolfactory.info.files=[CAS_PP_HOME]/policy/ProtocolFactoryInfo.xml

#parser to retrievalmethod map
org.apache.oodt.cas.pushpull.config.parser.info.files=[CAS_PP_HOME]/policy/ParserToRetrievalMethodMap.xml

#unique metadata element info
org.apache.oodt.cas.pushpull.config.type.detection.file=[CAS_PP_HOME]/policy/mimetypes.xml

#directory below which all data file will be downloaded to
org.apache.oodt.cas.pushpull.data.files.base.staging.area=[CAS_PP_HOME]/staging
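
A quick way to double-check the edited properties is a small script that expands the [CAS_PP_HOME] placeholder and reports keys that were left empty. The sketch below is only an illustration: it is not part of pushpull, the install path is a hypothetical value, and the parser handles plain key=value lines only (no line continuations or other .properties features).

```python
def load_properties(text, home):
    """Parse simple key=value lines, expanding the [CAS_PP_HOME] placeholder."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip().replace("[CAS_PP_HOME]", home)
    return props


def empty_keys(props):
    """Return the keys whose value was left blank."""
    return sorted(key for key, value in props.items() if not value)


# Two of the properties shown above, copied as a sample.
SAMPLE = """\
# ingester filemgr url
org.apache.oodt.cas.filemgr.url=
#directory below which all data file will be downloaded to
org.apache.oodt.cas.pushpull.data.files.base.staging.area=[CAS_PP_HOME]/staging
"""

# Hypothetical install location, used only for this example.
props = load_properties(SAMPLE, "/usr/local/oodt/cas-pushpull")
print(empty_keys(props))
```

Here the file manager URL is reported as empty, which is a reminder to fill it in for your own deployment.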

...

Code Block
ftp://l4ftl01.larc.nasa.gov/TES/TL2CO2N.005/2004.09.20/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5
ftp://l4ftl01.larc.nasa.gov/TES/TL2CO2N.005/2005.05.21/TES-Aura_L2-CO2-Nadir_r0000002931_F06_08.he5

...

...

[CAS_PP_HOME]/policy/mimetypes.xml

Within the mimetypes.xml file we need to map a filename pattern (a glob or a regex) to a custom mimetype. Below we have 3 mimetypes: the first 2 are defaults in pushpull, and the 3rd is a custom one based on the file naming of our desired HDF-5 remote files.

Code Block
xml
<mime-info>
    <mime-type type="metadata/cas_pushpull">
        <glob pattern="*.info.tmp"/>
    </mime-type>
    <mime-type type="metadata/cas_metadata">
        <glob pattern="*.cas"/>
        <glob pattern="*.met"/>
    </mime-type>
    <mime-type type="product/TESLevel2CO2">
        <_comment>Level 2 - CO2 Retrievals from TES</_comment>
        <glob pattern="TES-Aura_L2-CO2-Nadir_r\d{10}\w{2}\d{2}\w\d{2}\.he5" isregex="true"/>
    </mime-type>
</mime-info>
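
Before starting the daemon it can help to confirm that the custom regex really matches the remote filenames. The sketch below tests the pattern against the two example files using Python's re module. This is an assumption-laden sanity check: pushpull evaluates the pattern in Java, so the result is only indicative, although the constructs used here (\d, \w, {n}) behave the same way in both engines.

```python
import re

# The regex from the product/TESLevel2CO2 glob above.
PATTERN = re.compile(r"TES-Aura_L2-CO2-Nadir_r\d{10}\w{2}\d{2}\w\d{2}\.he5")

# The two remote files this guide is trying to retrieve.
FILENAMES = [
    "TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5",
    "TES-Aura_L2-CO2-Nadir_r0000002931_F06_08.he5",
]


def matches(name):
    """True when the whole filename matches the pattern."""
    return PATTERN.fullmatch(name) is not None


for name in FILENAMES:
    print(name, matches(name))
```

Note that \w also matches the underscore, which is why \w{2} covers the "_F" part of the name and the single \w covers the underscore before the last two digits.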

...

...

[CAS_PP_HOME]/etc/examples/ExternalSourcesFiles/ExternalSources.xml

Purpose: This file contains a list of External Data Sources such as FTP servers. The login.alias attribute will be referenced from the RemoteSpecs.xml file. This file is located in the etc/examples folder and contains several good examples that you can tailor to your application. I have removed all unused sources to make sure I don't download files I don't want. The source.host must not contain the URI prefix (ftp://, http://) or a trailing slash; the login.type takes care of the prefix.

Code Block
xml
<sources>
    <source host="l4ftl01.larc.nasa.gov">
        <login type="ftp" alias="TESL2CO2">
            <username>anonymous</username>
            <password>user@host.com</password>
        </login>
    </source>
</sources>
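
The host rule above is easy to get wrong, so here is a minimal sketch that reads the file, rejects hosts that carry a URI prefix or a trailing slash, and shows the base URL that results from combining login.type with source.host. The function name list_logins is hypothetical and this check is not something pushpull provides; it only mirrors the rule described in this section.

```python
import xml.etree.ElementTree as ET

# The ExternalSources.xml content from above.
EXTERNAL_SOURCES = """
<sources>
    <source host="l4ftl01.larc.nasa.gov">
        <login type="ftp" alias="TESL2CO2">
            <username>anonymous</username>
            <password>user@host.com</password>
        </login>
    </source>
</sources>
"""


def list_logins(xml_text):
    """Return (alias, base_url) pairs; raise if a host is not a bare hostname."""
    result = []
    for source in ET.fromstring(xml_text.strip()).findall("source"):
        host = source.get("host")
        if "://" in host or host.endswith("/"):
            raise ValueError(f"host should be a bare hostname: {host!r}")
        for login in source.findall("login"):
            # The login type supplies the URI prefix.
            result.append((login.get("alias"), f"{login.get('type')}://{host}"))
    return result


print(list_logins(EXTERNAL_SOURCES))
```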

...

...

[CAS_PP_HOME]/etc/examples/RemoteSpecsFiles/RemoteSpecs.xml

Purpose: This file first references the aliases listed in the ExternalSources.xml file from the previous section. Then you can define one or more daemons. Each daemon.alias must be listed in ExternalSources.xml so that the daemon knows where to look for files. The propInfo and propFiles elements tell the daemon exactly which directories and files to retrieve. We will need to create an XML file called TESL2CO2.xml and place it in the propInfo.dir location. For simplicity I have used the same name (TESL2CO2) for the alias, the propFiles file and the staging area. The period attribute on the runInfo tag sets the sleep/wait time for the daemon. The default is 3 minutes, but you may want to adjust this later in production.

Code Block
xml
<remoteSpecs>
    <aliasSpecs>
        <aliasSpec file="[CAS_PP_HOME]/etc/examples/ExternalSourcesFiles/ExternalSources.xml"/>
    </aliasSpecs>

    <daemons>
        <daemon alias="TESL2CO2" active="yes">
            <runInfo firstRunDateTime="2011-12-01T00:00:00Z" period="3m" runOnReboot="yes"/>
            <propInfo dir="[CAS_PP_HOME]/etc/examples/DirStructXmlParserFiles">
                <propFiles regExp="TESL2CO2\.xml" parser="org.apache.oodt.cas.pushpull.filerestrictions.parsers.DirStructXmlParser"/>
            </propInfo>
            <dataInfo stagingArea="TESL2CO2" deleteFromServer="no"/>
        </daemon>
    </daemons>
</remoteSpecs>
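
Because every daemon.alias has to exist in ExternalSources.xml, a small cross-check between the two files can catch a mistyped alias early. The following is a sketch under stated assumptions: the XML snippets are trimmed copies of the files above, the "NotDefined" daemon is a made-up entry added only to show a failure, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of ExternalSources.xml.
EXTERNAL_SOURCES = """
<sources>
    <source host="l4ftl01.larc.nasa.gov">
        <login type="ftp" alias="TESL2CO2"/>
    </source>
</sources>
"""

# Trimmed copy of RemoteSpecs.xml, plus a made-up daemon with a bad alias.
REMOTE_SPECS = """
<remoteSpecs>
    <daemons>
        <daemon alias="TESL2CO2" active="yes"/>
        <daemon alias="NotDefined" active="yes"/>
    </daemons>
</remoteSpecs>
"""


def login_aliases(external_sources_xml):
    """Collect every login alias defined in the external sources file."""
    root = ET.fromstring(external_sources_xml.strip())
    return {login.get("alias") for login in root.iter("login")}


def unknown_daemon_aliases(remote_specs_xml, known):
    """List daemon aliases that have no matching login alias."""
    root = ET.fromstring(remote_specs_xml.strip())
    return [d.get("alias") for d in root.iter("daemon") if d.get("alias") not in known]


known = login_aliases(EXTERNAL_SOURCES)
print(unknown_daemon_aliases(REMOTE_SPECS, known))
```

An empty list means every daemon points at a source that is actually defined.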

...

...

[CAS_PP_HOME]/etc/examples/DirStructXmlParserFiles/TESL2CO2.xml

Purpose: This file tells pushpull how to parse the remote directory structure. In this example the starting_path is static for all of our remote file paths, but below it there are dynamic folders named in a YYYY.MM.DD format. We therefore use a simple regex so that pushpull digs down into each subfolder and pulls out the files whose names match a second regex.
The examples/DirStructXmlParserFiles folder contains several other examples to learn from.
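
To make the three parts of the remote path concrete, the sketch below splits one of the example URLs into a static starting path, a dated folder and a filename, and checks the last two with regexes. All three values are assumptions for illustration: the starting path is read off the example URLs, the YYYY.MM.DD pattern is one possible regex for the dated folders, and the filename pattern is borrowed from mimetypes.xml. The code does not reproduce how DirStructXmlParser itself works.

```python
import re
from urllib.parse import urlparse

STARTING_PATH = "/TES/TL2CO2N.005"             # assumed from the example URLs
DATE_DIR = re.compile(r"\d{4}\.\d{2}\.\d{2}")  # hypothetical YYYY.MM.DD pattern
FILE_NAME = re.compile(r"TES-Aura_L2-CO2-Nadir_r\d{10}\w{2}\d{2}\w\d{2}\.he5")


def would_select(url):
    """True when the URL is <starting path>/<dated folder>/<matching file>."""
    path = urlparse(url).path
    if not path.startswith(STARTING_PATH + "/"):
        return False
    parts = path[len(STARTING_PATH) + 1:].split("/")
    return (
        len(parts) == 2
        and DATE_DIR.fullmatch(parts[0]) is not None
        and FILE_NAME.fullmatch(parts[1]) is not None
    )


url = ("ftp://l4ftl01.larc.nasa.gov/TES/TL2CO2N.005/2004.09.20/"
       "TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5")
print(would_select(url))
```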

...