Introduction

Given the following scenario:

  • FileManager is running on machineA.
  • Workflow/PGE tasks should run on machineB.

There are two strategies for operating on remote files from machineB:

  1. Use NFS to simulate a local filesystem on machineB
  2. Use fmprod to download the product to machineB from a web URL

When your Workflow/PGE task is done, you can ingest the output files into the remote FileManager using org.apache.oodt.cas.filemgr.datatransfer.RemoteDataTransferFactory as the clientTransferer.
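
As a rough illustration of that ingest step, the sketch below only prints a filemgr-client command line; it does not run it. The File Manager URL (machineA:9000), the file names, and the helper name print_ingest_cmd are assumptions made for this example. The option names follow the File Manager command-line client as commonly documented, so check them against the usage output of your own filemgr-client before relying on them.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) an ingest command that uses the
# remote data transfer factory. FM_URL is an assumed address.
FM_URL="http://machineA:9000"
TRANSFER_FACTORY="org.apache.oodt.cas.filemgr.datatransfer.RemoteDataTransferFactory"

# $1 = product name, $2 = product type name,
# $3 = absolute path of the output file, $4 = absolute path of its metadata file
print_ingest_cmd() {
  printf '%s\n' "./filemgr-client --url $FM_URL --operation --ingestProduct \
--productName $1 --productStructure Flat --productTypeName $2 \
--metadataFile file://$4 --clientTransfer --dataTransfer $TRANSFER_FACTORY \
--refs file://$3"
}
```

For example, `print_ingest_cmd out.dat MyProdTypeName /tmp/out.dat /tmp/out.dat.met` prints a single command line that can be reviewed and then run from the File Manager bin directory on machineB.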

Use NFS

The idea here is to use NFS to mount the data archive onto a common root file path. Then all file paths will appear to be on the local file system.

  1. Edit /etc/exports to allow machineB to see the data archive on machineA (for more information, type "man exports")

    Code Block (machineA: /etc/exports)
    /Users/me/filemgr/data/archive -network=10.0.0.0 -mask=255.255.255.0
    
  2. Restart the NFS service on machineA

    No Format
    sudo nfsd restart
    
  3. Mount the remote file system on machineB (for more information, type "man mount")

    No Format
    sudo mkdir -p /net/machineA
    sudo mount -t nfs machineA:/Users/me/filemgr/data/archive /net/machineA
    
  4. Create a symbolic link on machineA so that /net/machineA resolves to the data archive there as well

    No Format
    sudo ln -s /Users/me/filemgr/data/archive /net/machineA
    
  5. Edit product-types.xml to use the symbolic link

    Code Block (machineA: product-types.xml)
    ...
    <type id="urn:MyProdTypeId" name="MyProdTypeName">
      <repository path="file:///net/machineA"/>
    ...
    
  6. The FileManager should now return file paths that are reachable by machineB.
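
One way to confirm the result on machineB is to take a file reference returned by the FileManager and test whether it is readable through the mount. The following is a minimal sketch; the helper names ref_to_path and check_ref are hypothetical, and it assumes references are plain file: URIs without percent-encoded characters.

```shell
#!/bin/sh
# Convert a file: URI (for example file:///net/machineA/foo.dat) to a plain path.
# Anything that is not a file: URI is returned unchanged.
ref_to_path() {
  case "$1" in
    file://*) printf '%s\n' "${1#file://}" ;;
    file:*)   printf '%s\n' "${1#file:}" ;;
    *)        printf '%s\n' "$1" ;;
  esac
}

# Succeed only if the referenced file is readable on this machine.
check_ref() {
  path=$(ref_to_path "$1")
  [ -r "$path" ]
}
```

Running `check_ref 'file:///net/machineA/< your file >' && echo reachable` on machineB is a quick test that both the NFS mount and the repository path are in place.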

If your products were ingested before this change, you must update the [FileLocation] metadata element for them. There are several ways to update the catalog with the new NFS file location:

  • Re-ingest, making sure the repository path in product-types.xml uses the NFS mount.
  • Re-ingest with a different Versioner that yields the NFS mount as the final file location.
  • Use MetadataBasedProductMover. Run the following locally on machineA:

    No Format
    java -Djava.ext.dirs=../lib org.apache.oodt.cas.filemgr.tools.MetadataBasedProductMover \
        --fileManagerUrl http://localhost:9000 \
        --typeName MyProdTypeName \
        --pathSpec /net/machineA/[Filename]
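
To preview where a product would end up for a given path spec, the substitution of the [Filename] metadata key can be imitated in the shell. This is only an illustration of the idea, not the tool's implementation: the helper name expand_filename is hypothetical, and it handles just the first [Filename] occurrence and no other metadata keys.

```shell
#!/bin/sh
# Replace the first "[Filename]" placeholder in a path spec with a file name.
# $1 = path spec, $2 = file name
expand_filename() {
  spec=$1
  name=$2
  case "$spec" in
    *"[Filename]"*)
      # Text before the placeholder + file name + text after the placeholder.
      printf '%s\n' "${spec%%"[Filename]"*}${name}${spec#*"[Filename]"}"
      ;;
    *)
      printf '%s\n' "$spec"
      ;;
  esac
}
```

For instance, `expand_filename '/net/machineA/[Filename]' data.dat` prints the path that a product with the file name data.dat would map to under the path spec used above.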
    

Use fmprod

The idea here is to deploy a Product Server to allow anyone to download products with an HTTP request.

  1. Build fmprod (assumes you have OODT sources checked out)

    No Format
    cd webapp/fmprod
    mvn clean package
    
  2. Deploy fmprod (cas-product-VERSION.war)
  3. Download the product with an HTTP request (GET or POST)
    • To get a file product, use the "productID" query parameter.

      No Format
      curl 'http://webappMachine/fmprod/data?productID=< your product id >'
      
    • To get a hierarchical product, add either the "refIndex" parameter or the "format" parameter. Note that "refIndex=0" is the directory name.

      No Format
      curl 'http://webappMachine/fmprod/data?productID=< your product id >&refIndex=1'
      curl 'http://webappMachine/fmprod/data?productID=< your product id >&format=application/x-zip'
    • To get a dataset (products of a certain type) as a zip file, use the "typeID" query parameter.

      No Format
      curl 'http://webappMachine/fmprod/dataset?typeID=< your product type id >'
      
    • The filename is sent to the requesting client in the server's response headers, and wget and curl can be told to use this information to name the file. To download a product and take its name from the header, use one of the following commands:

      No Format
      curl --remote-name --remote-header-name 'http://webappMachine/fmprod/data?productID=< your product id >'
      wget --content-disposition 'http://webappMachine/fmprod/data?productID=< your product id >'
      
  4. The previous download step can be wrapped in a "FileStager" task. Subsequent tasks can now operate on the downloaded local file.
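
The download step could be wrapped along the following lines before being called from a staging task. This is a minimal sketch under stated assumptions: the function names product_url and stage_product are hypothetical, FMPROD_BASE reuses the webappMachine placeholder from the examples above, and product ids are assumed to need no URL encoding.

```shell
#!/bin/sh
# Base URL of the deployed fmprod webapp (placeholder host from the examples above).
FMPROD_BASE="http://webappMachine/fmprod"

# Build the download URL for a product.
# $1 = product id, $2 = optional refIndex for hierarchical products
product_url() {
  url="$FMPROD_BASE/data?productID=$1"
  if [ -n "${2:-}" ]; then
    url="$url&refIndex=$2"
  fi
  printf '%s\n' "$url"
}

# Download a product into a staging directory, naming the file from the
# response header. $1 = product id, $2 = staging directory
stage_product() {
  mkdir -p "$2" || return 1
  (
    cd "$2" &&
    curl --fail --silent --show-error \
         --remote-name --remote-header-name "$(product_url "$1")"
  )
}
```

A task would then call something like `stage_product '< your product id >' /tmp/staging` and hand the files found in the staging directory to the next task.
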
Note

The disadvantage of this approach is that products and metadata must be retrieved separately: products are downloaded through the fmprod web service, while metadata is obtained through a direct connection to FileManager. Also, you cannot use PGE SQL-like queries to obtain products.