The File Manager

This self-guided tutorial is intended for first-time users.

Since you've found this page, I assume that you are seriously thinking of using the OODT File Manager and are eager to get something up and running. Hopefully it also means that you've checked out the code and built a cas-filemgr install target (e.g. a cas-filemgr-${version}-dist.tar.gz file).

This tutorial is by no means a complete overview of all the File Manager's functionality. However, it's an attempt to get you started using the basic tools. Like learning to drive a car, the most difficult part is getting it started and on the road!

...

Here are the commands to install the cas-filemgr target from a tarfile. You will need to fill in the "..." with the appropriate content.

No Format
nopaneltrue
$ mkdir -p /usr/local/oodt/
$ tar xzvf .../filemgr/target/cas-filemgr-${version}-dist.tar.gz -C /usr/local/oodt/
$ cd /usr/local/oodt/
$ ln -s cas-filemgr-${version}/ cas-filemgr
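
The same steps can be wrapped in a small function so the version and install prefix are not hard-coded. This is only a sketch: install_filemgr is a hypothetical helper (not part of OODT), and it assumes the tarball unpacks into a directory named after the tarball without its -dist.tar.gz suffix, as the ln -s command above implies.

```shell
# Sketch only: install_filemgr is a hypothetical helper, not an OODT tool.
# Usage: install_filemgr /path/to/cas-filemgr-<version>-dist.tar.gz /usr/local/oodt
install_filemgr() {
    tarball="$1"
    prefix="$2"
    # Assumption: the directory inside the tarball is the tarball name
    # without the -dist.tar.gz suffix.
    name=$(basename "$tarball" -dist.tar.gz)
    mkdir -p "$prefix" || return 1
    tar xzf "$tarball" -C "$prefix" || return 1
    # Create (or refresh) the version-independent cas-filemgr symlink.
    (cd "$prefix" && ln -sfn "$name" cas-filemgr)
}
```

Re-running the function with a newer tarball simply repoints the cas-filemgr symlink at the new directory.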

The decompressed tar file creates a directory structure that looks as follows:

No Format
nopaneltrue
.
├── bin
│   ├── filemgr
│   ├── filemgr-client
│   └── query-tool
├── etc
│   ├── filemgr.properties
│   └── mime-types.xml
├── lib
│   └── *.jar
├── logs
├── policy
│   ├── cmd-line-actions.xml
│   ├── cmd-line-options.xml
│   ├── core
│   │   ├── elements.xml
│   │   ├── product-type-element-map.xml
│   │   └── product-types.xml
│   ├── trace
│   │   ├── elements.xml
│   │   ├── product-type-element-map.xml
│   │   └── product-types.xml
│   ├── geo
│   │   ├── elements.xml
│   │   ├── product-type-element-map.xml
│   │   └── product-types.xml
│   └── (additional policy sub directories)
└── run

Please note, if you are using version 0.3 of OODT or earlier, the policy directory will look like this (with no subdirectories):

No Format
nopaneltrue

...

└── policy
    ├── elements.xml
    ├── product-type-element-map.xml
    └── product-types.xml
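
If you are not sure which of the two layouts an install uses, a quick file test tells them apart. The following is a minimal sketch; policy_layout is a hypothetical helper, and it only looks for product-types.xml in the two places shown above.

```shell
# Sketch only: policy_layout is a hypothetical helper, not an OODT tool.
# Prints "subdirs" for the newer layout (policy/core/...), "flat" for the
# 0.3-and-earlier layout, or "unknown" if neither file is found.
policy_layout() {
    policy_dir="$1"
    if [ -f "$policy_dir/core/product-types.xml" ]; then
        echo "subdirs"
    elif [ -f "$policy_dir/product-types.xml" ]; then
        echo "flat"
    else
        echo "unknown"
    fi
}
```

For the install described here, the call would be policy_layout /usr/local/oodt/cas-filemgr/policy.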

Here is a brief description of each directory that you see listed:

...

  • filemgr : file manager (startup/shutdown) script
  • filemgr-client : file manager client interface script
  • query-tool : catalog query tool
  • convert_map : ???
  • migrate_xml_policy : ???

Configuring and Running the File Manager

You're now ready to run the file manager!

No Format
nopaneltrue
$ cd /usr/local/oodt/cas-filemgr/bin
$ ./filemgr --help
Usage: ./filemgr {start|stop|status}
$ ./filemgr start

What's going to happen?

The filemgr should be up and running; however, some WARNING messages may appear, complaining about configuration.

If you get a java.net.BindException exception, make sure that no other service is running on port 9000. This is the port for an XML-RPC interface that will be used for transferring data files into a repository.

There's also a new file in the /usr/local/oodt/run directory. The file contains the filemgr process id. This is typical *nix service housekeeping, done to try to avoid running multiple filemgr services.
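
To see whether the process id recorded in that file still belongs to a live process, something like the sketch below can be used. pid_file_alive is a hypothetical helper; the pid file path is passed in as an argument because its exact name is not given here, and kill -0 only reports success for processes you are allowed to signal.

```shell
# Sketch only: pid_file_alive is a hypothetical helper, not an OODT tool.
# Returns 0 if the file holds the pid of a running process, non-zero otherwise.
pid_file_alive() {
    pid_file="$1"
    [ -f "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    # Reject empty or non-numeric content.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # kill -0 sends no signal; it only checks that the process can be signalled.
    kill -0 "$pid" 2>/dev/null
}
```

A stale pid file (left behind after a crash, for example) makes the function return non-zero.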

...

Code Block
titlefilemgr.properties

org.apache.oodt.cas.filemgr.catalog.lucene.idxPath=/usr/local/oodt/cas-filemgr/catalog
org.apache.oodt.cas.filemgr.repositorymgr.dirs=file:///usr/local/oodt/cas-filemgr/policy/core
org.apache.oodt.cas.filemgr.validation.dirs=file:///usr/local/oodt/cas-filemgr/policy/core
org.apache.oodt.cas.filemgr.mime.type.repository=/usr/local/oodt/cas-filemgr/etc/mime-types.xml
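
To double-check what a running install is actually configured with, the values can be read back from the properties file. This is a minimal sketch: get_prop is a hypothetical helper, and it only handles plain key=value lines (no line continuations or ':' separators, which Java properties files also allow).

```shell
# Sketch only: get_prop is a hypothetical helper, not an OODT tool.
# Prints the value of the first non-comment "key=value" line for the key.
get_prop() {
    props_file="$1"
    key="$2"
    grep -v '^[[:space:]]*#' "$props_file" \
        | awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }'
}
```

For example, get_prop /usr/local/oodt/cas-filemgr/etc/filemgr.properties org.apache.oodt.cas.filemgr.validation.dirs prints the configured policy location.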

...

Code Block
titleproduct-types.xml

<repository path="file:///var/archive/data"/>

...

Server-side metadata is generated by Java classes, and the extractors that will be used are configured in the product-types.xml file in the chosen policy directory. For this example configuration, you should have /usr/local/oodt/cas-filemgr/policy/core as the policy directory (matching the filemgr.properties settings above), unless you're running version 0.3 or earlier of OODT, in which case you should have /usr/local/oodt/cas-filemgr/policy as the policy directory.

...

For the GenericFile type, find the <metExtractors> key. It specifies the extractors to use for server-side metadata extraction, namely CoreMetExtractor, MimeTypeExtractor and FinalFileLocationExtractor. For more details about metadata and extractors, see the following page: Metadata Extractors (http://oodt.apache.org/components/maven/metadata/user/basic.html).
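
A quick way to list the configured extractor classes without opening the file in an editor is sketched below. list_extractors is a hypothetical helper, and it assumes that each extractor is declared as an <extractor class="..."> element on a single line; check your own product-types.xml before relying on it.

```shell
# Sketch only: list_extractors is a hypothetical helper, not an OODT tool.
# Assumption: extractors appear as <extractor class="..."> on one line each.
list_extractors() {
    sed -n 's/.*<extractor[^>]*class="\([^"]*\)".*/\1/p' "$1"
}
```

Note that this prints the extractors for every product type in the file, not just GenericFile.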

If you're feeling curious, check out the other XML files in the /usr/local/oodt/cas-filemgr/policy subdirectories to get a better feel for how we define product types and elements. For a discussion of best practices w.r.t. File Manager Policy, the reader is referred to Everything you want to know about File Manager Policy.

A brief overview of filemgr-client and query-tool

These commands are found in /usr/local/oodt/cas-filemgr/bin.

...

In order to trigger a file ingestion we're going to use the filemgr-client. This is by no means the most automated way to ingest data into a repository; however, it's a really easy and intuitive way to trigger a file ingestion. The filemgr-client is a wrapper script, making it easier to invoke a Java executable from the command line.

No Format
nopaneltrue
$ cd /usr/local/oodt/cas-filemgr/bin
$ ./filemgr-client --help
filemgr-client --url <url to xml rpc service> --operation [<operation> [params]]
operations:
--addProductType --typeName <name> --typeDesc <description>
    --repository <path> --versionClass <classname of versioning impl>
--ingestProduct --productName <name> --productStructure <Hierarchical|Flat> 
    --productTypeName <name of product type> --metadataFile <file> 
    [--clientTransfer --dataTransfer <java class name of data transfer factory>] 
    --refs <ref1>...<refn>
--hasProduct --productName <name>
--getProductTypeByName --productTypeName <name>
--getNumProducts --productTypeName <name>
--getFirstPage --productTypeName <name>
--getNextPage --productTypeName <name> --currentPageNum <number>
--getPrevPage --productTypeName <name> --currentPageNum <number>
--getLastPage --productTypeName <name>
--getCurrentTransfer
--getCurrentTransfers
--getProductPctTransferred --productId <id> --productTypeName <name>
--getFilePctTransferred --origRef <uri>

As you can see, there are a number of different ways this command can be executed.

The first command line argument is --url. This is the location of the filemgr XML-RPC data transfer interface. Looking at the filemgr logs (specifically cas_filemgr0.log), we see an INFO statement telling us that local data transfer is enabled on http://localhost:9000. This is the URL that we need to specify.

...

However, before we take a look at the --operation --ingestProduct, I would first like to shed a bit more light on the query-tool command.

Command: query-tool

This is a very useful wrapper script to query the content of your repository.

No Format
nopaneltrue
$ cd /usr/local/oodt/cas-filemgr/bin
$ ./query-tool
Must specify a query and filemgr url! 
Usage: QueryTool [options] 
options: 
--url <fm url> 
  Lucene like query options: 
    --lucene 
         -query <query> 
  SQL like query options: 
    --sql 
         -query <query> 
         -sortBy <metadata-key> 
         -outputFormat <output-format-string> 

We see that we need to set some command line arguments to get anything useful out of the query tool. Try the next command:

$ ./query-tool --url http://localhost:9000 --sql -query 'SELECT * FROM GenericFile'

...

Code Block
titleblah.txt.met

<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
</cas:metadata>

...

  • --productName : The name you want for your ingested product
  • --productStructure : Flat file or directory (i.e. hierarchical). Yes... we can ingest whole directories as one product
  • --productTypeName : A product type (as per product-types.xml)
  • --metadataFile : The client side metadata file
  • --refs : The product location

...

There's also an optional argument, --clientTransfer; however, we're going to leave it out and use the default local transfer:
[--clientTransfer --dataTransfer <java class name of data transfer factory>]

Here is the complete command:
$ ./filemgr-client --url http://localhost:9000 --operation --ingestProduct --productName blah.txt --productStructure Flat --productTypeName GenericFile --metadataFile file:///tmp/blah.txt.met --refs file:///tmp/blah.txt

The output should look like:
Sep 16, 2011 2:09:42 PM org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient <init>
...
...
ingestProduct: Result: c2fbf4b9-e05c-11e0-9022-77a707615e7f
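
When ingesting more than one flat file by hand, the metadata file and the command line can be generated rather than typed. The sketch below writes the same empty blah.txt.met-style metadata document next to a data file and prints the matching ingest command without running it. prepare_ingest and FM_URL are hypothetical names introduced for illustration, and the data file path is assumed to be absolute and free of spaces.

```shell
# Sketch only: prepare_ingest and FM_URL are hypothetical, not part of OODT.
FM_URL="${FM_URL:-http://localhost:9000}"

prepare_ingest() {
    data_file="$1"                 # absolute path, e.g. /tmp/blah.txt
    met_file="${data_file}.met"
    # Write an empty client-side metadata document next to the data file.
    cat > "$met_file" <<'EOF'
<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
</cas:metadata>
EOF
    # Print (do not run) the ingest command for a Flat GenericFile product.
    echo "./filemgr-client --url $FM_URL --operation --ingestProduct" \
         "--productName $(basename "$data_file") --productStructure Flat" \
         "--productTypeName GenericFile --metadataFile file://$met_file" \
         "--refs file://$data_file"
}
```

Printing the command first makes it easy to inspect before pasting it into a shell in the bin directory.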

...

To complete the process, let's see if we can retrieve the metadata. Run the query command again:
$ cd /usr/local/oodt/cas-filemgr/bin
$ ./query-tool --url http://localhost:9000 --sql -query 'SELECT * FROM GenericFile'

...

At the time of writing this tutorial, composing queries using query-tool is not entirely straightforward, but it is entirely usable. Formatting of these queries is critical: small deviations from the syntax can result in the query returning an unexpected value or throwing an exception.

...

Here is a somewhat verbose example that uses all the SQL-like syntax that I am currently aware of (apologies for all the line breaks).

No Format
nopaneltrue
$ cd /usr/local/oodt/cas-filemgr/bin
$ ./query-tool --url http://localhost:9000 --sql \
-query "SELECT CAS.ProductReceivedTime,CAS.ProductName,CAS.ProductId,ProductType,\
ProductStructure,Filename,FileLocation,MimeType \
FROM GenericFile WHERE Filename='blah.txt'" -sortBy 'CAS.ProductReceivedTime' \
-outputFormat '$CAS.ProductReceivedTime,$CAS.ProductName,$CAS.ProductId,$ProductType,\
$ProductStructure,$Filename,$FileLocation,$MimeType'

The output should look like:
2011-10-07T10:59:12.031+02:00,blah.txt,a00616c6-f0c2-11e0-baf4-65c684787732,
GenericFile,Flat,blah.txt,/var/kat/archive/data/blah.txt,text/plain
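
Because the -outputFormat string above is comma separated, a result record can be split into named fields in a script. This is a minimal sketch that assumes the whole record is on one line and that no field value contains a comma; the variable names are made up for illustration, and the sample record is the one shown above.

```shell
# Sketch only: assumes one record per line and no commas inside field values.
record='2011-10-07T10:59:12.031+02:00,blah.txt,a00616c6-f0c2-11e0-baf4-65c684787732,GenericFile,Flat,blah.txt,/var/kat/archive/data/blah.txt,text/plain'

# Split on commas, in the same order as the -outputFormat string.
IFS=, read -r received name id ptype structure filename location mime <<EOF
$record
EOF

echo "$name | $ptype | $mime | $location"
```

Choosing a separator that cannot occur in your metadata values (a tab, say) in -outputFormat would make this kind of parsing more robust.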

Now you can also check out some of the other 12 --operation possibilities for filemgr-client. For instance:

$ ./filemgr-client --url http://localhost:9000 --operation --hasProduct --productName blah.txt

Or:

$ ./filemgr-client --url http://localhost:9000 --operation --getFirstPage --productTypeName GenericFile
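
Since every one of these calls repeats the same --url and --operation prefix, a small shell function saves some typing. The sketch below is my own illustration: fm_op, FM_BIN and FM_URL are hypothetical names, and the function changes into the bin directory first because all the examples in this tutorial run the script from there.

```shell
# Sketch only: fm_op, FM_BIN and FM_URL are hypothetical, not part of OODT.
FM_BIN="${FM_BIN:-/usr/local/oodt/cas-filemgr/bin}"
FM_URL="${FM_URL:-http://localhost:9000}"

fm_op() {
    # Run in a subshell so the caller's working directory is left unchanged.
    (cd "$FM_BIN" && ./filemgr-client --url "$FM_URL" --operation "$@")
}

# Example calls (these need a running File Manager):
#   fm_op --hasProduct --productName blah.txt
#   fm_op --getNumProducts --productTypeName GenericFile
```

Everything after fm_op is passed through unchanged, so any of the operations listed in the filemgr-client help can be used.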

...

Cameron Goodale has written some useful command-line tool aliases that are worth mentioning before we continue. See the following two web pages:
https://issues.apache.org/jira/browse/OODT-306
BASH and TCSH shell tools for File Manager

...