...

Here are the commands to install the cas-filemgr target from a tar file. You will need to replace the "..." with the appropriate path.

$ mkdir -p /usr/local/oodt/
$ tar xzvf .../filemgr/target/cas-filemgr-0.4-SNAPSHOT-dist.tar.gz -C /usr/local/oodt/
$ cd /usr/local/oodt/
$ ln -s cas-filemgr-0.4-SNAPSHOT/ cas-filemgr 

The decompressed tar file creates a directory structure that looks as follows:

.
├── bin
│   ├── convert_map
│   ├── filemgr
│   ├── filemgr-client
│   ├── migrate_xml_policy
│   └── query-tool
├── etc
│   ├── filemgr.properties
│   ├── logging.properties
│   └── mime-types.xml
├── lib
│   └── *.jar
├── logs
│   └── REMOVE.log
└── policy
    ├── core
    │   ├── elements.xml
    │   ├── product-type-element-map.xml
    │   └── product-types.xml
    ├── geo
    │   ├── elements.xml
    │   ├── product-type-element-map.xml
    │   └── product-types.xml
    │
    (additional policy subdirectories)

Please note, if you are using version 0.3 of OODT or earlier, the policy directory will look like this (with no subdirectories):

...

└── policy
    ├── elements.xml
    ├── product-type-element-map.xml
    └── product-types.xml

Here is a brief description of each directory that you see listed:

...

  • filemgr : file manager (startup/shutdown) script
  • filemgr-client : file manager client interface script
  • query-tool : catalog query tool
  • convert_map : ???
  • migrate_xml_policy : ???

...

You're now ready to run the file manager!

$ cd /usr/local/oodt/cas-filemgr/bin
$ ./filemgr --help
Usage: ./filemgr {start|stop|restart|status}
$ ./filemgr start

What's going to happen?

The filemgr should now be up and running; however, some WARNING messages will appear, complaining about the configuration.

...

filemgr.properties:

org.apache.oodt.cas.filemgr.catalog.lucene.idxPath=/usr/local/oodt/cas-filemgr/catalog
org.apache.oodt.cas.filemgr.repositorymgr.dirs=file:///usr/local/oodt/cas-filemgr/policy/core
org.apache.oodt.cas.filemgr.validation.dirs=file:///usr/local/oodt/cas-filemgr/policy/core
org.apache.oodt.cas.filemgr.mime.type.repository=/usr/local/oodt/cas-filemgr/etc/mime-types.xml
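
When chasing the configuration warnings, it can help to print what a given key is currently set to. The following is a minimal sketch, not part of OODT: prop_value is a hypothetical helper name, and it only understands simple key=value lines (no line continuations or escapes).

```shell
# prop_value FILE KEY
# Print the value of KEY from a Java-style properties file.
# Hypothetical helper: handles plain "key=value" lines only.
prop_value() {
    # Select lines that begin with "KEY=" and print everything after that prefix.
    awk -v key="$2" 'index($0, key "=") == 1 { print substr($0, length(key) + 2) }' "$1"
}

# Example call (path taken from the install steps in this tutorial):
#   prop_value /usr/local/oodt/cas-filemgr/etc/filemgr.properties \
#       org.apache.oodt.cas.filemgr.catalog.lucene.idxPath
```

If the printed value does not point at a location under your installation, that key is a likely source of a warning.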

...

product-types.xml:

<repository path="file:///var/archive/data"/>

...

If you're feeling curious, check out the other XML files in the /usr/local/oodt/cas-filemgr/policy subdirectories to get a better feel for how we define product types and elements. For a discussion of best practices for File Manager policy, see "Everything you want to know about File Manager Policy".

A brief overview of filemgr-client and query-tool

These commands are found in /usr/local/oodt/cas-filemgr/bin.

...

To trigger a file ingestion we're going to use the filemgr-client. This is by no means the most automated way to ingest data into a repository; however, it is an easy and intuitive way to trigger a file ingestion. The filemgr-client is a wrapper script that makes it easier to invoke a Java executable from the command line.

$ cd /usr/local/oodt/cas-filemgr/bin
$ ./filemgr-client --help
filemgr-client --url <url to xml rpc service> --operation [<operation> [params]]
operations:
--addProductType --typeName <name> --typeDesc <description>
    --repository <path> --versionClass <classname of versioning impl>
--ingestProduct --productName <name> --productStructure <Hierarchical|Flat> 
    --productTypeName <name of product type> --metadataFile <file> 
    [--clientTransfer --dataTransfer <java class name of data transfer factory>] 
    --refs <ref1>...<refn>
--hasProduct --productName <name>
--getProductTypeByName --productTypeName <name>
--getNumProducts --productTypeName <name>
--getFirstPage --productTypeName <name>
--getNextPage --productTypeName <name> --currentPageNum <number>
--getPrevPage --productTypeName <name> --currentPageNum <number>
--getLastPage --productTypeName <name>
--getCurrentTransfer
--getCurrentTransfers
--getProductPctTransferred --productId <id> --productTypeName <name>
--getFilePctTransferred --origRef <uri>

As you can see, there are a number of different ways this command can be executed.

...

However, before we take a look at the --operation --ingestProduct, I would first like to shed a bit more light on the query-tool command.

Command: query-tool

This is a very useful wrapper script to query the content of your repository.

$ cd /usr/local/oodt/cas-filemgr/bin
$ ./query-tool
Must specify a query and filemgr url! 
Usage: QueryTool [options] 
options: 
--url <fm url> 
  Lucene like query options: 
    --lucene 
         -query <query> 
  SQL like query options: 
    --sql 
         -query <query> 
         -sortBy <metadata-key> 
         -outputFormat <output-format-string> 

We see that we need to set some command line arguments to get anything useful out of the query tool. Try the next command:

$ ./query-tool --url http://localhost:9000 --sql -query 'SELECT * FROM GenericFile'

...

blah.txt.met:

<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
</cas:metadata>
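
With a metadata file like the one above in place, an ingest call can be assembled from the --ingestProduct usage text shown earlier. The sketch below only prints the command so it can be reviewed before running it. print_ingest_cmd is a hypothetical helper, the /tmp paths are placeholders, and the URL and product type are borrowed from the query examples in this tutorial. Whether --metadataFile and --refs expect plain paths or file:// URIs is not stated in the usage text, so the arguments are passed through unchanged.

```shell
# print_ingest_cmd NAME METFILE REF
# Print (do not run) a filemgr-client --ingestProduct command line.
# Hypothetical helper; every flag comes from the filemgr-client usage text.
print_ingest_cmd() {
    printf '%s ' ./filemgr-client --url http://localhost:9000 \
        --operation --ingestProduct \
        --productName "$1" --productStructure Flat \
        --productTypeName GenericFile \
        --metadataFile "$2"
    printf '%s\n' "--refs $3"
}

# Placeholder paths (assumed, adjust to where your files actually live):
print_ingest_cmd blah.txt /tmp/blah.txt.met /tmp/blah.txt
```

Run the printed command from /usr/local/oodt/cas-filemgr/bin once the values look right.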

...

To complete the process, let's see if we can retrieve the metadata. Run the query command again:

$ cd /usr/local/oodt/cas-filemgr/bin
$ ./query-tool --url http://localhost:9000 --sql -query 'SELECT * FROM GenericFile'

...

At the time of writing this tutorial, composing queries with query-tool is not entirely straightforward, but it is entirely usable. The formatting of these queries is critical: small deviations from the syntax can result in the query returning an unexpected value or throwing an exception.

...

Here is a somewhat verbose example that uses all the SQL-like syntax that I am currently aware of (apologies for all the line breaks).

$ cd /usr/local/oodt/cas-filemgr/bin
$ ./query-tool --url http://localhost:9000 --sql \
-query "SELECT CAS.ProductReceivedTime,CAS.ProductName,CAS.ProductId,ProductType,\
ProductStructure,Filename,FileLocation,MimeType \
FROM GenericFile WHERE Filename='blah.txt'" -sortBy 'CAS.ProductReceivedTime' \
-outputFormat '$CAS.ProductReceivedTime,$CAS.ProductName,$CAS.ProductId,$ProductType,$ProductStructure,$Filename,$FileLocation,$MimeType'

The output should look like:
2011-10-07T10:59:12.031+02:00,blah.txt,a00616c6-f0c2-11e0-baf4-65c684787732,
GenericFile,Flat,blah.txt,/var/kat/archive/data/blah.txt,text/plain
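
The -outputFormat value used above is simply each selected metadata key prefixed with $ and joined with commas. Since the string is long and easy to mistype, here is a minimal sketch that builds it from a list of keys. output_format is a hypothetical helper name, not something shipped with OODT.

```shell
# output_format KEY...
# Join metadata keys into a '$Key1,$Key2,...' string for -outputFormat.
# Hypothetical helper for illustration.
output_format() {
    fmt=""
    for key in "$@"; do
        # Add a comma before every key except the first; keep the "$" literal.
        fmt="${fmt:+$fmt,}\$$key"
    done
    printf '%s\n' "$fmt"
}

# Example: build the format string for three of the fields queried above.
output_format CAS.ProductName Filename MimeType
```

The result can be passed to the tool in double quotes, for example -outputFormat "$(output_format CAS.ProductName Filename MimeType)", which keeps the whole value on one line.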

...