The File Manager
This self-guided tutorial is intended for first-time users.
Since you've found this page, I assume that you are seriously thinking of using the OODT File Manager and are eager to get something up and running. Hopefully it also means that you've checked out the code and built a cas-filemgr install target (e.g. a cas-filemgr-${version}-dist.tar.gz file).
This tutorial is by no means a complete overview of all the File Manager's functionality. However, it's an attempt to get you started using the basic tools. Like learning to drive a car, the most difficult part is getting it started and on the road!
...
Here are the commands to install the cas-filemgr target from a tarfile. You will need to fill in the "..." with the appropriate content.
No Format | ||
---|---|---|
| ||
$ mkdir -p /usr/local/oodt/
$ tar xzvf .../filemgr/target/cas-filemgr-${version}-dist.tar.gz -C /usr/local/oodt/
$ cd /usr/local/oodt/
$ ln -s cas-filemgr-${version}/ cas-filemgr
|
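The unpacked directory takes its name from the tarball, which is why the symlink points at cas-filemgr-${version}. As a small sketch, the name can be derived in the shell so the version does not have to be typed twice. The helper name dist_dir_name and the version number in the example are made up for illustration; neither is part of OODT.

```shell
# Hypothetical helper (not part of OODT): derive the unpacked directory
# name from the path of a cas-filemgr dist tarball.
dist_dir_name() {
  # basename strips the leading directories and the given suffix
  basename "$1" -dist.tar.gz
}

# Example with a made-up path and version number:
dist_dir_name /tmp/build/cas-filemgr-1.2.3-dist.tar.gz   # prints cas-filemgr-1.2.3
```

The result could then be handed to ln -s in place of the hand-typed directory name.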
The decompressed tar file creates a directory structure that looks as follows:
No Format | ||
---|---|---|
| ||
.
├── bin
│   ├── filemgr
│   ├── filemgr-client
│   └── query-tool
├── etc
│   ├── filemgr.properties
│   └── mime-types.xml
├── lib
│   └── *.jar
├── logs
├── policy
│   ├── cmd-line-actions.xml
│   ├── cmd-line-options.xml
│   ├── core
│   │   ├── elements.xml
│   │   ├── product-type-element-map.xml
│   │   └── product-types.xml
│   ├── trace
│   │   ├── elements.xml
│   │   ├── product-type-element-map.xml
│   │   └── product-types.xml
│   ├── geo
│   │   ├── elements.xml
│   │   ├── product-type-element-map.xml
│   │   └── product-types.xml
│   └── (additional policy subdirectories)
└── run
|
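To confirm that the tarball unpacked as expected, the listing can be checked mechanically. This is only a sketch: check_layout is a hypothetical helper name, and the set of directories it looks for is taken from the listing above rather than from any OODT tool.

```shell
# Sketch: report which directories from the listing are missing.
# check_layout is a hypothetical helper; pass it the install directory.
check_layout() {
  home="$1"
  missing=0
  for d in bin etc lib logs policy; do
    if [ ! -d "$home/$d" ]; then
      echo "missing: $home/$d"
      missing=1
    fi
  done
  # exit status 0 when everything was found, 1 otherwise
  return "$missing"
}
```

For the install location used in this tutorial, it could be called as check_layout /usr/local/oodt/cas-filemgr.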
Please note, if you are using version 0.3 of OODT or earlier, the policy directory will look like this (with no subdirectories):
No Format | ||
---|---|---|
|
...
└── policy
├── elements.xml
├── product-type-element-map.xml
└── product-types.xml
|
Here is a brief description of each directory that you see listed:
...
filemgr : file manager (startup/shutdown) script
filemgr-client : file manager client interface script
query-tool : catalog query tool
convert_map : ???
migrate_xml_policy : ???
Configuring and Running the File Manager
You're now ready to run the file manager!
No Format | ||
---|---|---|
| ||
$ cd /usr/local/oodt/cas-filemgr/bin
$ ./filemgr --help
Usage: ./filemgr {start|stop|status}
$ ./filemgr start
|
What's going to happen?
The filemgr should be up and running; however, some WARNING messages may appear, complaining about configuration.
If you get a java.net.BindException exception, make sure that no other service is running on port 9000. This is the port for an XML-RPC interface that will be used for transferring data files into a repository.
There's also a new file in the /usr/local/oodt/run directory. The file contains the filemgr process id. This is typical *nix service housekeeping, done to avoid running multiple filemgr services.
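The pid file can be used to check whether the recorded process is still alive. The sketch below is an illustration only: pid_is_running is a hypothetical helper, and because the pid file's name is not given here, the path is passed in as an argument.

```shell
# Sketch: succeed only if the pid recorded in a pid file belongs to a
# process that exists and that the current user is allowed to signal.
pid_is_running() {
  pidfile="$1"
  [ -r "$pidfile" ] || return 1   # no readable pid file
  pid=$(cat "$pidfile")
  [ -n "$pid" ] || return 1       # empty pid file
  # Signal 0 delivers nothing; kill only reports whether it could signal.
  kill -0 "$pid" 2>/dev/null
}
```

Called with the path of the file found in /usr/local/oodt/run, a zero exit status suggests the filemgr process is still running.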
...
Code Block | ||
---|---|---|
| ||
org.apache.oodt.cas.filemgr.catalog.lucene.idxPath=/usr/local/oodt/cas-filemgr/catalog
org.apache.oodt.cas.filemgr.repositorymgr.dirs=file:///usr/local/oodt/cas-filemgr/policy/core
org.apache.oodt.cas.filemgr.validation.dirs=file:///usr/local/oodt/cas-filemgr/policy/core
org.apache.oodt.cas.filemgr.mime.type.repository=/usr/local/oodt/cas-filemgr/etc/mime-types.xml
|
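To double-check which values are in effect, a single key can be read back from the properties file. This is a minimal sketch, assuming the properties above live in etc/filemgr.properties (the file shown in the directory listing); get_prop is a hypothetical helper and only understands plain key=value lines.

```shell
# Sketch: print the value of one key from a Java-style .properties file.
# Handles plain key=value lines only (no line continuations, no ':'
# separators, no escapes). If a key is repeated, the last value wins.
get_prop() {
  awk -F= -v key="$2" '
    $1 == key { sub(/^[^=]*=/, ""); value = $0; found = 1 }
    END { if (found) print value; else exit 1 }
  ' "$1"
}
```

For example, get_prop /usr/local/oodt/cas-filemgr/etc/filemgr.properties org.apache.oodt.cas.filemgr.validation.dirs should print the policy directory URI configured above, and the exit status is non-zero if the key is absent.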
...
Code Block | ||
---|---|---|
| ||
<repository path="file:///var/archive/data"/>
|
...
Server-side metadata is generated by Java classes; the extractors to use are configured in the product-types.xml file in the chosen policy directory. For this example configuration, you should have /usr/local/oodt/cas-filemgr/policy/core as the policy directory, unless you're running version 0.3 or earlier of OODT, in which case you should have /usr/local/oodt/cas-filemgr/policy as the policy directory.
...
For the GenericFile type, find the <metExtractors> key. It specifies the extractors to use for server-side metadata extraction, namely: CoreMetExtractor, MimeTypeExtractor, FinalFileLocationExtractor. For more details about metadata and extractors, see http://oodt.apache.org/components/maven/metadata/user/basic.html and the Metadata Extractors page.
If you're feeling curious, check out the other XML files in the /usr/local/oodt/cas-filemgr/policy subdirectories to get a better feel for how we define product types and elements. For a discussion of best practices with respect to File Manager policy, the reader is referred to Everything you want to know about File Manager Policy.
A brief overview of filemgr-client and query-tool
These commands are found in /usr/local/oodt/cas-filemgr/bin.
...
In order to trigger a file ingestion we're going to use the filemgr-client. This is by no means the most automated way to ingest data into a repository; however, it's a really easy and intuitive way to trigger a file ingestion. The filemgr-client is a wrapper script, making it easier to invoke a Java executable from the command line.
No Format | ||
---|---|---|
| ||
$ cd /usr/local/oodt/cas-filemgr/bin
$ ./filemgr-client --help
filemgr-client --url <url to xml rpc service> --operation [<operation> [params]]
operations:
--addProductType --typeName <name> --typeDesc <description>
--repository <path> --versionClass <classname of versioning impl>
--ingestProduct --productName <name> --productStructure <Hierarchical|Flat>
--productTypeName <name of product type> --metadataFile <file>
[--clientTransfer --dataTransfer <java class name of data transfer factory>]
--refs <ref1>...<refn>
--hasProduct --productName <name>
--getProductTypeByName --productTypeName <name>
--getNumProducts --productTypeName <name>
--getFirstPage --productTypeName <name>
--getNextPage --productTypeName <name> --currentPageNum <number>
--getPrevPage --productTypeName <name> --currentPageNum <number>
--getLastPage --productTypeName <name>
--getCurrentTransfer
--getCurrentTransfers
--getProductPctTransferred --productId <id> --productTypeName <name>
--getFilePctTransferred --origRef <uri>
|
As you can see, there are a number of different ways this command can be executed.
The first command line argument is --url. This is the location of the filemgr XML-RPC data transfer interface. Looking at the filemgr logs (specifically cas_filemgr0.log), we see an INFO statement telling us that local data transfer is enabled on http://localhost:9000. This is the URL that we need to specify.
...
However, before we take a look at the --operation --ingestProduct, I would first like to shed a bit more light on the query-tool command.
Command: query-tool
This is a very useful wrapper script to query the content of your repository.
No Format | ||
---|---|---|
| ||
$ cd /usr/local/oodt/cas-filemgr/bin
$ ./query-tool
Must specify a query and filemgr url!
Usage: QueryTool [options]
options:
--url <fm url>
  Lucene like query options:
    --lucene
         -query <query>
  SQL like query options:
    --sql
         -query <query>
         -sortBy <metadata-key>
         -outputFormat <output-format-string>
|
We see that we need to set some command line arguments to get anything useful out of the query tool. Try the next command:
$ ./query-tool --url http://localhost:9000 --sql -query 'SELECT * FROM GenericFile'
...
Code Block | ||
---|---|---|
| ||
<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
</cas:metadata>
|
...
--productName : The name you want for your ingested product
--productStructure : Flat file or directory (i.e. hierarchical). Yes... we can ingest whole directories as one product
--productTypeName : A product type (as per product-types.xml)
--metadataFile : The client side metadata file
--refs : The product location
...
There's also an optional argument, --clientTransfer; however, we're going to leave this out and use the default local transfer.
[--clientTransfer --dataTransfer <java class name of data transfer factory>]
Here is the complete command:
$ ./filemgr-client --url http://localhost:9000 --operation --ingestProduct --productName blah.txt --productStructure Flat --productTypeName GenericFile --metadataFile file:///tmp/blah.txt.met --refs file:///tmp/blah.txt
The output should look like:
Sep 16, 2011 2:09:42 PM org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient <init>
...
...
ingestProduct: Result: c2fbf4b9-e05c-11e0-9022-77a707615e7f
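When ingesting more than one file by hand, it can help to generate the command line rather than retype it. The following is a sketch, not an OODT tool: ingest_cmd is a hypothetical helper that only prints the command, and it assumes an absolute path without spaces, a Flat GenericFile product, and a metadata file stored next to the data file with a .met suffix, as in the blah.txt example.

```shell
# Sketch: print (not run) the filemgr-client ingest command for one file.
# Arguments: absolute path of the file, optional filemgr URL.
ingest_cmd() {
  file="$1"
  url="${2:-http://localhost:9000}"   # default taken from this tutorial
  printf './filemgr-client --url %s --operation --ingestProduct --productName %s --productStructure Flat --productTypeName GenericFile --metadataFile file://%s.met --refs file://%s\n' \
    "$url" "$(basename "$file")" "$file" "$file"
}
```

Running ingest_cmd /tmp/blah.txt prints the same command that was used above, which can be reviewed before it is pasted into the shell.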
...
To complete the process, let's see if we can retrieve the metadata. Run the query command again:
$ cd /usr/local/oodt/cas-filemgr/bin
$ ./query-tool --url http://localhost:9000 --sql -query 'SELECT * FROM GenericFile'
...
At the time of writing this tutorial, composing queries using query-tool is not entirely straightforward, but it is entirely usable. Formatting of these queries is critical: small deviations from the syntax can result in the query returning an unexpected value or throwing an exception.
...
Here is a somewhat verbose example that uses all the SQL-like syntax that I am currently aware of (apologies for all the line breaks).
No Format | ||
---|---|---|
| ||
$ cd /usr/local/oodt/cas-filemgr/bin
$ ./query-tool --url http://localhost:9000 --sql \
-query "SELECT CAS.ProductReceivedTime,CAS.ProductName,CAS.ProductId,ProductType,\
ProductStructure,Filename,FileLocation,MimeType \
FROM GenericFile WHERE Filename='blah.txt'" -sortBy 'CAS.ProductReceivedTime' \
-outputFormat '$CAS.ProductReceivedTime,$CAS.ProductName,$CAS.ProductId,$ProductType,\
$ProductStructure,$Filename,$FileLocation,$MimeType'
|
The output should look like:
2011-10-07T10:59:12.031+02:00,blah.txt,a00616c6-f0c2-11e0-baf4-65c684787732,
GenericFile,Flat,blah.txt,/var/kat/archive/data/blah.txt,text/plain
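A comma-separated output format makes the result easy to pick apart in the shell. The sketch below assumes each result arrives as a single line (the sample output is joined onto one line here) and that no field contains a comma. The variable names are chosen for illustration and simply mirror the -outputFormat order.

```shell
# One result line, using the sample output joined onto a single line.
line='2011-10-07T10:59:12.031+02:00,blah.txt,a00616c6-f0c2-11e0-baf4-65c684787732,GenericFile,Flat,blah.txt,/var/kat/archive/data/blah.txt,text/plain'

# Split on commas; IFS is changed for this one read command only.
IFS=, read -r received name id type structure filename location mime <<EOF
$line
EOF

echo "$id $mime"   # prints a00616c6-f0c2-11e0-baf4-65c684787732 text/plain
```

In practice, the here-document would be replaced by the output of the query command.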
Now you can also check out some of the other 12 --operation possibilities for filemgr-client. For instance:
$ ./filemgr-client --url http://localhost:9000 --operation --hasProduct --productName blah.txt
Or:
$ ./filemgr-client --url http://localhost:9000 --operation --getFirstPage --productTypeName GenericFile
...
Cameron Goodale has written some useful command line tool aliases that are worth mentioning before we continue. See the following two web pages:
https://issues.apache.org/jira/browse/OODT-306
BASH and TCSH shell tools for File Manager
...