The File Manager
This self-guided tutorial to the OODT File Manager is intended for first-time users.
Since you've found this page, I assume that you are seriously thinking of using the OODT File Manager and are eager to get something up and running. Hopefully it also means that you've checked out the code and built a cas-filemgr install target (e.g. a cas-filemgr-0.4-SNAPSHOT-dist.tar.gz file).
This tutorial is by no means a complete overview of all the File Manager's functionality. However, it's an attempt to get you started using the basic tools. Like learning to drive a car, the most difficult part is getting it started and on the road!
The following topics are covered on this page:
- An Overview of What is Installed
- Configuring and Running the File Manager Server
- A Typical User Scenario - ingesting and querying
An Overview of What is Installed
Assumption - you have built or have access to a cas-filemgr install target. This also means that you've correctly configured Maven and Java for your system.
Here are the commands to install the cas-filemgr target from a tarfile. You will need to fill in the "..." with the appropriate content.
$ mkdir -p /usr/local/oodt/
$ tar xzvf .../filemgr/target/cas-filemgr-0.4-SNAPSHOT-dist.tar.gz -C /usr/local/oodt/
$ cd /usr/local/oodt/
$ ln -s cas-filemgr-0.4-SNAPSHOT/ cas-filemgr
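The assumption that Maven and Java are configured can be checked with a few lines of shell. This is only a sketch: the helper name have_tool is made up for illustration, and it assumes the tools are installed under their usual command names, java and mvn.

```shell
#!/bin/sh
# Report whether a required tool is on the PATH (returns non-zero if missing).
have_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Java and Maven are the two tools this tutorial assumes are configured.
for tool in java mvn; do
    have_tool "$tool" || echo "install or configure $tool before continuing"
done
```

Finding a tool on the PATH does not prove it is the right version, so treat this as a first sanity check only.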
The decompressed tar file creates a directory structure that looks as follows:
.
├── bin
│   ├── convert_map
│   ├── filemgr
│   ├── filemgr-client
│   ├── migrate_xml_policy
│   └── query_tool
├── etc
│   ├── filemgr.properties
│   ├── logging.properties
│   └── mime-types.xml
├── lib
│   └── *.jar
├── logs
│   └── REMOVE.log
└── policy
    ├── elements.xml
    ├── product-type-element-map.xml
    └── product-types.xml
$ cd /usr/local/oodt/cas-filemgr
$ ls
bin catalog etc lib logs policy
Here is a brief description of each directory that you see listed:
- bin: contains shell convenience scripts for launching java classes
- etc: contains configuration files, i.e. *.properties and *.xml files
- lib: contains java resources, i.e. *.jar files
- logs: contains server log files
- policy: contains product specifications, i.e. *.xml specification files
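To confirm that an install has the layout described above, a short check like the following can help. It is a sketch: check_layout is a hypothetical helper, and the directory names are simply the five listed in this tutorial.

```shell
#!/bin/sh
# Print each expected sub-directory that is missing from an install directory.
# Returns non-zero if at least one is missing.
check_layout() {
    install_dir=$1
    status=0
    for sub in bin etc lib logs policy; do
        if [ ! -d "$install_dir/$sub" ]; then
            echo "missing: $install_dir/$sub"
            status=1
        fi
    done
    return $status
}

# The path follows the tutorial; adjust it if you installed elsewhere.
check_layout /usr/local/oodt/cas-filemgr || echo "layout differs from the tutorial"
```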
$ cd /usr/local/oodt/cas-filemgr/bin
$ ls
convert_map filemgr filemgr-client migrate_xml_policy query_tool
The bin directory contains a number of executables:
- filemgr: server convenience script for server management (startup/shutdown)
- filemgr-client: client interface convenience script
- query_tool: catalog query tool convenience script
- convert_map: ???
- migrate_xml_policy: ???
...
Configuring and Running the File Manager Server
You're now ready to run the filemgr server!
$ cd /usr/local/oodt/cas-filemgr/bin
$ ./filemgr --help
Usage: ./filemgr {start|stop|restart|status}
$ ./filemgr start
What's going to happen?
The filemgr should be up and running. However, some WARNING messages will appear, complaining about configuration.
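If you would rather review the warnings after startup, something along these lines pulls them out of the log files. It is a sketch under two assumptions: that the warnings are also written to the files in the logs directory (they may only appear on the console), and that each such line contains the word WARNING. The helper name show_warnings is made up for illustration.

```shell
#!/bin/sh
# Print every line containing WARNING from the *.log files in a directory.
# /dev/null is passed as a second file so grep always prefixes the file name.
show_warnings() {
    log_dir=$1
    for f in "$log_dir"/*.log; do
        [ -f "$f" ] || continue      # the glob matched nothing
        grep 'WARNING' "$f" /dev/null || true
    done
}

# The path follows the tutorial; adjust it if you installed elsewhere.
show_warnings /usr/local/oodt/cas-filemgr/logs
```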
...
Restart your filemgr so that it re-reads the filemgr.properties and product-types.xml:
$ cd /usr/local/oodt/cas-filemgr/bin
$ ./filemgr restart
What have we configured?
- A place to store your catalog, i.e. the database of metadata.
- A place to store your ingested files, i.e. the repository.
- The location of your policy directory for product specifications.
- Your mime-types configuration file for file recognition.
How is metadata collected?
Now for some brief notes about how metadata is collected. The filemgr captures metadata in two different ways - from client side metadata extraction and server side metadata extraction.
Client side metadata is passed to the filemgr via an XML-formatted metadata file. E.g. a file called blah.txt can have a metadata file called blah.txt.met. This met file can be created in many ways, even by hand! And that's exactly what we're going to do.
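As an illustration of what a hand-written met file might look like, here is a sketch that writes one from the shell. The cas:metadata layout (namespace, keyval, key, val) is given from memory of the OODT metadata format and should be checked against the metadata documentation. The file name /tmp/example.txt and the Owner key are purely hypothetical, and whether the File Manager accepts a given key depends on the elements defined in your policy.

```shell
#!/bin/sh
# Write a data file and a hand-made client-side metadata file next to it.
# The XML layout and the "Owner" key are assumptions for illustration only.
data_file=/tmp/example.txt
met_file=$data_file.met

echo 'hello' > "$data_file"
cat > "$met_file" <<'EOF'
<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <keyval>
    <key>Owner</key>
    <val>tutorial-user</val>
  </keyval>
</cas:metadata>
EOF

echo "wrote $met_file"
```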
Server side metadata is generated by java classes. The extractors that will be used are configured in the product-types.xml file in the policy directory. You should have /usr/local/oodt/cas-filemgr/policy/ as the policy directory.
Now would be a good time to have a quick look at the product-types.xml file. It contains some critical information about what is going to happen when we ingest our first file into the repository.
...
For the GenericFile type, find the <metadata> key. It specifies some metadata. We're defining the product type!
For the GenericFile type, find the <metExtractors> key. It specifies the extractors to use for server side metadata extraction, namely: CoreMetExtractor, MimeTypeExtractor, FinalFileLocationExtractor. For more details about metadata and extractors see the following page: http://oodt.apache.org/components/maven/metadata/user/basic.html
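A quick way to see which extractors a policy file names is to pull the class attributes out with sed. This sketch assumes each extractor is declared as an <extractor class="..."> element on a line of its own, which may not match your file exactly. The helper name list_extractors is made up for illustration.

```shell
#!/bin/sh
# Print the class attribute of every <extractor ...> element, one per line.
# Assumes each extractor element sits on a line of its own.
list_extractors() {
    sed -n 's/.*<extractor[^>]*class="\([^"]*\)".*/\1/p' "$1"
}

# The path follows the tutorial; the check keeps this harmless elsewhere.
policy_file=/usr/local/oodt/cas-filemgr/policy/product-types.xml
if [ -f "$policy_file" ]; then
    list_extractors "$policy_file"
fi
```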
If you're feeling curious, check out the other xml files in the /usr/local/oodt/cas-filemgr/policy/ directory to get a better feel for how we define product types and elements. For a discussion of best practices w.r.t. File Manager Policy, the reader is referred to Everything you want to know about File Manager Policy.
...
These commands are found in /usr/local/oodt/cas-filemgr/bin.
Command: filemgr-client
In order to trigger a file ingestion we're going to use the filemgr-client. This is by no means the most automated way to ingest data into a repository; however, it's a really easy and intuitive way to trigger a file ingestion. The filemgr-client is a wrapper script, making it easier to invoke a java executable from the command line.
$ cd /usr/local/oodt/cas-filemgr/bin
$ ./filemgr-client --help
filemgr-client --url <url to xml rpc service> --operation [<operation> [params]]
operations:
--addProductType --typeName <name> --typeDesc <description> --repository <path> --versionClass <classname of versioning impl>
--ingestProduct --productName <name> --productStructure <Hierarchical|Flat>
    --productTypeName <name of product type> --metadataFile <file>
    [--clientTransfer --dataTransfer <java class name of data transfer factory>] --refs <ref1>...<refn>
--hasProduct --productName <name>
--getProductTypeByName --productTypeName <name>
--getNumProducts --productTypeName <name>
--getFirstPage --productTypeName <name>
--getNextPage --productTypeName <name> --currentPageNum <number>
--getPrevPage --productTypeName <name> --currentPageNum <number>
--getLastPage --productTypeName <name>
--getCurrentTransfer
--getCurrentTransfers
--getProductPctTransferred --productId <id> --productTypeName <name>
--getFilePctTransferred --origRef <uri>
As you can see, there are a number of different ways this command can be executed.
The first command line argument is --url. This is the location of the filemgr xml-rpc data transfer interface. Looking at the filemgr logs (specifically cas_filemgr0.log), we see an INFO statement telling us that local data transfer is enabled on http://localhost:9000. This is the URL that we need to specify.
The second command line argument is --operation and there are 13 different types of operations that are possible! For now we are going to use the --ingestProduct operation. From the help command you can see that the --ingestProduct operation requires some further command line arguments to be specified.
However, before we take a look at --operation --ingestProduct, I would first like to shed a bit more light on the query_tool command.
Command: query_tool
This is a very useful wrapper script to query the content of your repository.
$ cd /usr/local/oodt/cas-filemgr/bin
$ ./query_tool --help
Must specify a query and filemgr url!
Usage: QueryTool [options]
options:
--url <fm url>
  Lucene like query options:
    --lucene
         -query <query>
  SQL like query options:
    --sql
         -query <query>
         -sortBy <metadata-key>
         -outputFormat <output-format-string>
We see that we need to set some command line arguments to get anything useful out of the query tool. Try the next command:
...
$ ls /usr/local/oodt/cas-filemgr/catalog
ls: /usr/local/oodt/cas-filemgr/catalog: No such file or directory
...
A Typical User Scenario
Time to ingest a very, very simple file. If you have not already, restart your filemgr so that it re-reads the filemgr.properties:
$ cd /usr/local/oodt/cas-filemgr/bin
$ ./filemgr restart
For this simple ingestion we are not going to include any client side metadata; all the metadata collection will happen on the server side, using the *Extractor extractors specified in the product-types.xml file.
Create a text file and its metadata file for ingestion:
$ echo 'hello' > /tmp/blah.txt
$ touch /tmp/blah.txt.met
...
Let's ingest the file! For --operation --ingestProduct we need to specify the following arguments:
- --productName: Name of the ingested product
- --productStructure: Flat file or directory (i.e. hierarchical). Yes... we can ingest whole directories as one product
- --productTypeName: A product type (as per product-types.xml)
- --metadataFile: The client side metadata file
- --refs: The product location
There's also an optional argument, --clientTransfer; however, we're going to leave this out and use the default local transfer.
[--clientTransfer --dataTransfer <java class name of data transfer factory>]
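Putting the arguments above together, the ingest command for the example file can be assembled as follows. This is a sketch rather than the definitive invocation: the flags come from the help output, but passing --metadataFile and --refs as file: URIs is an assumption, and ingest_cmd is a hypothetical helper that only prints the command so you can review it first.

```shell
#!/bin/sh
# Print the ingest command for one flat file so it can be reviewed before running.
# The file: URI form for --metadataFile and --refs is an assumption.
ingest_cmd() {
    file=$1
    type=$2
    echo "./filemgr-client --url http://localhost:9000 --operation --ingestProduct" \
         "--productName $(basename "$file") --productStructure Flat" \
         "--productTypeName $type --metadataFile file://$file.met --refs file://$file"
}

ingest_cmd /tmp/blah.txt GenericFile
```

Run the printed line from /usr/local/oodt/cas-filemgr/bin once it looks right for your setup.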
...
You've just archived your first file.
To complete the process, let's see if we can retrieve the metadata. Run the query command again:
$ cd /usr/local/oodt/cas-filemgr/bin
$ ./query_tool --url http://localhost:9000 --sql -query 'SELECT * FROM GenericFile'
...
Note: Query commands do not depend on the underlying catalog implementation. The --sql and --lucene options instead describe the filemgr query syntax.
Now you can also check out some of the other 12 --operation possibilities for filemgr-client. For instance:
...
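A small wrapper can save retyping the URL when trying these operations. The sketch below uses only flags shown in the help output. FILEMGR_BIN, FILEMGR_URL and fm_op are hypothetical names, and when the client script is not found the function just prints what it would have run.

```shell
#!/bin/sh
# Locations follow the tutorial; change them to match your install.
FILEMGR_BIN=/usr/local/oodt/cas-filemgr/bin
FILEMGR_URL=http://localhost:9000

# Run one filemgr-client operation, or print it if the client is not installed.
fm_op() {
    if [ -x "$FILEMGR_BIN/filemgr-client" ]; then
        "$FILEMGR_BIN/filemgr-client" --url "$FILEMGR_URL" --operation "$@"
    else
        echo "dry run: filemgr-client --url $FILEMGR_URL --operation $*"
    fi
}

fm_op --hasProduct --productName blah.txt
fm_op --getNumProducts --productTypeName GenericFile
```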
Cameron Goodale has written some useful command-line tool aliases that are worth mentioning before we continue. See the following two web pages:
- https://issues.apache.org/jira/browse/OODT-306
- BASH and TCSH shell tools for File Manager
MORE TO BE ADDED