Comment: Migrated to Confluence 5.3
Warning

Work in progress. Some parts of the original documentation are out of date and currently do not work; this will be updated in due course.

Introduction

...

Useful as a one-line format translator is, we will have to do a little more work to create an extractor capable of producing metadata for CAS-Curator. Metadata extractors that are to be integrated with CAS-Curator must produce three pieces of metadata:
ProductType
FileLocation
Filename
Note that this is NOT a general requirement of all metadata extractors, but a consequence of the current implementation of CAS-Curator. To produce this extra metadata, we will develop a small Python script:

Code Block
#!/usr/bin/python

import os
import sys

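# The file to extract metadata from is passed as the first argument.
fullPath = sys.argv[1]
# Split the path into the file name and its directory (trailing slash kept).
fileName = os.path.basename(fullPath)
fileLocation = fullPath[:len(fullPath)-len(fileName)]
productType = "MP3"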

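# Run Tika in metadata mode, then use awk to wrap each "key: value" line
# in keyval XML. Note: fullPath is passed to the shell unquoted, so paths
# containing spaces will not work.
cmd = "java -jar /Users/woollard/Desktop/extractors/mp3extractor/"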
cmd += "tika-app-1.4.jar -m "+fullPath+" | awk -F:"
cmd += " 'BEGIN {print \"<cas:metadata xmlns:cas="
cmd += "\\\"http://oodt.jpl.nasa.gov/1.0/cas\\\">\"}" 
cmd += " {print \"<keyval><key>\"$1\"</key><val>\"substr($2,1)\""
cmd += "</val></keyval>\"}' > "+fileName+".met"

os.system(cmd)

# Append the three keys CAS-Curator requires and close the root element.
with open(fileName+".met", 'a') as f:
    f.write('<keyval><key>ProductType</key><val>'+productType)
    f.write('</val></keyval>\n<keyval><key>Filename</key><val>')
    f.write(fileName+'</val></keyval>\n<keyval><key>FileLocation')
    f.write('</key><val>'+fileLocation+'</val></keyval>\n')
    f.write('</cas:metadata>')
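
The three Curator-specific keys could also be built in a small helper that escapes XML special characters, which the script above does not do. The following is only a sketch under stated assumptions: the function name curator_keyvals and the sample path are hypothetical, and the Tika call is left exactly as in the script.

```python
import os
from xml.sax.saxutils import escape


def curator_keyvals(full_path, product_type="MP3"):
    """Return the three keyval elements CAS-Curator expects, as one string.

    Hypothetical helper for illustration; it is not part of OODT.
    """
    file_name = os.path.basename(full_path)
    # Keep the trailing slash on the directory, matching the script above.
    file_location = full_path[:len(full_path) - len(file_name)]
    pairs = [
        ("ProductType", product_type),
        ("Filename", file_name),
        ("FileLocation", file_location),
    ]
    # escape() replaces &, < and > so the values cannot break the XML.
    return "".join(
        "<keyval><key>%s</key><val>%s</val></keyval>\n" % (key, escape(val))
        for key, val in pairs
    )


if __name__ == "__main__":
    # Hypothetical staging path, used only to show the output shape.
    print(curator_keyvals("/usr/local/staging/Bach-SuiteNo2.mp3"))
```

The returned string could be appended to the .met file in place of the individual f.write calls, followed by the closing cas:metadata tag.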

...

Code Block
cd /usr/local/extractors/mp3extractor
ls -l
total 51448
-rw-r--r--  1 -  -       167 Nov 27 13:50 config.properties
-rw-r--r--  1 -  -       328 Nov 27 13:49 mp3PythonExtractor.config
-rwxr-xr-x  1 -  -       702 Nov 27 13:49 mp3PythonExtractor.py
-rw-r--r--  1 -  -  26325155 Nov 27 13:46 tika-app-01.5-SNAPSHOT4.jar

Once you restart Tomcat, the change you have made to the context file will be used. The extractor area will now be set to /usr/local/extractors.

...

The final step in our basic configuration of CAS-Curator is to configure a CAS-Filemgr instance into which we will ingest our mp3s. There is a lot of information on configuring the CAS-Filemgr in its User's Guide. We will assume familiarity with the CAS-Filemgr for the remainder of this guide.
In this guide, we will focus on the basic configuration necessary to tailor a vanilla build of the CAS-Filemgr for use with our CAS-Curator. We will assume that you have built the latest release of the CAS-Filemgr (v1.8.0 at the time of this writing) and installed it at:
/usr/local/srcoodt/cas-filemgr-1.8.0/

The first step in configuring the CAS-Filemgr is to edit the filemgr.properties file in the etc directory. This file controls the basic configuration of the CAS-Filemgr, including its various extension points. For this example, we are going to run the CAS-Filemgr in a very basic configuration, with both its repository and validation layer controlled by XML configuration, a local data transfer factory, and a Lucene-based metadata catalog.
In order to create this configuration, we will change the following parameters in the filemgr.properties file:
Set org.apache.oodt.cas.filemgr.catalog.lucene.idxPath to /usr/local/srcoodt/cas-filemgr-1.8.0/catalog. This parameter tells CAS-Filemgr where to create the Lucene index. The first time you start the CAS-Filemgr, make sure that this directory does NOT exist. The CAS-Filemgr will take care of creating it and populating it with the appropriate files.
Set org.apache.oodt.cas.filemgr.repositorymgr.dirs to file:///usr/local/srcoodt/cas-filemgr-1.8.0/policy/mp3. The value needs to be a URL and we are pointing to a policy folder we will create.
Set org.apache.oodt.cas.filemgr.validation.dirs to file:///usr/local/srcoodt/cas-filemgr-1.8.0/policy/mp3. Like the last parameter we configured, this parameter should be a URL and point to the same policy folder.
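
Before starting the CAS-Filemgr, the three settings above can be sanity-checked with a short script. This is a minimal sketch, not an OODT tool: the function names are hypothetical, and the parser handles only plain key=value lines (Java properties files also allow other separators and line continuations, which are ignored here).

```python
from urllib.parse import urlparse

# Property keys named in the steps above.
IDX_PATH = "org.apache.oodt.cas.filemgr.catalog.lucene.idxPath"
URL_KEYS = (
    "org.apache.oodt.cas.filemgr.repositorymgr.dirs",
    "org.apache.oodt.cas.filemgr.validation.dirs",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_filemgr_properties(props):
    """Return a list of problems found; an empty list means none."""
    problems = []
    if not props.get(IDX_PATH):
        problems.append("missing " + IDX_PATH)
    for key in URL_KEYS:
        value = props.get(key, "")
        # Both policy settings must be URLs, not bare filesystem paths.
        if urlparse(value).scheme != "file":
            problems.append(key + " should be a file:// URL, got " + repr(value))
    return problems


if __name__ == "__main__":
    base = "/usr/local/srcoodt/cas-filemgr-1.8.0"
    sample = "\n".join([
        IDX_PATH + "=" + base + "/catalog",
        URL_KEYS[0] + "=file://" + base + "/policy/mp3",
        URL_KEYS[1] + "=file://" + base + "/policy/mp3",
    ])
    print(check_filemgr_properties(parse_properties(sample)))
```

To check a real installation, the contents of etc/filemgr.properties would be read from disk and passed to parse_properties in place of the sample string.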
With these changes, you are ready to run the basic configuration of the CAS-Filemgr. In order to make this install of CAS-Filemgr work with our CAS-Curator, however, we will also need to augment the basic policy for both the repository manager and validation layer.
First, we will create a policy directory for our mp3 curator. We can do this by moving the current policy files from the base policy directory into a new mp3 directory:

Code Block
cd /usr/local/srcoodt/cas-filemgr-1.8.0/policy
mkdir mp3
mv *.xml mp3/

...

We will now start the CAS-Filemgr instance. This instance will run on port 9000 by default. In order to start the Filemgr, we will issue the following commands:

Code Block
cd /usr/local/srcoodt/cas-filemgr-1.8.0/bin
./filemgr start
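
To confirm that something is listening on the Filemgr port after the start command returns, a plain TCP connection test is enough. The sketch below is an assumption-laden illustration: the function name port_is_open is hypothetical, and a successful connection only shows that the port is open, not that the process behind it is a healthy CAS-Filemgr.

```python
import socket


def port_is_open(host="localhost", port=9000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError if nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 9000 is the default port mentioned in this guide.
    print("Filemgr port open:", port_is_open())
```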

Now that we have started the CAS-Filemgr, we will need to configure the CAS-Curator to use this Filemgr instance. In order to do this, we will add the following parameters to the CAS-Curator context file:

Code Block
<Parameter name="org.apache.oodt.cas.fm.url"
        value="http://localhost:9000"/>

<Parameter name="org.apache.oodt.cas.curator.dataDefinition.uploadPath"
        value="/usr/local/srcoodt/cas-filemgr-1.8.0/policy" />

<Parameter name="org.apache.oodt.cas.curator.fmProps"
        value="/usr/local/srcoodt/cas-filemgr-1.8.0/etc/filemgr.properties"/>
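
A quick way to double-check the values added to the context file is to parse it and list its Parameter elements. This is a minimal sketch: the function name read_parameters is hypothetical, and the three parameters are wrapped in a Context root element here so that the fragment parses as a single XML document, as it would inside a Tomcat context file.

```python
import xml.etree.ElementTree as ET

# The three parameters from this guide inside a Context root element.
CONTEXT_XML = """<Context>
<Parameter name="org.apache.oodt.cas.fm.url"
        value="http://localhost:9000"/>

<Parameter name="org.apache.oodt.cas.curator.dataDefinition.uploadPath"
        value="/usr/local/srcoodt/cas-filemgr-1.8.0/policy" />

<Parameter name="org.apache.oodt.cas.curator.fmProps"
        value="/usr/local/srcoodt/cas-filemgr-1.8.0/etc/filemgr.properties"/>
</Context>"""


def read_parameters(xml_text):
    """Return a dict mapping each Parameter name to its value."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("value") for p in root.iter("Parameter")}


if __name__ == "__main__":
    for name, value in sorted(read_parameters(CONTEXT_XML).items()):
        print(name, "=", value)
```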

...

Code Block
> ls -lR /usr/local/archive
total 0
drwxr-xr-x  3 -  - 102 Nov 27 23:53 Bach-SuiteNo2.mp3

/usr/local/archive//Bach-SuiteNo2.mp3:
total 9344
-rw-r--r--  1 -  -  4781079 Nov 25 20:14 Bach-SuiteNo2.mp3

Worth noting is the fact that our configuration of the CAS-Filemgr included a selection of the BasicVersioner as the MP3 product type versioner. This means that mp3s are placed at [archive_base]/[filename]/[filename] during ingest.
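
The archive layout described above can be written out as a one-line path computation. This sketch only mirrors the directory pattern shown in the listing; it is not the versioner's actual code, and the function name basic_versioner_path is hypothetical.

```python
import posixpath


def basic_versioner_path(archive_base, file_name):
    """Build archive_base/file_name/file_name, the layout seen in the listing."""
    return posixpath.join(archive_base, file_name, file_name)


if __name__ == "__main__":
    print(basic_versioner_path("/usr/local/archive", "Bach-SuiteNo2.mp3"))
```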
We have now completed a base configuration of the CAS-Curator. In the Advanced Guide (http://oodt.apache.org/components/maven/curator/user/advanced.html), we will cover topics like changing the look and feel of the Curator, and security configuration.