...

The last step in configuring our mp3 metadata extractor is to provide a properties file for CAS-Curator so that it knows how to call the ExternMetExtractor. Each extractor used by CAS-Curator needs a config.properties file. This file sets two properties:

  • extractor.classname

...

  • extractor.config.files

Create a config.properties file (this exact name is required for CAS-Curator to pick up the configuration) in the /usr/local/extractors/mp3extractor directory. This file should consist of the following parameters:

...

In the above screenshot, we see that, upon clicking on the mp3 file, metadata produced by the mp3extractor is shown in the main right staging pane. Now staging and extraction are set up. In the next section, we will set up a CAS-Filemgr instance and show how CAS-Curator can be used to ingest products.

File Manager Configuration

The final step in our basic configuration of CAS-Curator is to configure a CAS-Filemgr instance into which we will ingest our mp3s. There is a lot of information on configuring the CAS-Filemgr in its User's Guide. We will assume familiarity with the CAS-Filemgr for the remainder of this guide.
In this guide, we will focus on the basic configuration necessary to tailor a vanilla build of the CAS-Filemgr for use with our CAS-Curator. We will assume that you have built the latest release of the CAS-Filemgr (v1.8.0 at the time of this writing) and installed it at:
/usr/local/src/cas-filemgr-1.8.0/

The first step in configuring the CAS-Filemgr is to edit the filemgr.properties file in the etc directory. This file controls the basic configuration of the CAS-Filemgr, including its various extension points. For this example, we are going to run the CAS-Filemgr in a very basic configuration, with both its repository and validation layer controlled by XML configuration, a local data transfer factory, and a Lucene-based metadata catalog.
In order to create this configuration, we will change the following parameters in the filemgr.properties file:
  • Set org.apache.oodt.cas.filemgr.catalog.lucene.idxPath to /usr/local/src/cas-filemgr-1.8.0/catalog. This parameter tells CAS-Filemgr where to create the Lucene index. The first time you start the CAS-Filemgr, make sure that this path does NOT exist; the CAS-Filemgr will create it and populate it with the appropriate files.
  • Set org.apache.oodt.cas.filemgr.repositorymgr.dirs to file:///usr/local/src/cas-filemgr-1.8.0/policy/mp3. The value must be a URL; it points to a policy folder we will create.
  • Set org.apache.oodt.cas.filemgr.validation.dirs to file:///usr/local/src/cas-filemgr-1.8.0/policy/mp3. Like the previous parameter, this value must be a URL and point to the same policy folder.
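
To double-check these three edits, a small shell function can print just those properties from the file. This is a minimal sketch: the function name show_filemgr_props is hypothetical (it is not part of CAS-Filemgr), and it assumes each property sits on a single line in key=value form.

```shell
# Hypothetical helper, not part of CAS-Filemgr: print the three properties
# edited in this step from the given filemgr.properties file.
show_filemgr_props() {
  grep -E '^org\.apache\.oodt\.cas\.filemgr\.(catalog\.lucene\.idxPath|repositorymgr\.dirs|validation\.dirs)[[:space:]]*[=:]' "$1"
}

# Example call, assuming the install location used in this guide:
# show_filemgr_props /usr/local/src/cas-filemgr-1.8.0/etc/filemgr.properties
```

Each of the three keys should appear exactly once, with the values given above. A missing key suggests the line is still commented out or was mistyped.
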
With these changes, you are ready to run the basic configuration of the CAS-Filemgr. In order to make this install of CAS-Filemgr work with our CAS-Curator, however, we will also need to augment the basic policy for both the repository manager and validation layer.
First, we will create a policy directory for our mp3 curator. We can do this by moving the current policy files from the base policy directory to an mp3 directory:

Code Block

cd /usr/local/src/cas-filemgr-1.8.0/policy
mkdir mp3
mv *.xml mp3/    

Next, we will add a product type to our instance of the CAS-Filemgr. In order to do this, we will edit the product-types.xml file in the policy/mp3 directory. We will add the following as a child of the <cas:producttypes> node (we purposefully elide any commentary on the details of this configuration and leave it to the reader):

Code Block
 
<type id="urn:example:MP3" name="MP3">
  <repository path="file:///usr/local/archive"/>
  <versioner class="org.apache.oodt.cas.filemgr.versioning.BasicVersioner"/>
  <description>A product type for mp3 audio files.</description>
  <metExtractors>
    <extractor class="org.apache.oodt.cas.filemgr.metadata.extractors.CoreMetExtractor">
      <configuration>
        <property name="nsAware" value="true" />
        <property name="elementNs" value="CAS" />
        <property name="elements" value="ProductReceivedTime,ProductName,ProductId" />
      </configuration>
    </extractor>
  </metExtractors>
</type>

Next, we will create a number of elements in the elements.xml file. There will be an element node for each of the metadata elements we want to associate with MP3 products. We can do this by adding the following as child nodes of the <cas:elements> tag:

Code Block
      
<element id="urn:example:FileLocation" name="FileLocation">
  <dcElement/>
  <description/>
</element>
<element id="urn:example:ProductType" name="ProductType">
  <dcElement/>
  <description/>
</element>
<element id="urn:example:Author" name="Author">
  <dcElement/>
  <description/>
</element>
<element id="urn:example:Filename" name="Filename">
  <dcElement/>
  <description/>
</element>
<element id="urn:example:resourceName" name="resourceName">
  <dcElement/>
  <description/>
</element>
<element id="urn:example:title" name="title">
  <dcElement/>
  <description/>
</element>
<element id="urn:example:Content-Type" name="Content-Type">
  <dcElement/>
  <description/>
</element> 

After we have configured the new metadata elements, we will need to map these elements to our MP3 product. We do this by editing the product-type-element-map.xml file in the policy/mp3 directory to add the following as a child node to <cas:producttypemap>:

Code Block
        
<type id="urn:example:MP3">
  <element id="urn:example:FileLocation"/>
  <element id="urn:example:ProductType"/>
  <element id="urn:example:Author"/>
  <element id="urn:example:Filename"/>
  <element id="urn:example:resourceName"/>
  <element id="urn:example:title"/>
  <element id="urn:example:Content-Type"/> 
</type>
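
Because the map refers to elements purely by id, a typo in either file is easy to miss. The sketch below lists any element id that product-type-element-map.xml references but elements.xml does not define. The function name missing_elements is hypothetical, and the text matching assumes the layout used in this guide: one element tag per line, with id as the first attribute after a single space. It is not a real XML parser.

```shell
# Hypothetical helper: print element ids that are referenced in
# product-type-element-map.xml but not defined in elements.xml.
# Argument: the policy directory that holds both files.
missing_elements() {
  policy_dir=$1
  # Pull the id out of every <element id="..."> line in the map file.
  sed -n 's/.*<element[[:space:]]*id="\([^"]*\)".*/\1/p' \
      "$policy_dir/product-type-element-map.xml" | sort -u |
  while IFS= read -r id; do
    # Fixed-string search for a definition carrying exactly this id.
    grep -qF "<element id=\"$id\"" "$policy_dir/elements.xml" ||
      printf '%s\n' "$id"
  done
}

# Example call, assuming the policy directory created earlier:
# missing_elements /usr/local/src/cas-filemgr-1.8.0/policy/mp3
```

No output means every mapped id has a matching definition.
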

A final configuration step is to create the archive area for the CAS-Filemgr (recall that we set the repository path for MP3 products in the product-types.xml file). To do this, we simply create the directory:

Code Block

mkdir /usr/local/archive
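
As a cross-check that the directory matches what the policy declares, the repository paths can be read back out of product-types.xml. This is a sketch under stated assumptions: the function name repository_dirs is hypothetical, it handles only file:// paths, and it expects each repository tag on its own line as in the snippet shown earlier.

```shell
# Hypothetical helper: print the local directory behind every file://
# repository path declared in a product-types.xml file.
repository_dirs() {
  sed -n 's|.*<repository[[:space:]]*path="file://\([^"]*\)".*|\1|p' "$1"
}

# Example: report declared repository directories that do not exist yet.
# repository_dirs /usr/local/src/cas-filemgr-1.8.0/policy/mp3/product-types.xml |
#   while IFS= read -r dir; do [ -d "$dir" ] || echo "missing: $dir"; done
```
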

We will now start the CAS-Filemgr instance. This instance will run on port 9000 by default. In order to start the Filemgr, we will issue the following commands:

Code Block

cd /usr/local/src/cas-filemgr-1.8.0/bin
./filemgr start

Now that we have started the CAS-Filemgr, we will need to configure the CAS-Curator to use this Filemgr instance. In order to do this, we will add the following parameters to the CAS-Curator context file:

Code Block
    
<Parameter name="org.apache.oodt.cas.fm.url"
        value="http://localhost:9000"/>
            
<Parameter name="org.apache.oodt.cas.curator.dataDefinition.uploadPath"
        value="/usr/local/src/cas-filemgr-1.8.0/policy" />

<Parameter name="org.apache.oodt.cas.curator.fmProps"
        value="/usr/local/src/cas-filemgr-1.8.0/etc/filemgr.properties"/>        

Once we restart Tomcat, the CAS-Curator will recognize the policy and properties of the configured CAS-Filemgr instance and use this instance during the ingest process.

From the above image, you can see that the CAS-Filemgr configuration has been picked up by CAS-Curator. If you double-click on MP3 in the left filemgr main pane, you will see the product types that are contained in the mp3 policy: GenericFile, which was part of the default configuration, and MP3, which we added. Clicking on MP3 brings up the ingest interface in the right filemgr main pane.

Once we drag Bach-SuiteNo2.mp3 from the staging pane to the green box in the right filemgr main pane, we can select a metadata extractor from the pulldown menu and click the "Save as Ingestion Task" button. This adds the ingest task to the bottom pane, as illustrated in the above screenshot. To test file ingestion, we click the "Start" button.
As a final step, we will confirm that the mp3 file was archived. We can do this by listing the archive:

Code Block

> ls -lR /usr/local/archive
total 0
drwxr-xr-x  3 -  - 102 Nov 27 23:53 Bach-SuiteNo2.mp3

/usr/local/archive//Bach-SuiteNo2.mp3:
total 9344
-rw-r--r--  1 -  -  4781079 Nov 25 20:14 Bach-SuiteNo2.mp3

Worth noting is the fact that our configuration of the CAS-Filemgr included a selection of the BasicVersioner as the MP3 product type versioner. This means that mp3s are placed at [archive_base]/[filename]/[filename] during ingest.
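
The [archive_base]/[filename]/[filename] layout can be sketched as a small shell function. This only illustrates the naming pattern described here and seen in the listing above. It is not the BasicVersioner implementation, and the function name archived_path and the staging path in the example are hypothetical.

```shell
# Hypothetical helper: print where a file would land under the
# [archive_base]/[filename]/[filename] layout.
# Arguments: archive base directory, path of the file being ingested.
archived_path() {
  archive_base=${1%/}            # drop one trailing slash, if present
  filename=$(basename "$2")      # keep only the file name
  printf '%s/%s/%s\n' "$archive_base" "$filename" "$filename"
}

# Example (the staging path is an assumption for illustration):
# archived_path /usr/local/archive /tmp/staging/Bach-SuiteNo2.mp3
```

For the file ingested in this guide, the example call prints /usr/local/archive/Bach-SuiteNo2.mp3/Bach-SuiteNo2.mp3, which matches the directory listing above.
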
We have now completed a base configuration of the CAS-Curator. In the Advanced Guide (http://oodt.apache.org/components/maven/curator/user/advanced.html), we will cover topics like changing the look and feel of the Curator, and security configuration.