TODO: The examples also need to show the changes required in the filemanager.properties file so that the File Manager is configured to load the policy files.

The CAS File Manager is a highly flexible data archiving tool, but that flexibility can cause confusion because a single task (such as defining metadata elements) can be done in several ways. This page captures the 'Best Practices' people have found through their projects and experience when creating File Manager policy.

To keep confusion to a minimum, we start with a Taxonomy that defines some key terms, then move on to several Operational Scenarios. All of the scenarios are built around cataloging and archiving MEDIA (audio, video, images), since almost everyone can relate to these items. The simpler examples focus on a single media format, while the more complex examples show how to handle an ever-changing media library.

Taxonomy

File Manager Policy - All of the *.xml files (product-types.xml, elements.xml, and product-type-element-map.xml) that the File Manager uses to define metadata.

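Data Set - A logical collection of files to be cataloged and archived (for example, Music), which is mapped to a Product Type.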

Product Type - A logical grouping of products, with an associated MetExtractor and Versioner, that is defined in both product-types.xml and product-type-element-map.xml.

Virtual Product Type - Used to group metadata elements together; it has no MetExtractor or Versioner and is defined only within the product-type-element-map.xml file.

Metadata Elements - Data elements about a product that the File Manager catalogs. Every Metadata Element must be listed in both elements.xml and product-type-element-map.xml.
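Because every Metadata Element must appear in both elements.xml and product-type-element-map.xml, it can help to check the two files against each other before starting the File Manager. The sketch below assumes both files have already been parsed into plain Python sets and dicts (parsing is not shown). The function names and the Genre element are hypothetical and are not part of the OODT API.

```python
from typing import Dict, List, Set


def undeclared_elements(declared: Set[str],
                        type_map: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Per product type, list mapped elements that elements.xml does not declare."""
    problems: Dict[str, List[str]] = {}
    for product_type, elements in type_map.items():
        missing = [e for e in elements if e not in declared]
        if missing:
            problems[product_type] = missing
    return problems


def unmapped_elements(declared: Set[str],
                      type_map: Dict[str, List[str]]) -> List[str]:
    """List declared elements that no product type in the map references."""
    mapped = {e for elements in type_map.values() for e in elements}
    return sorted(declared - mapped)


if __name__ == "__main__":
    # Hypothetical, already-parsed policy content.
    declared = {"CAS.ProductId", "Album", "Artist", "Genre"}
    type_map = {"GenericFile": ["CAS.ProductId", "Album", "Artist", "Year"]}
    print(undeclared_elements(declared, type_map))  # {'GenericFile': ['Year']}
    print(unmapped_elements(declared, type_map))    # ['Genre']
```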

Operational Scenarios

Simple File Manager Policy

  • One set of File Manager Policy
  • All of the Data Sets are homogeneous

Example: You want to catalog and archive music files. They are logically grouped by album, but every song you archive has the same metadata elements. In this case we have a single Data Set called Music, which is mapped to the default Product Type: GenericFile.

Sample Policy Overview

NOTE: GenericFile and the default elements (CAS.ProductId through MimeType) are default policy that comes pre-installed with the File Manager and do not need to be edited.

product-types.xml

  • GenericFile

elements.xml

  • CAS.ProductId
  • CAS.ProductName
  • CAS.ProductReceivedTime
  • Filename
  • FileLocation
  • ProductType
  • ProductStructure
  • MimeType
  • Album
  • Artist
  • Track_Number
  • Year
  • Title

product-type-element-map.xml

  • type=GenericFile
  • +CAS.ProductId
  • +CAS.ProductName
  • +CAS.ProductReceivedTime
  • +Filename
  • +FileLocation
  • +ProductType
  • +ProductStructure
  • +MimeType
  • +Album
  • +Artist
  • +Track_Number
  • +Year
  • +Title
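One way to keep elements.xml and product-type-element-map.xml in sync for the custom music elements is to generate both from a single list. The sketch below uses Python's standard xml.etree.ElementTree. The cas namespace URI and the element layout are modeled on the stock OODT policy files but are assumptions: check them against the files shipped with your File Manager. The "urn:oodt:" + name ID convention and all helper names are hypothetical. Only the five non-default elements are generated, so the output is a fragment to merge into the existing files, not a replacement for them.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace modeled on the stock OODT policy files (assumption: verify locally).
CAS_NS = "http://oodt.jpl.nasa.gov/1.0/cas"
ET.register_namespace("cas", CAS_NS)

# Non-default elements from the simple music policy.
MUSIC_ELEMENTS = ["Album", "Artist", "Track_Number", "Year", "Title"]


def urn(name: str) -> str:
    """Hypothetical ID convention used only for this sketch."""
    return "urn:oodt:" + name


def build_elements_xml(names: List[str]) -> ET.Element:
    """Build <element> declarations for elements.xml."""
    root = ET.Element(f"{{{CAS_NS}}}elements")
    for name in names:
        el = ET.SubElement(root, "element", id=urn(name), name=name)
        ET.SubElement(el, "description").text = f"Placeholder description for {name}"
    return root


def build_type_map_xml(type_name: str, names: List[str]) -> ET.Element:
    """Build the matching <type> entry for product-type-element-map.xml."""
    root = ET.Element(f"{{{CAS_NS}}}producttypemap")
    type_el = ET.SubElement(root, "type", id=urn(type_name))
    for name in names:
        ET.SubElement(type_el, "element", id=urn(name))
    return root


if __name__ == "__main__":
    for tree in (build_elements_xml(MUSIC_ELEMENTS),
                 build_type_map_xml("GenericFile", MUSIC_ELEMENTS)):
        ET.indent(tree)  # requires Python 3.9+
        print(ET.tostring(tree, encoding="unicode"))
```

Generating both files from one list means an element cannot be added to one file and forgotten in the other.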

File Manager Policy with Inheritance

  • One set of File Manager Policy
  • There are some standard elements common to ALL Data Sets
  • Data Sets are heterogeneous

Example: You want to catalog and archive music AND video files. Both kinds of files can be grouped under the more generic heading of MEDIA, and they share some metadata elements such as 'Title' and 'Year', but they diverge on format-specific terms (e.g., 'sample rate' vs. 'resolution').

We DO NOT want to repeat the elements that both data sets have in common, so we introduce the idea of PARENT product types within the product-type-element-map.xml file. Whatever elements a PARENT product type contains are inherited by its children. In the Sample Policy below, Title and Year are declared ONCE, yet both child product types (Music and Video) can use those elements during ingestion.
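The inheritance rule can be sketched as a walk up the parent chain, collecting each ancestor's elements before the child's own. This illustrates the idea only; it is not the File Manager's actual resolution code. The dict-based representation and the function name are assumptions, the element lists are taken from the sample policy in this section, and CAS.DEFAULTS stands in for the default elements.

```python
from typing import Dict, List, Optional


def resolve_elements(type_name: str,
                     own_elements: Dict[str, List[str]],
                     parents: Dict[str, Optional[str]]) -> List[str]:
    """Collect elements from the root ancestor down to type_name (no duplicates)."""
    chain: List[str] = []
    current: Optional[str] = type_name
    while current is not None:
        if current in chain:
            raise ValueError(f"inheritance cycle involving {current!r}")
        chain.append(current)
        current = parents.get(current)

    resolved: List[str] = []
    for name in reversed(chain):  # ancestors first, then the type itself
        for element in own_elements.get(name, []):
            if element not in resolved:
                resolved.append(element)
    return resolved


# Element lists from the inheritance sample; CAS.DEFAULTS is a placeholder.
OWN = {
    "GenericFile": ["CAS.DEFAULTS", "Year", "Title"],
    "Music": ["Album", "Artist", "Track_Number", "Sample_Rate"],
    "Video": ["Resolution", "Chapters", "Aspect_Ratio", "Director"],
}
PARENTS = {"Music": "GenericFile", "Video": "GenericFile"}

if __name__ == "__main__":
    print(resolve_elements("Music", OWN, PARENTS))
```

Year and Title are stored once, under GenericFile, yet both Music and Video resolve to include them.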

Sample Policy Overview

NOTE: To save space, the default policy items listed in the first example have been replaced with CAS.DEFAULTS.

product-types.xml

  • GenericFile
  • Music
  • Video

elements.xml

  • CAS.DEFAULTS
  • Album
  • Artist
  • Track_Number
  • Year
  • Title
  • Sample_Rate
  • Resolution
  • Chapters
  • Aspect_Ratio
  • Director

product-type-element-map.xml

  • type=GenericFile
  • +CAS.DEFAULTS
  • +Year
  • +Title
  • type=Music parent=GenericFile
  • +Album
  • +Artist
  • +Track_Number
  • +Sample_Rate
  • type=Video parent=GenericFile
  • +Resolution
  • +Chapters
  • +Aspect_Ratio
  • +Director
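Expressed as XML, the parent relationship becomes an attribute on each child type. The sketch below renders the map above with ElementTree. The parent attribute, the namespace URI, and the urn:oodt: ID convention are assumptions modeled on the stock OODT files, so verify them against your own product-type-element-map.xml. CAS.DEFAULTS is omitted because those entries already exist in the stock file; the output is meant to be merged into that file rather than replace it.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Namespace modeled on the stock OODT policy files (assumption: verify locally).
CAS_NS = "http://oodt.jpl.nasa.gov/1.0/cas"
ET.register_namespace("cas", CAS_NS)


def urn(name: str) -> str:
    """Hypothetical ID convention used only for this sketch."""
    return "urn:oodt:" + name


def build_type_map(types: Dict[str, List[str]],
                   parents: Dict[str, Optional[str]]) -> ET.Element:
    """Emit one <type> per product type, adding parent= for child types."""
    root = ET.Element(f"{{{CAS_NS}}}producttypemap")
    for type_name, elements in types.items():
        attrs = {"id": urn(type_name)}
        parent = parents.get(type_name)
        if parent is not None:
            attrs["parent"] = urn(parent)
        type_el = ET.SubElement(root, "type", attrs)
        for element in elements:
            ET.SubElement(type_el, "element", id=urn(element))
    return root


# Entries added in the inheritance example (defaults omitted).
INHERITANCE_POLICY = {
    "GenericFile": ["Year", "Title"],
    "Music": ["Album", "Artist", "Track_Number", "Sample_Rate"],
    "Video": ["Resolution", "Chapters", "Aspect_Ratio", "Director"],
}
PARENTS = {"Music": "GenericFile", "Video": "GenericFile"}

if __name__ == "__main__":
    root = build_type_map(INHERITANCE_POLICY, PARENTS)
    ET.indent(root)  # requires Python 3.9+
    print(ET.tostring(root, encoding="unicode"))
```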

Building File Manager Policy with Advanced Inheritance and Organization to support Growth

  • Core elements will be set up and used
  • Metadata diverges greatly between Data Sets
  • New, currently unknown Data Sets must be easy to add in the future
  • Separation of Concerns (more than one person can update policy)
  • The filemanager.properties file within the ./etc/ directory must include the paths to every policy directory, or the File Manager will not pick them up.
  • Virtual Product Types will be used

VIRTUAL vs. STANDARD Product Types

Before diving into this example we should take a moment to explain the difference between a VIRTUAL and STANDARD Product Type.
Virtual Product Types are used only for grouping and organizing metadata elements to support inheritance. These Product Types ARE NOT declared within product-types.xml, and they DO NOT have a MetExtractor associated with them.
Standard Product Types, on the other hand, MUST BE declared in product-types.xml, and MetExtractors can be associated with them.
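Since the structural difference is whether a type is declared in product-types.xml, the distinction can be checked mechanically. In the sketch below (a hypothetical helper; the inputs are assumed to be already parsed), a type that appears in both files is STANDARD, one that appears only in the element map is VIRTUAL, and one declared in product-types.xml but missing from the map is flagged for review. The type names are borrowed from the core and vendor policy later in this section.

```python
from typing import Dict, Set


def classify_types(declared: Set[str], mapped: Set[str]) -> Dict[str, str]:
    """Label each product type by where it appears in the policy files."""
    labels: Dict[str, str] = {}
    for name in sorted(declared | mapped):
        if name in declared and name in mapped:
            labels[name] = "STANDARD"           # in product-types.xml and the element map
        elif name in mapped:
            labels[name] = "VIRTUAL"            # only in product-type-element-map.xml
        else:
            labels[name] = "CHECK: not mapped"  # declared but absent from the map
    return labels


if __name__ == "__main__":
    declared = {"GenericFile", "AudioBook", "Records"}  # product-types.xml (core + vendorx)
    mapped = {"GenericFile", "Audio", "Video", "Images", "AudioBook", "Records"}
    for name, label in classify_types(declared, mapped).items():
        print(f"{name}: {label}")
```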

Example: The Media catalog is now being updated by 5 different vendors. At a minimum, every vendor will include a core set of metadata (CAS.DEFAULTS) for ANY file it adds to the catalog. In addition to those defaults, there are also defaults for each of the 3 supported file types: VIDEO.DEFAULTS, AUDIO.DEFAULTS, and IMAGE.DEFAULTS.

The hardest part of getting data from 5 different vendors is making them agree on what else they will add to the catalog, and for that we will enable Vendor Specific Policy. With 5 vendors in the mix, we also need to make sure they cannot edit the CORE elements or EACH OTHER'S elements, so we also need to discuss WHERE on disk the *.xml files are stored and how they are organized.
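Directory layout and filesystem permissions are what actually keep vendors out of each other's files. As a complementary safeguard, a hypothetical check like the one below can flag any element name declared in more than one policy directory before the policy is deployed. The vendory directory and its Bitrate element are invented for the example, and the *.DEFAULTS entries are placeholders for blocks of elements.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_conflicts(declarations: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return each name declared by more than one policy owner, with its owners."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for owner, names in declarations.items():
        for name in names:
            owners[name].append(owner)
    return {name: sorted(who) for name, who in sorted(owners.items()) if len(who) > 1}


if __name__ == "__main__":
    declarations = {
        "core": {"CAS.DEFAULTS", "AUDIO.DEFAULTS", "VIDEO.DEFAULTS", "IMAGE.DEFAULTS"},
        "vendorx": {"Author", "Publisher", "RPM", "Diameter"},
        "vendory": {"Author", "Bitrate"},  # hypothetical second vendor
    }
    print(find_conflicts(declarations))  # {'Author': ['vendorx', 'vendory']}
```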

Sample Policy Overview

NOTE: Again, to keep the example brief, *.DEFAULTS placeholders stand in for large blocks of content, as in the previous examples. We also include a FILEPATH above each group of policy files to show WHERE on disk they should be stored. You can store them wherever you like; this is just one suggestion for organizing the policy files.
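Wherever the directories end up, filemanager.properties has to point at them. The sketch below builds the two property lines from a list of policy directories. The property names (org.apache.oodt.cas.filemgr.repositorymgr.dirs for product-types.xml, and org.apache.oodt.cas.filemgr.validation.dirs for elements.xml and product-type-element-map.xml) and the comma-separated file:// URI format are assumed from the stock OODT configuration; confirm them against the properties file in your installation's ./etc/ directory.

```python
from pathlib import PurePosixPath
from typing import List

# Property names assumed from the stock OODT File Manager configuration.
REPO_DIRS_KEY = "org.apache.oodt.cas.filemgr.repositorymgr.dirs"     # product-types.xml
VALIDATION_DIRS_KEY = "org.apache.oodt.cas.filemgr.validation.dirs"  # elements.xml, element map


def policy_properties(policy_dirs: List[str]) -> str:
    """Return properties lines listing every policy directory as a file:// URI."""
    uris = ",".join(PurePosixPath(d).as_uri() for d in policy_dirs)
    return f"{REPO_DIRS_KEY}={uris}\n{VALIDATION_DIRS_KEY}={uris}\n"


if __name__ == "__main__":
    print(policy_properties([
        "/usr/local/oodt/filemgr/policy/core",
        "/usr/local/oodt/filemgr/policy/vendorx",
    ]), end="")
```

Adding a new vendor then means creating its directory and appending one path to the list, rather than touching the core policy.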

CORE POLICY

/usr/local/oodt/filemgr/policy/core

product-types.xml

  • GenericFile

elements.xml

  • CAS.DEFAULTS
  • VIDEO.DEFAULTS
  • AUDIO.DEFAULTS
  • IMAGE.DEFAULTS

product-type-element-map.xml

  • type=GenericFile
  • +CAS.DEFAULTS
  • type=Audio parent=GenericFile <<<Audio is a VIRTUAL Product Type
  • +AUDIO.DEFAULTS
  • type=Video parent=GenericFile <<<Video is a VIRTUAL Product Type
  • +VIDEO.DEFAULTS
  • type=Images parent=GenericFile <<<Images is a VIRTUAL Product Type
  • +IMAGE.DEFAULTS

VENDORX POLICY

/usr/local/oodt/filemgr/policy/vendorx

product-types.xml

  • AudioBook
  • Records

elements.xml

  • Author
  • Publisher
  • RPM
  • Diameter

product-type-element-map.xml

  • type=AudioBook parent=Audio <<<AudioBook is a STANDARD Product Type
  • +Author
  • +Publisher
  • type=Records parent=Audio <<<Records is a STANDARD Product Type
  • +RPM
  • +Diameter
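With the core and vendor directories loaded together, a standard type such as AudioBook inherits through two levels: GenericFile, then the virtual Audio type. The sketch below merges the two maps and resolves the full element list. The in-memory data structures are hypothetical, and rejecting a type defined in more than one directory is a rule chosen for this sketch, not documented File Manager behavior.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory form of product-type-element-map.xml:
# type name -> (parent type or None, the type's own elements)
TypeMap = Dict[str, Tuple[Optional[str], List[str]]]

CORE: TypeMap = {
    "GenericFile": (None, ["CAS.DEFAULTS"]),
    "Audio": ("GenericFile", ["AUDIO.DEFAULTS"]),   # virtual
    "Video": ("GenericFile", ["VIDEO.DEFAULTS"]),   # virtual
    "Images": ("GenericFile", ["IMAGE.DEFAULTS"]),  # virtual
}
VENDORX: TypeMap = {
    "AudioBook": ("Audio", ["Author", "Publisher"]),  # standard
    "Records": ("Audio", ["RPM", "Diameter"]),        # standard
}


def merge_maps(*maps: TypeMap) -> TypeMap:
    """Combine per-directory maps; this sketch rejects duplicate type names."""
    merged: TypeMap = {}
    for type_map in maps:
        for name, entry in type_map.items():
            if name in merged:
                raise ValueError(f"{name!r} is defined in more than one policy directory")
            merged[name] = entry
    return merged


def resolve(type_name: str, merged: TypeMap) -> List[str]:
    """Walk up the parent chain and return ancestor elements first."""
    chain: List[str] = []
    current: Optional[str] = type_name
    while current is not None:
        if current not in merged:
            raise KeyError(f"unknown product type {current!r}")
        if current in chain:
            raise ValueError(f"inheritance cycle involving {current!r}")
        chain.append(current)
        current = merged[current][0]
    elements: List[str] = []
    for name in reversed(chain):
        elements.extend(merged[name][1])
    return elements


if __name__ == "__main__":
    policy = merge_maps(CORE, VENDORX)
    print(resolve("AudioBook", policy))
    # ['CAS.DEFAULTS', 'AUDIO.DEFAULTS', 'Author', 'Publisher']
```

Resolving against VENDORX alone fails, because Audio and GenericFile live in the core directory; this is why filemanager.properties must list every policy directory.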