Comment: Migrated to Confluence 5.3

Note

In progress

This is a proposal for new classes to aid in the development of MetExtractors for the Crawler (client-side met extraction).

...

Differences from FilenameRegexMetExtractor: 

  1. Assigns ProductType. FilenameRegexMetExtractor runs after ProductType is already determined.
  2. Runs on the client-side (crawler). FilenameRegexMetExtractor runs on the server-side (filemgr).
  3. Different patterns for different ProductTypes. FilenameRegexMetExtractor config applies the same pattern to all files.
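
The three differences above can be illustrated with a minimal sketch. It assumes a hypothetical class, ProductTypeFilenameExtractor, that is not part of OODT: each rule pairs a ProductType with its own filename pattern, and the first rule that matches both assigns the ProductType and extracts metadata from the capture groups. The product type names, patterns, and metadata keys are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not an OODT class: one filename pattern per ProductType.
final class ProductTypeFilenameExtractor {
    // Insertion order is preserved, so the first rule added is tried first.
    private final Map<String, Pattern> patternsByType = new LinkedHashMap<>();
    private final Map<String, String[]> keysByType = new LinkedHashMap<>();

    // Registers a rule: metKeys name the capture groups of regex, in order.
    void addRule(String productType, String regex, String... metKeys) {
        patternsByType.put(productType, Pattern.compile(regex));
        keysByType.put(productType, metKeys);
    }

    // Returns the extracted metadata, or an empty map if no rule matches.
    Map<String, String> extract(String filename) {
        for (Map.Entry<String, Pattern> rule : patternsByType.entrySet()) {
            Matcher m = rule.getValue().matcher(filename);
            if (!m.matches()) {
                continue;
            }
            Map<String, String> met = new LinkedHashMap<>();
            // The matching rule decides the ProductType (difference 1).
            met.put("ProductType", rule.getKey());
            String[] keys = keysByType.get(rule.getKey());
            for (int i = 0; i < keys.length && i < m.groupCount(); i++) {
                met.put(keys[i], m.group(i + 1));
            }
            return met;
        }
        return new LinkedHashMap<>();
    }
}
```

Because the rules are evaluated on the client before ingestion, a file that matches no rule yields no ProductType, and the caller can decide whether to skip it or fall back to a default.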

...

It is a common use case to ingest the files output by a PGE task and, at the same time, generate or extract their metadata. PGE tasks use PcsMetFileWriter subclasses to generate a metadata file before ingesting the file together with its metadata. We should be able to reuse CmdLineMetExtractors (crawler met extractors) in PGE tasks. To accomplish this, we create a generic PcsMetFileWriter wrapper that invokes CmdLineMetExtractors with their accompanying config file.
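
The wrapper idea can be sketched as a plain adapter. The two interfaces below are hypothetical stand-ins declared only for this example; they are not the actual signatures of OODT's CmdLineMetExtractor or PcsMetFileWriter. The sketch also assumes that the extractor's config file path is passed to the writer as its first argument.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a crawler-side met extractor driven by a config file.
interface SketchMetExtractor {
    Map<String, String> extract(String productPath, String configPath);
}

// Hypothetical stand-in for the PGE-side metadata writer contract.
interface SketchMetFileWriter {
    Map<String, String> writeMet(String productPath, Object... args);
}

// Generic wrapper: exposes any extractor through the writer contract.
final class ExtractorBackedWriter implements SketchMetFileWriter {
    private final SketchMetExtractor extractor;

    ExtractorBackedWriter(SketchMetExtractor extractor) {
        this.extractor = extractor;
    }

    @Override
    public Map<String, String> writeMet(String productPath, Object... args) {
        // Assumption: the first writer argument is the extractor's config file.
        if (args.length < 1) {
            throw new IllegalArgumentException(
                "expected the extractor config file path as the first argument");
        }
        String configPath = String.valueOf(args[0]);
        // Delegate to the extractor and return a copy of its metadata.
        return new LinkedHashMap<>(extractor.extract(productPath, configPath));
    }
}
```

With an adapter of this shape, a PGE task would only need to name the extractor and its config file; no extractor-specific writer subclass would have to be written.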

Note

Is this obsolete? I was looking for "FilenameExtractorWriter" and "PcsMetFileWriter", and they are no longer in OODT. In fact, they last appeared in v0.3. Does the 0.7 PGE task somehow invoke the crawler for ingestion?