
...

./crawler_launcher \
--filemgrUrl http://localhost:9000 \
--operation --launchMetCrawler \
--clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
--productPath /usr/local/meerkat/data/staging/products/hdf5 \
--metExtractor org.apache.oodt.cas.metadata.extractors.ExternMetExtractor \
--metExtractorConfig /usr/local/meerkat/extractors/katextractor/katextractor.config

...

I had a file manager listening on http://localhost:9000.
I used an external metadata extractor (written in Python) to extract metadata from HDF5 files.
An example MetExtractorProductCrawler configuration, which lets you specify how the crawler will run your extractor, can be found in the source: https://svn.apache.org/repos/asf/oodt/trunk/metadata/src/main/resources/examples/extern-config.xml
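
Before launching the crawler it can help to confirm that the paths handed to it actually exist. The following is a minimal sketch, not part of OODT: the function name `check_crawler_inputs` is hypothetical, and the paths in the example call are simply the ones from the invocation above.

```shell
#!/bin/sh
# Hypothetical pre-flight check; the function name is illustrative only.
# Returns 0 when the staging directory exists and the extractor config is
# readable, 1 otherwise (with one message on stderr per problem found).
check_crawler_inputs() {
    product_path="$1"
    extractor_config="$2"
    status=0
    if [ ! -d "$product_path" ]; then
        echo "missing product directory: $product_path" >&2
        status=1
    fi
    if [ ! -r "$extractor_config" ]; then
        echo "unreadable extractor config: $extractor_config" >&2
        status=1
    fi
    return "$status"
}

# Example call with the paths used above; a failure is reported, not fatal.
check_crawler_inputs \
    /usr/local/meerkat/data/staging/products/hdf5 \
    /usr/local/meerkat/extractors/katextractor/katextractor.config \
    || echo "pre-flight check failed"
```

The check only looks at the local filesystem; it does not verify that the file manager is reachable.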

MetExtractorProductCrawler, using the TikaCmdLineMetExtractor (an easier approach)

...

Invocation command:
./crawler_launcher \
--filemgrUrl http://localhost:9000 \
--operation --launchMetCrawler \
--clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
--productPath /usr/local/meerkat/data/staging/products/hdf5 \
--metExtractor org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor \
--metExtractorConfig /usr/local/meerkat/extractors/tikaextractor/tikaextractor.config

...

./crawler_launcher --operation --launchAutoCrawler

I followed an approach similar to the one I used to get the MetExtractorProductCrawler working. For completeness, here is my full command line:

./crawler_launcher \
--operation --launchAutoCrawler \
--filemgrUrl http://localhost:9000 \
--clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
--productPath /usr/local/meerkat/data/staging/products/hdf5 \
--mimeExtractorRepo ../policy/mime-extractor-map.xml

...

I had a file manager listening on http://localhost:9000.
I used an external metadata extractor (written in Python) to extract metadata from HDF5 files.
An example AutoDetectProductCrawler configuration can be found in the source:
- Uses the same metadata extractor specification file (you will have one of these for each mime-type).
- Allows you to define your mime-types, that is, give a mime-type for a given filename regular expression.
- Maps your mime-types to extractors.
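
The crawler invocations on this page share the file manager URL, the data transferer, and the product path, so one way to keep them consistent is a small wrapper. This is only a sketch under stated assumptions: `crawler_cmd` and the `met`/`auto` mode names are hypothetical, the auto-detect action is assumed to be `--launchAutoCrawler`, and the function prints the command line (a dry run) rather than executing it.

```shell
#!/bin/sh
# Hypothetical wrapper: shared options live in variables whose defaults are
# copied from the commands on this page; override them via the environment.
FILEMGR_URL="${FILEMGR_URL:-http://localhost:9000}"
PRODUCT_PATH="${PRODUCT_PATH:-/usr/local/meerkat/data/staging/products/hdf5}"
TRANSFERER="org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory"

# Print the crawler_launcher command line for a mode ("met" or "auto").
# Any further arguments are appended unchanged after the shared options.
crawler_cmd() {
    mode="$1"; shift
    case "$mode" in
        met)  action="--launchMetCrawler" ;;
        auto) action="--launchAutoCrawler" ;;   # assumed action name
        *)    echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo ./crawler_launcher --operation "$action" \
        --filemgrUrl "$FILEMGR_URL" \
        --clientTransferer "$TRANSFERER" \
        --productPath "$PRODUCT_PATH" "$@"
}

# Dry run: show the auto-detect invocation with its mime-extractor map.
crawler_cmd auto --mimeExtractorRepo ../policy/mime-extractor-map.xml
```

To run the crawler for real, the printed line can be reviewed and then executed from the directory that contains `crawler_launcher`.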
