...

Here's how to get some useful feedback about crawler configurations:

./crawler_launcher --printSupportedActions
./crawler_launcher --printSupportedCrawlerActions
./crawler_launcher --printSupportedPreconditions
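To keep these listings to hand while building up a command line, a small sketch like this one saves each listing to a text file. It assumes bash. The `CRAWLER_LAUNCHER` variable, the `save_crawler_listings` function and the output file names are my own inventions, not part of OODT; only the three flags come from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: save the crawler_launcher configuration listings to text files.
# CRAWLER_LAUNCHER, save_crawler_listings and the *.txt names are hypothetical.
CRAWLER_LAUNCHER="${CRAWLER_LAUNCHER:-./crawler_launcher}"

save_crawler_listings() {
  local flag
  for flag in --printSupportedActions --printSupportedCrawlerActions --printSupportedPreconditions; do
    # "${flag#--}" strips the leading dashes, e.g. printSupportedActions.txt
    "$CRAWLER_LAUNCHER" "$flag" > "${flag#--}.txt" 2>&1
  done
}
```

After sourcing the file and calling `save_crawler_listings`, the current directory holds one file per listing, which is easier to search than scrolling back through terminal output.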

There were two crawlers that I was particularly interested in using: the MetExtractorProductCrawler and the AutoDetectProductCrawler (the StdProductCrawler does not support metadata extraction).

So, now you want to know more about how to get these crawlers up and running? Ask the crawler!

./crawler_launcher --operation --launchStdCrawler
./crawler_launcher --operation --launchMetCrawler
./crawler_launcher --operation --launchAutoCrawler

As you can see, running one of these commands lists the command line options that need to be specified. My approach was to add the command line options iteratively. The simplest command that gives you useful feedback is one that specifies only the crawler.

MetExtractorProductCrawler

To get the metadata extractor product crawler working, I first ran:

./crawler_launcher --operation --launchMetCrawler

The crawler then failed because a required command line option was missing. I added that option and ran the command again to see where it failed next.

This is the complete met extractor command that I eventually ran:

./crawler_launcher \
--operation --launchMetCrawler \
--filemgrUrl http://localhost:9000 \
--clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
--productPath /usr/local/meerkat/data/staging/products/hdf5 \
--metExtractor org.apache.oodt.cas.metadata.extractors.ExternMetExtractor \
--metExtractorConfig /usr/local/meerkat/extractors/katextractor/katextractor.config
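Since the command grew one option at a time, it can help to keep the options in a bash array so each new option is a single extra line. This is a minimal sketch, not part of OODT: the option values are the ones from my command above, while `CRAWLER_LAUNCHER`, `DRY_RUN`, `met_crawler_args` and `run_met_crawler` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: keep the MetExtractorProductCrawler options in one bash array.
# CRAWLER_LAUNCHER and DRY_RUN are hypothetical helpers, not crawler_launcher options.
CRAWLER_LAUNCHER="${CRAWLER_LAUNCHER:-./crawler_launcher}"

met_crawler_args=(
  --operation --launchMetCrawler
  --filemgrUrl http://localhost:9000
  --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
  --productPath /usr/local/meerkat/data/staging/products/hdf5
  --metExtractor org.apache.oodt.cas.metadata.extractors.ExternMetExtractor
  --metExtractorConfig /usr/local/meerkat/extractors/katextractor/katextractor.config
)

run_met_crawler() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print the assembled command line instead of executing it.
    printf '%s %s\n' "$CRAWLER_LAUNCHER" "${met_crawler_args[*]}"
  else
    # Quoted array expansion passes each option as a separate argument.
    "$CRAWLER_LAUNCHER" "${met_crawler_args[@]}"
  fi
}
```

After sourcing the file, `DRY_RUN=1 run_met_crawler` prints the full command for a quick review, and `run_met_crawler` on its own launches the crawler.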

...

To get the auto detect product crawler working, I ran:

./crawler_launcher --operation --launchAutoCrawler

I followed the same iterative approach as for the MetExtractorProductCrawler. For completeness, here is my complete command line:

./crawler_launcher \
--operation --launchAutoCrawler \
--filemgrUrl http://localhost:9000 \
--clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
--productPath /usr/local/meerkat/data/staging/products/hdf5 \
--mimeExtractorRepo ../policy/mime-extractor-map.xml
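The `--mimeExtractorRepo` value above is a relative path, so the directory you start the launcher from matters. Because I ran the launcher as `./crawler_launcher`, my working directory was the launcher's own directory. The sketch below assumes relative paths are resolved against the working directory, and runs the launcher from a chosen directory inside a subshell so the caller's shell does not change directory. `CRAWLER_BIN_DIR`, `auto_crawler_args` and `run_auto_crawler` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: run the AutoDetectProductCrawler from the launcher's own directory
# so the relative --mimeExtractorRepo path resolves from there.
# Assumption: relative paths are resolved against the working directory.
# CRAWLER_BIN_DIR is a hypothetical variable, not a crawler_launcher option.

auto_crawler_args=(
  --operation --launchAutoCrawler
  --filemgrUrl http://localhost:9000
  --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
  --productPath /usr/local/meerkat/data/staging/products/hdf5
  --mimeExtractorRepo ../policy/mime-extractor-map.xml
)

run_auto_crawler() {
  # Default to the current directory when CRAWLER_BIN_DIR is unset.
  local bin_dir="${CRAWLER_BIN_DIR:-.}"
  (
    # The subshell keeps this cd from leaking into the caller's shell.
    cd "$bin_dir" || exit 1
    ./crawler_launcher "${auto_crawler_args[@]}"
  )
}
```

For example, `CRAWLER_BIN_DIR=/path/to/crawler/bin run_auto_crawler` launches the crawler from that directory and leaves your own shell where it was.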

...