
The table below provides a canonical, comprehensive collection of Nutch metrics.

Table Ordering Logic

The table is ordered:

  1. First by the Tool column, following the sequence in which tools are invoked within a typical Nutch crawl cycle, i.e., Injector, Generator, Fetch, Parse, etc.
  2. Second by the Metric Name column, with counter names ordered alphabetically within the Metric Group they belong to.
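The ordering rule above can be sketched as a two-level comparator. This is only an illustrative sketch: the `MetricOrdering` class, the `Metric` row type, and the `TOOL_ORDER` list are hypothetical names introduced here and are not part of Nutch. The tool sequence is limited to the four tools named in the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MetricOrdering {

    // Hypothetical row type for illustration; not a Nutch class.
    public static final class Metric {
        public final String tool;
        public final String group;
        public final String name;

        public Metric(String tool, String group, String name) {
            this.tool = tool;
            this.group = group;
            this.name = name;
        }
    }

    // Assumed tool sequence, taken from the examples named in the text.
    static final List<String> TOOL_ORDER =
            Arrays.asList("Injector", "Generator", "Fetch", "Parse");

    // Position of a tool in the crawl cycle; unknown tools sort last.
    static int toolRank(String tool) {
        int i = TOOL_ORDER.indexOf(tool);
        return i < 0 ? TOOL_ORDER.size() : i;
    }

    // Returns a sorted copy: by tool rank, then metric group, then metric name.
    public static List<Metric> sorted(List<Metric> rows) {
        List<Metric> out = new ArrayList<>(rows);
        out.sort(Comparator.comparingInt((Metric m) -> toolRank(m.tool))
                .thenComparing(m -> m.group)
                .thenComparing(m -> m.name));
        return out;
    }

    public static void main(String[] args) {
        List<Metric> rows = Arrays.asList(
                new Metric("Generator", "Generator", "MALFORMED_URL"),
                new Metric("Injector", "injector", "urls_merged"),
                new Metric("Injector", "injector", "urls_filtered"));
        for (Metric m : sorted(rows)) {
            System.out.println(m.tool + " " + m.group + " " + m.name);
        }
    }
}
```

The input list is copied before sorting so callers keep their original row order, and tools missing from the assumed sequence fall to the end rather than raising an error.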


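| Tool | Metric Group | Metric Name | Description |
|---|---|---|---|
| Injector | injector | urls_filtered | |
| Injector | injector | urls_injected | |
| Injector | injector | urls_merged | |
| Injector | injector | urls_purged_404 | |
| Injector | injector | urls_purged_filter | |
| Generator | Generator | SCHEDULE_REJECTED | |
| Generator | Generator | WAIT_FOR_UPDATE | |
| Generator | Generator | EXPR_REJECTED | |
| Generator | Generator | STATUS_REJECTED | |
| Generator | Generator | SCORE_TOO_LOW | |
| Generator | Generator | INTERVAL_REJECTED | |
| Generator | Generator | MALFORMED_URL | |
| Generator | Generator | HOSTS_AFFECTED_PER_HOST_OVERFLOW | |
| Generator | Generator | URLS_SKIPPED_PER_HOST_OVERFLOW | |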




















./src/test/org/apache/nutch/crawl/CrawlDbUpdateUtil.java

./src/test/org/apache/nutch/crawl/CrawlDBTestUtil.java

./src/java/org/apache/nutch/tools/warc/WARCExporter.java

./src/java/org/apache/nutch/util/SitemapProcessor.java

./src/java/org/apache/nutch/util/domain/DomainStatistics.java

./src/java/org/apache/nutch/parse/ParseSegment.java

./src/java/org/apache/nutch/fetcher/Fetcher.java

./src/java/org/apache/nutch/fetcher/FetcherThread.java

./src/java/org/apache/nutch/fetcher/QueueFeeder.java

./src/java/org/apache/nutch/crawl/CrawlDb.java

./src/java/org/apache/nutch/crawl/CrawlDbReducer.java

./src/java/org/apache/nutch/crawl/DeduplicationJob.java

./src/java/org/apache/nutch/crawl/CrawlDbFilter.java

./src/java/org/apache/nutch/hostdb/UpdateHostDbMapper.java

./src/java/org/apache/nutch/hostdb/UpdateHostDbReducer.java

./src/java/org/apache/nutch/hostdb/ResolverThread.java

./src/java/org/apache/nutch/scoring/webgraph/WebGraph.java

./src/java/org/apache/nutch/indexer/IndexingJob.java

./src/java/org/apache/nutch/indexer/IndexerMapReduce.java

./src/java/org/apache/nutch/indexer/CleaningJob.java

Conclusion