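| Tool/Object | Metric Group | Metric Name | Description |
|---|---|---|---|
| CleaningJob | CleaningJobStatus | Deleted documents | |
| CrawlDbFilter | CrawlDB filter | Gone records removed | |
| CrawlDbFilter | CrawlDB filter | Orphan records removed | |
| CrawlDbFilter | CrawlDB filter | URLs filtered | |
| CrawlDbReducer | CrawlDB status | CrawlDatum.getStatusName(CrawlDatum().getStatus()) | |
| DeduplicationJob | DeduplicationJobStatus | Documents marked as duplicate | |
| DomainStatistics | | MyCounter.EMPTY_RESULT | |
| DomainStatistics | | MyCounter.FETCHED | |
| DomainStatistics | | MyCounter.NOT_FETCHED | |
| Fetcher | FetcherStatus | bytes_downloaded | |
| Fetcher | FetcherStatus | hitByThrougputThreshold | |
| Fetcher | FetcherStatus | hitByTimeLimit | |
| FetcherThread | FetcherStatus | AboveExceptionThresholdInQueue | |
| FetcherThread | FetcherStatus | FetchItem.notCreated.redirect | |
| FetcherThread | FetcherStatus | outlinks_detected | |
| FetcherThread | FetcherStatus | outlinks_following | |
| FetcherThread | FetcherStatus | ProtocolStatus.getName() | |
| FetcherThread | FetcherStatus | redirect_count_exceeded | |
| FetcherThread | FetcherStatus | redirect_deduplicated | |
| FetcherThread | FetcherStatus | robots_denied | |
| FetcherThread | FetcherStatus | robots_denied_maxcrawldelay | |
| FetcherThread | ParserStatus | ParseStatus.majorCodes[p.getData().getStatus().getMajorCode()] | |
| Generator | Generator | EXPR_REJECTED | |
| Generator | Generator | HOSTS_AFFECTED_PER_HOST_OVERFLOW | |
| Generator | Generator | INTERVAL_REJECTED | |
| Generator | Generator | MALFORMED_URL | |
| Generator | Generator | SCHEDULE_REJECTED | |
| Generator | Generator | SCORE_TOO_LOW | |
| Generator | Generator | STATUS_REJECTED | |
| Generator | Generator | URLS_SKIPPED_PER_HOST_OVERFLOW | |
| IndexerMapReduce | IndexerStatus | deleted (duplicates) | |
| IndexerMapReduce | IndexerStatus | deleted (IndexingFilter) | |
| IndexerMapReduce | IndexerStatus | deleted (gone) | |
| IndexerMapReduce | IndexerStatus | deleted (redirects) | |
| IndexerMapReduce | IndexerStatus | deleted (robots=noindex) | |
| IndexerMapReduce | IndexerStatus | errors (IndexingFilter) | |
| IndexerMapReduce | IndexerStatus | errors (ScoringFilter) | |
| IndexerMapReduce | IndexerStatus | indexed (add/update) | |
| IndexerMapReduce | IndexerStatus | skipped (IndexingFilter) | |
| IndexerMapReduce | IndexerStatus | skipped (not modified) | |
| Injector | injector | urls_filtered | |
| Injector | injector | urls_injected | |
| Injector | injector | urls_merged | |
| Injector | injector | urls_purged_404 | |
| Injector | injector | urls_purged_filter | |
| ParseSegment | ParserStatus | ParseStatus.majorCodes[parseStatus.getMajorCode()] | |
| QueueFeeder | FetcherStatus | filtered | |
| (also QueueFeeder) | FetcherStatus | AboveExceptionThresholdInQueue | |
| ResolverThread | UpdateHostDb | checked_hosts | |
| ResolverThread | UpdateHostDb | existing_known_host | |
| ResolverThread | UpdateHostDb | existing_unknown_host | |
| ResolverThread | UpdateHostDb | new_known_host | |
| ResolverThread | UpdateHostDb | new_unknown_host | |
| ResolverThread | UpdateHostDb | purged_unknown_host | |
| ResolverThread | UpdateHostDb | rediscovered_host | |
| ResolverThread | UpdateHostDb | Long.toString(datum.numFailures()) + "_times_failed" | |
| SitemapProcessor | Sitemap | existing_sitemap_entries | |
| SitemapProcessor | Sitemap | failed_fetches | |
| SitemapProcessor | Sitemap | filtered_records | |
| SitemapProcessor | Sitemap | filtered_sitemaps_from_hostname | |
| SitemapProcessor | Sitemap | new_sitemap_entries | |
| SitemapProcessor | Sitemap | sitemaps_from_hostname | |
| SitemapProcessor | Sitemap | sitemap_seeds | |
| UpdateHostDbMapper | UpdateHostDb | filtered_records | |
| UpdateHostDbReducer | UpdateHostDb | total_hosts | |
| (also UpdateHostDbReducer) | UpdateHostDb | skipped_not_eligible | |
| WebGraph | WebGraph.outlinks | added links | |
| (also WebGraph) | WebGraph.outlinks | removed links | |
| WARCExporter | WARCExporter | exception | |
| WARCExporter | WARCExporter | invalid URI | |
| WARCExporter | WARCExporter | missing content | |
| WARCExporter | WARCExporter | missing metadata | |
| WARCExporter | WARCExporter | omitted empty response | |
| WARCExporter | WARCExporter | records generated | |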
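
Each metric in the table is identified by a group plus a name, and a few names are Java expressions evaluated at runtime (for example the `"_times_failed"` name under UpdateHostDb). The sketch below illustrates that keying with a plain in-memory map. It assumes the metrics behave like counters keyed by (group, name); `CounterSketch`, `increment`, `get`, and `timesFailedName` are hypothetical names introduced for illustration and are not part of Nutch.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a (group, name) counter store; not Nutch code. */
class CounterSketch {
    // group -> (name -> value); TreeMap keeps the printed order stable
    private final Map<String, Map<String, Long>> groups = new TreeMap<>();

    /** Adds delta to the counter identified by group and name, creating it if needed. */
    void increment(String group, String name, long delta) {
        groups.computeIfAbsent(group, g -> new TreeMap<>()).merge(name, delta, Long::sum);
    }

    /** Returns the current value, or 0 if the counter was never incremented. */
    long get(String group, String name) {
        return groups
            .getOrDefault(group, Collections.<String, Long>emptyMap())
            .getOrDefault(name, 0L);
    }

    /** Builds a dynamic name in the shape the table shows: "<n>_times_failed". */
    static String timesFailedName(long numFailures) {
        return Long.toString(numFailures) + "_times_failed";
    }

    public static void main(String[] args) {
        CounterSketch counters = new CounterSketch();
        counters.increment("FetcherStatus", "bytes_downloaded", 2048);
        counters.increment("FetcherStatus", "robots_denied", 1);
        counters.increment("FetcherStatus", "robots_denied", 1);
        // Dynamic name: one counter per distinct failure count (assumed value 3 here).
        counters.increment("UpdateHostDb", timesFailedName(3), 1);

        // Print one line per counter as group, name, value.
        counters.groups.forEach((group, names) ->
            names.forEach((name, value) ->
                System.out.println(group + "\t" + name + "\t" + value)));
    }
}
```

Because dynamic names are built from runtime values, the set of counters in groups such as UpdateHostDb, ParserStatus, or CrawlDB status is open-ended: a new name appears whenever a new value is observed.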

./src/java/org/apache/nutch/indexer/CleaningJob.java


Conclusion