Table Ordering Logic

The table is sorted:

  1. by Tool/Object, alphabetically
  2. by Metric Group, alphabetically within each tool
  3. by Metric Name, alphabetically within each metric group
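The three-level ordering above can be sketched as a chained comparator. This is only an illustration: the `MetricTableOrder` and `Row` names are hypothetical and not part of Nutch, and the case-insensitive comparison is an assumption suggested by the row order in the table (for example, `outlinks_following` precedes `ProtocolStatus.getName()`).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MetricTableOrder {

    /** One table row; the class and field names are illustrative only. */
    static final class Row {
        final String tool;
        final String group;
        final String name;

        Row(String tool, String group, String name) {
            this.tool = tool;
            this.group = group;
            this.name = name;
        }

        @Override
        public String toString() {
            return tool + " | " + group + " | " + name;
        }
    }

    /** Tool first, then Metric Group, then Metric Name (case-insensitive by assumption). */
    static final Comparator<Row> TABLE_ORDER =
        Comparator.comparing((Row r) -> r.tool, String.CASE_INSENSITIVE_ORDER)
            .thenComparing(r -> r.group, String.CASE_INSENSITIVE_ORDER)
            .thenComparing(r -> r.name, String.CASE_INSENSITIVE_ORDER);

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<Row> sorted(List<Row> rows) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(TABLE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("Fetcher", "FetcherStatus", "hitByTimeLimit"),
            new Row("CrawlDbFilter", "CrawlDB filter", "URLs filtered"),
            new Row("Fetcher", "FetcherStatus", "bytes_downloaded"),
            new Row("CleaningJob", "CleaningJobStatus", "Deleted documents"));
        // Prints the rows in table order, one per line.
        sorted(rows).forEach(System.out::println);
    }
}
```

A new row can be placed by adding it to the list and re-sorting, which avoids having to find the insertion point by hand.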


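| Tool/Object | Metric Group | Metric Name | Description |
| CleaningJob | CleaningJobStatus | Deleted documents | |
| CrawlDbFilter | CrawlDB filter | Gone records removed | |
| CrawlDbFilter | CrawlDB filter | Orphan records removed | |
| CrawlDbFilter | CrawlDB filter | URLs filtered | |
| CrawlDbReducer | CrawlDB status | CrawlDatum.getStatusName(CrawlDatum().getStatus()) | |
| DeduplicationJob | DeduplicationJobStatus | Documents marked as duplicate | |
| DomainStatistics | | MyCounter.EMPTY_RESULT | |
| DomainStatistics | | MyCounter.FETCHED | |
| DomainStatistics | | MyCounter.NOT_FETCHED | |
| Fetcher | FetcherStatus | bytes_downloaded | |
| Fetcher | FetcherStatus | hitByThrougputThreshold | |
| Fetcher | FetcherStatus | hitByTimeLimit | |
| FetcherThread | FetcherStatus | AboveExceptionThresholdInQueue | |
| FetcherThread | FetcherStatus | FetchItem.notCreated.redirect | |
| FetcherThread | FetcherStatus | outlinks_detected | |
| FetcherThread | FetcherStatus | outlinks_following | |
| FetcherThread | FetcherStatus | ProtocolStatus.getName() | |
| FetcherThread | FetcherStatus | redirect_count_exceeded | |
| FetcherThread | FetcherStatus | redirect_deduplicated | |
| FetcherThread | FetcherStatus | robots_denied | |
| FetcherThread | FetcherStatus | robots_denied_maxcrawldelay | |
| FetcherThread | ParserStatus | ParseStatus.majorCodes[p.getData().getStatus().getMajorCode()] | |
| Generator | Generator | EXPR_REJECTED | |
| Generator | Generator | HOSTS_AFFECTED_PER_HOST_OVERFLOW | |
| Generator | Generator | INTERVAL_REJECTED | |
| Generator | Generator | MALFORMED_URL | |
| Generator | Generator | SCHEDULE_REJECTED | |
| Generator | Generator | SCORE_TOO_LOW | |
| Generator | Generator | STATUS_REJECTED | |
| Generator | Generator | URLS_SKIPPED_PER_HOST_OVERFLOW | |
| IndexerMapReduce | IndexerStatus | deleted (duplicates) | |
| IndexerMapReduce | IndexerStatus | deleted (IndexingFilter) | |
| IndexerMapReduce | IndexerStatus | deleted (gone) | |
| IndexerMapReduce | IndexerStatus | deleted (redirects) | |
| IndexerMapReduce | IndexerStatus | deleted (robots=noindex) | |
| IndexerMapReduce | IndexerStatus | errors (IndexingFilter) | |
| IndexerMapReduce | IndexerStatus | errors (ScoringFilter) | |
| IndexerMapReduce | IndexerStatus | indexed (add/update) | |
| IndexerMapReduce | IndexerStatus | skipped (IndexingFilter) | |
| IndexerMapReduce | IndexerStatus | skipped (not modified) | |
| Injector | injector | urls_filtered | |
| Injector | injector | urls_injected | |
| Injector | injector | urls_merged | |
| Injector | injector | urls_purged_404 | |
| Injector | injector | urls_purged_filter | |
| ParseSegment | ParserStatus | ParseStatus.majorCodes[parseStatus.getMajorCode()] | |
| QueueFeeder | FetcherStatus | filtered | |
| QueueFeeder | FetcherStatus | AboveExceptionThresholdInQueue | |
| ResolverThread | UpdateHostDb | checked_hosts | |
| ResolverThread | UpdateHostDb | existing_known_host | |
| ResolverThread | UpdateHostDb | existing_unknown_host | |
| ResolverThread | UpdateHostDb | new_known_host | |
| ResolverThread | UpdateHostDb | new_unknown_host | |
| ResolverThread | UpdateHostDb | purged_unknown_host | |
| ResolverThread | UpdateHostDb | rediscovered_host | |
| ResolverThread | UpdateHostDb | Long.toString(datum.numFailures()) + "_times_failed" | |
| SitemapProcessor | Sitemap | existing_sitemap_entries | |
| SitemapProcessor | Sitemap | failed_fetches | |
| SitemapProcessor | Sitemap | filtered_records | |
| SitemapProcessor | Sitemap | filtered_sitemaps_from_hostname | |
| SitemapProcessor | Sitemap | new_sitemap_entries | |
| SitemapProcessor | Sitemap | sitemaps_from_hostname | |
| SitemapProcessor | Sitemap | sitemap_seeds | |
| UpdateHostDbMapper | UpdateHostDb | filtered_records | |
| UpdateHostDbReducer | UpdateHostDb | total_hosts | |
| UpdateHostDbReducer | UpdateHostDb | skipped_not_eligible | |
| WebGraph | WebGraph.outlinks | added links | |
| WebGraph | WebGraph.outlinks | removed links | |
| WARCExporter | WARCExporter | exception | |
| WARCExporter | WARCExporter | invalid URI | |
| WARCExporter | WARCExporter | missing content | |
| WARCExporter | WARCExporter | missing metadata | |
| WARCExporter | WARCExporter | omitted empty response | |
| WARCExporter | WARCExporter | records generated | |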
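
Each metric in the table is identified by a (Metric Group, Metric Name) pair, and some names are Java expressions, so the actual name is only known at run time. The following is a minimal sketch of that idea using a hypothetical in-memory registry. `MetricCounters` and its methods are assumptions made for illustration; they are not the Nutch or Hadoop API. Only the `Long.toString(...) + "_times_failed"` expression is taken from the table.

```java
import java.util.Map;
import java.util.TreeMap;

public class MetricCounters {

    // group -> (name -> value); TreeMap keeps both levels sorted alphabetically
    private final Map<String, Map<String, Long>> counters = new TreeMap<>();

    /** Adds delta to the counter identified by (group, name), creating it if absent. */
    public void increment(String group, String name, long delta) {
        counters.computeIfAbsent(group, g -> new TreeMap<>())
                .merge(name, delta, Long::sum);
    }

    /** Returns the current value, or 0 if the counter was never incremented. */
    public long get(String group, String name) {
        return counters.getOrDefault(group, Map.of()).getOrDefault(name, 0L);
    }

    /** Builds the run-time metric name used in the UpdateHostDb row of the table. */
    static String timesFailedName(long numFailures) {
        return Long.toString(numFailures) + "_times_failed";
    }

    public static void main(String[] args) {
        MetricCounters metrics = new MetricCounters();

        // Fixed metric name: repeated increments accumulate under one key.
        metrics.increment("FetcherStatus", "bytes_downloaded", 2048);
        metrics.increment("FetcherStatus", "bytes_downloaded", 512);

        // Computed metric name: a host with 3 failures counts under "3_times_failed".
        metrics.increment("UpdateHostDb", timesFailedName(3), 1);

        System.out.println(metrics.get("FetcherStatus", "bytes_downloaded")); // prints 2560
        System.out.println(metrics.get("UpdateHostDb", "3_times_failed"));    // prints 1
    }
}
```

Because computed names depend on the data, a single table row such as the `_times_failed` entry can expand into many distinct counters in a real job report.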

...



./src/java/org/apache/nutch/indexer/CleaningJob.java

...