...

| Tool/Object | Metric Group | Metric Name | Description |
|---|---|---|---|
| CrawlDbFilter | CrawlDB filter | Gone records removed | |
| | CrawlDB filter | Orphan records removed | |
| | CrawlDB filter | URLs filtered | |
| CrawlDbReducer | CrawlDB status | CrawlDatum.getStatusName(CrawlDatum().getStatus()) | |
| DeduplicationJob | DeduplicationJobStatus | Documents marked as duplicate | |
| DomainStatistics | | MyCounter.EMPTY_RESULT | |
| | | MyCounter.FETCHED | |
| | | MyCounter.NOT_FETCHED | |
| Fetcher | FetcherStatus | bytes_downloaded | |
| | FetcherStatus | hitByThrougputThreshold | |
| | FetcherStatus | hitByTimeLimit | |
| FetcherThread | FetcherStatus | AboveExceptionThresholdInQueue | |
| | FetcherStatus | FetchItem.notCreated.redirect | |
| | FetcherStatus | outlinks_detected | |
| | FetcherStatus | outlinks_following | |
| | FetcherStatus | ProtocolStatus.getName() | |
| | FetcherStatus | redirect_count_exceeded | |
| | FetcherStatus | redirect_deduplicated | |
| | FetcherStatus | robots_denied | |
| | FetcherStatus | robots_denied_maxcrawldelay | |
| | ParserStatus | ParseStatus.majorCodes[p.getData().getStatus().getMajorCode()] | |
| Generator | Generator | EXPR_REJECTED | |
| | Generator | HOSTS_AFFECTED_PER_HOST_OVERFLOW | |
| | Generator | INTERVAL_REJECTED | |
| | Generator | MALFORMED_URL | |
| | Generator | SCHEDULE_REJECTED | |
| | Generator | SCORE_TOO_LOW | |
| | Generator | STATUS_REJECTED | |
| | Generator | URLS_SKIPPED_PER_HOST_OVERFLOW | |
| IndexerMapReduce | IndexerStatus | deleted (duplicates) | |
| | | deleted (IndexingFilter) | |
| | | deleted (gone) | |
| | | deleted (redirects) | |
| | | deleted (robots=noindex) | |
| | | errors (IndexingFilter) | |
| | | errors (ScoringFilter) | |
| | | indexed (add/update) | |
| | | skipped (IndexingFilter) | |
| | | skipped (not modified) | |
| Injector | injector | urls_filtered | |
| | injector | urls_injected | |
| | injector | urls_merged | |
| | injector | urls_purged_404 | |
| | injector | urls_purged_filter | |
| ParseSegment | ParserStatus | ParseStatus.majorCodes[parseStatus.getMajorCode()] | |
| QueueFeeder | FetcherStatus | filtered | |
| | FetcherStatus | AboveExceptionThresholdInQueue | |
| ResolverThread | UpdateHostDb | checked_hosts | |
| | UpdateHostDb | existing_known_host | |
| | UpdateHostDb | existing_unknown_host | |
| | UpdateHostDb | new_known_host | |
| | UpdateHostDb | new_unknown_host | |
| | UpdateHostDb | purged_unknown_host | |
| | UpdateHostDb | rediscovered_host | |
| | UpdateHostDb | Long.toString(datum.numFailures()) + "_times_failed" | |
| SitemapProcessor | Sitemap | existing_sitemap_entries | |
| | Sitemap | failed_fetches | |
| | Sitemap | filtered_records | |
| | Sitemap | filtered_sitemaps_from_hostname | |
| | Sitemap | new_sitemap_entries | |
| | Sitemap | sitemaps_from_hostname | |
| | Sitemap | sitemap_seeds | |
| UpdateHostDbMapper | UpdateHostDb | filtered_records | |
| UpdateHostDbReducer | UpdateHostDb | total_hosts | |
| | UpdateHostDb | skipped_not_eligible | |
| WebGraph | WebGraph.outlinks | added links | |
| | WebGraph.outlinks | removed links | |
| WARCExporter | WARCExporter | exception | |
| | WARCExporter | invalid URI | |
| | WARCExporter | missing content | |
| | WARCExporter | missing metadata | |
| | WARCExporter | omitted empty response | |
| | WARCExporter | records generated | |
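
The Metric Group and Metric Name columns above read like the two string arguments of a Hadoop-style counter lookup, and entries such as `Long.toString(datum.numFailures()) + "_times_failed"` suggest that some names are built at run time. The sketch below illustrates that group/name addressing under those assumptions. It does not use Hadoop or Nutch classes: `CounterSketch`, `Counters`, `value`, and the sample numbers are hypothetical stand-ins, and only the group and name strings are taken from the table.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

public class CounterSketch {

    /** Hypothetical stand-in for a job context: counters addressed by (group, name). */
    static final class Counters {
        private final Map<String, Map<String, AtomicLong>> groups = new TreeMap<>();

        /** Returns the counter for the given group and name, creating it on first use. */
        AtomicLong getCounter(String group, String name) {
            return groups
                .computeIfAbsent(group, g -> new TreeMap<>())
                .computeIfAbsent(name, n -> new AtomicLong());
        }

        /** Reads a counter without creating it; a missing counter reads as 0. */
        long value(String group, String name) {
            Map<String, AtomicLong> byName = groups.get(group);
            if (byName == null) {
                return 0L;
            }
            AtomicLong counter = byName.get(name);
            return counter == null ? 0L : counter.get();
        }

        /** Prints every counter as "group / name = value". */
        void dump() {
            groups.forEach((group, byName) ->
                byName.forEach((name, c) ->
                    System.out.println(group + " / " + name + " = " + c.get())));
        }
    }

    /** Increments a few counters using group and name strings from the table. */
    static Counters run() {
        Counters context = new Counters();

        // Fixed names: the group and name are plain string literals.
        context.getCounter("FetcherStatus", "bytes_downloaded").addAndGet(2048L);
        context.getCounter("FetcherStatus", "robots_denied").incrementAndGet();
        context.getCounter("FetcherStatus", "robots_denied").incrementAndGet();

        // Computed name: mirrors the table's "<n>_times_failed" pattern.
        // numFailures is a made-up sample value, not read from a real host record.
        long numFailures = 3L;
        context.getCounter("UpdateHostDb", Long.toString(numFailures) + "_times_failed")
               .incrementAndGet();

        return context;
    }

    public static void main(String[] args) {
        run().dump();
    }
}
```

Keying on the pair rather than on the name alone matters because the same name can appear under more than one group: `filtered_records` is listed under both `Sitemap` and `UpdateHostDb`.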

...



./src/java/org/apache/nutch/indexer/IndexerMapReduce.java

...