...
| Tool/Object | Metric Group | Metric Name | Description |
|---|---|---|---|
| CleaningJob | CleaningJobStatus | Deleted documents | |
| CrawlDbFilter | CrawlDB filter | Gone records removed | |
| | CrawlDB filter | Orphan records removed | |
| | CrawlDB filter | URLs filtered | |
| CrawlDbReducer | CrawlDB status | CrawlDatum.getStatusName(CrawlDatum().getStatus()) | |
| DeduplicationJob | DeduplicationJobStatus | Documents marked as duplicate | |
| DomainStatistics | MyCounter.EMPTY_RESULT | | |
| | MyCounter.FETCHED | | |
| | MyCounter.NOT_FETCHED | | |
| Fetcher | FetcherStatus | bytes_downloaded | |
| | FetcherStatus | hitByThrougputThreshold | |
| | FetcherStatus | hitByTimeLimit | |
| FetcherThread | FetcherStatus | AboveExceptionThresholdInQueue | |
| | FetcherStatus | FetchItem.notCreated.redirect | |
| | FetcherStatus | outlinks_detected | |
| | FetcherStatus | outlinks_following | |
| | FetcherStatus | ProtocolStatus.getName() | |
| | FetcherStatus | redirect_count_exceeded | |
| | FetcherStatus | redirect_deduplicated | |
| | FetcherStatus | robots_denied | |
| | FetcherStatus | robots_denied_maxcrawldelay | |
| | ParserStatus | ParseStatus.majorCodes[p.getData().getStatus().getMajorCode()] | |
| Generator | Generator | EXPR_REJECTED | |
| | Generator | HOSTS_AFFECTED_PER_HOST_OVERFLOW | |
| | Generator | INTERVAL_REJECTED | |
| | Generator | MALFORMED_URL | |
| | Generator | SCHEDULE_REJECTED | |
| | Generator | SCORE_TOO_LOW | |
| | Generator | STATUS_REJECTED | |
| | Generator | URLS_SKIPPED_PER_HOST_OVERFLOW | |
| IndexerMapReduce | IndexerStatus | deleted (duplicates) | |
| | IndexerStatus | deleted (IndexingFilter) | |
| | IndexerStatus | deleted (gone) | |
| | IndexerStatus | deleted (redirects) | |
| | IndexerStatus | deleted (robots=noindex) | |
| | IndexerStatus | errors (IndexingFilter) | |
| | IndexerStatus | errors (ScoringFilter) | |
| | IndexerStatus | indexed (add/update) | |
| | IndexerStatus | skipped (IndexingFilter) | |
| | IndexerStatus | skipped (not modified) | |
| Injector | injector | urls_filtered | |
| | injector | urls_injected | |
| | injector | urls_merged | |
| | injector | urls_purged_404 | |
| | injector | urls_purged_filter | |
| ParseSegment | ParserStatus | ParseStatus.majorCodes[parseStatus.getMajorCode()] | |
| QueueFeeder | FetcherStatus | filtered | |
| | FetcherStatus | AboveExceptionThresholdInQueue | |
| ResolverThread | UpdateHostDb | checked_hosts | |
| | UpdateHostDb | existing_known_host | |
| | UpdateHostDb | existing_unknown_host | |
| | UpdateHostDb | new_known_host | |
| | UpdateHostDb | new_unknown_host | |
| | UpdateHostDb | purged_unknown_host | |
| | UpdateHostDb | rediscovered_host | |
| | UpdateHostDb | Long.toString(datum.numFailures()) + "_times_failed" | |
| SitemapProcessor | Sitemap | existing_sitemap_entries | |
| | Sitemap | failed_fetches | |
| | Sitemap | filtered_records | |
| | Sitemap | filtered_sitemaps_from_hostname | |
| | Sitemap | new_sitemap_entries | |
| | Sitemap | sitemaps_from_hostname | |
| | Sitemap | sitemap_seeds | |
| UpdateHostDbMapper | UpdateHostDb | filtered_records | |
| UpdateHostDbReducer | UpdateHostDb | total_hosts | |
| | UpdateHostDb | skipped_not_eligible | |
| WebGraph | WebGraph.outlinks | added links | |
| | WebGraph.outlinks | removed links | |
| WARCExporter | WARCExporter | exception | |
| | WARCExporter | invalid URI | |
| | WARCExporter | missing content | |
| | WARCExporter | missing metadata | |
| | WARCExporter | omitted empty response | |
| | WARCExporter | records generated | |
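In this table, each metric is a Hadoop counter identified by the pair (Metric Group, Metric Name). Some names are literal strings, such as `urls_injected`. Others are Java expressions evaluated at runtime, such as `Long.toString(datum.numFailures()) + "_times_failed"`, so a single row can produce many distinct counters. The sketch below is a minimal, JDK-only stand-in for that grouped-counter model. `NutchCounterSketch`, `increment`, `get` and `timesFailedName` are hypothetical names chosen for illustration and are not part of the Nutch or Hadoop API. In a real job the counters are Hadoop's own, usually reached from a task with `context.getCounter(group, name)`.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical stand-in for Hadoop's grouped counters (NOT the Hadoop or Nutch API).
 * Each counter is keyed by (group, name), mirroring the
 * "Metric Group" and "Metric Name" columns of the metrics table.
 */
public class NutchCounterSketch {

    // group -> (name -> value); TreeMap keeps groups and names sorted when printed
    private final Map<String, Map<String, Long>> counters = new TreeMap<>();

    /** Adds delta to the counter (group, name), creating it at 0 if it does not exist yet. */
    public void increment(String group, String name, long delta) {
        counters.computeIfAbsent(group, g -> new TreeMap<>())
                .merge(name, delta, Long::sum);
    }

    /** Returns the current value of (group, name), or 0 if it was never incremented. */
    public long get(String group, String name) {
        Map<String, Long> byName = counters.get(group);
        if (byName == null) {
            return 0L;
        }
        return byName.getOrDefault(name, 0L);
    }

    /**
     * Builds a dynamic metric name using the expression shown in the table's
     * UpdateHostDb row: Long.toString(numFailures) + "_times_failed".
     */
    static String timesFailedName(long numFailures) {
        return Long.toString(numFailures) + "_times_failed";
    }

    public static void main(String[] args) {
        NutchCounterSketch c = new NutchCounterSketch();

        // Literal group/name pairs taken from the Injector rows.
        c.increment("injector", "urls_injected", 1);
        c.increment("injector", "urls_injected", 1);
        c.increment("injector", "urls_filtered", 1);

        // Dynamic names: three hypothetical hosts that failed 0, 2 and 2 times.
        for (long failures : new long[] {0, 2, 2}) {
            c.increment("UpdateHostDb", timesFailedName(failures), 1);
        }

        System.out.println(c.counters);
    }
}
```

Because `timesFailedName` is evaluated per record, the `UpdateHostDb` group ends up with one counter for each distinct failure count (for example `0_times_failed` and `2_times_failed`) rather than a single fixed counter. This is why rows whose Metric Name is an expression cannot be listed exhaustively.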
./src/java/org/apache/nutch/indexer/CleaningJob.java