...
The table is arranged
Tool/Object | Metric Group | Metric Name | Description | Usage and Comments |
---|---|---|---|---|
CleaningJob | CleaningJobStatus | Deleted documents | The total count of DB_GONE and/or DB_DUPLICATE documents ultimately cleaned (deleted) from the indexer(s). | This metric is useful for determining whether filtering or duplicate detection needs to happen further upstream prior to indexing. Ideally DB_GONE and DB_DUPLICATE documents would not make it into production indices in the first place. |
CrawlDbFilter | CrawlDB filter | Gone records removed | ||
CrawlDbFilter | CrawlDB filter | Orphan records removed | ||
CrawlDbFilter | CrawlDB filter | URLs filtered | ||
CrawlDbReducer | CrawlDB status | CrawlDatum.getStatusName(CrawlDatum().getStatus()) | ||
DeduplicationJob | DeduplicationJobStatus | Documents marked as duplicate | ||
DomainStatistics | N/A | MyCounter.EMPTY_RESULT | ||
DomainStatistics | N/A | MyCounter.FETCHED | ||
DomainStatistics | N/A | MyCounter.NOT_FETCHED | ||
Fetcher | FetcherStatus | bytes_downloaded | ||
Fetcher | FetcherStatus | hitByThrougputThreshold | ||
Fetcher | FetcherStatus | hitByTimeLimit | ||
FetcherThread | FetcherStatus | AboveExceptionThresholdInQueue | ||
FetcherThread | FetcherStatus | FetchItem.notCreated.redirect | ||
FetcherThread | FetcherStatus | outlinks_detected | ||
FetcherThread | FetcherStatus | outlinks_following | ||
FetcherThread | FetcherStatus | ProtocolStatus.getName() | ||
FetcherThread | FetcherStatus | redirect_count_exceeded | ||
FetcherThread | FetcherStatus | redirect_deduplicated | ||
FetcherThread | FetcherStatus | robots_denied | ||
FetcherThread | FetcherStatus | robots_denied_maxcrawldelay | ||
FetcherThread | ParserStatus | ParseStatus.majorCodes[p.getData().getStatus().getMajorCode()] | ||
Generator | Generator | EXPR_REJECTED | ||
Generator | Generator | HOSTS_AFFECTED_PER_HOST_OVERFLOW | ||
Generator | Generator | INTERVAL_REJECTED | ||
Generator | Generator | MALFORMED_URL | ||
Generator | Generator | SCHEDULE_REJECTED | ||
Generator | Generator | SCORE_TOO_LOW | ||
Generator | Generator | STATUS_REJECTED | ||
Generator | Generator | URLS_SKIPPED_PER_HOST_OVERFLOW | ||
IndexerMapReduce | IndexerStatus | deleted (duplicates) | ||
IndexerMapReduce | IndexerStatus | deleted (IndexingFilter) | ||
IndexerMapReduce | IndexerStatus | deleted (gone) | ||
IndexerMapReduce | IndexerStatus | deleted (redirects) | ||
IndexerMapReduce | IndexerStatus | deleted (robots=noindex) | ||
IndexerMapReduce | IndexerStatus | errors (IndexingFilter) | ||
IndexerMapReduce | IndexerStatus | errors (ScoringFilter) | ||
IndexerMapReduce | IndexerStatus | indexed (add/update) | ||
IndexerMapReduce | IndexerStatus | skipped (IndexingFilter) | ||
IndexerMapReduce | IndexerStatus | skipped (not modified) | ||
Injector | injector | urls_filtered | ||
Injector | injector | urls_injected | ||
Injector | injector | urls_merged | ||
Injector | injector | urls_purged_404 | ||
Injector | injector | urls_purged_filter | ||
ParseSegment | ParserStatus | ParseStatus.majorCodes[parseStatus.getMajorCode()] | ||
QueueFeeder | FetcherStatus | filtered | ||
(also QueueFeeder) | FetcherStatus | AboveExceptionThresholdInQueue | ||
ResolverThread | UpdateHostDb | checked_hosts | ||
ResolverThread | UpdateHostDb | existing_known_host | ||
ResolverThread | UpdateHostDb | existing_unknown_host | ||
ResolverThread | UpdateHostDb | new_known_host | ||
ResolverThread | UpdateHostDb | new_unknown_host | ||
ResolverThread | UpdateHostDb | purged_unknown_host | ||
ResolverThread | UpdateHostDb | rediscovered_host | ||
ResolverThread | UpdateHostDb | Long.toString(datum.numFailures()) + "_times_failed" | ||
SitemapProcessor | Sitemap | existing_sitemap_entries | ||
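SitemapProcessor | Sitemap | failed_fetches | ||
SitemapProcessor | Sitemap | filtered_records | ||
SitemapProcessor | Sitemap | filtered_sitemaps_from_hostname | ||
SitemapProcessor | Sitemap | new_sitemap_entries | ||
SitemapProcessor | Sitemap | sitemaps_from_hostname | ||
SitemapProcessor | Sitemap | sitemap_seeds | ||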
UpdateHostDbMapper | UpdateHostDb | filtered_records | ||
UpdateHostDbReducer | UpdateHostDb | total_hosts | ||
(also UpdateHostDbReducer) | UpdateHostDb | skipped_not_eligible | ||
WebGraph | WebGraph.outlinks | added links | ||
(also WebGraph) | WebGraph.outlinks | removed links | ||
WARCExporter | WARCExporter | exception | ||
WARCExporter | WARCExporter | invalid URI | ||
WARCExporter | WARCExporter | missing content | ||
WARCExporter | WARCExporter | missing metadata | ||
WARCExporter | WARCExporter | omitted empty response | ||
WARCExporter | WARCExporter | records generated | ||
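
The metrics above are Hadoop job counters, addressed by a metric group and a metric name. With the Hadoop MapReduce API, the value for a finished job can be read with `job.getCounters().findCounter(group, name).getValue()`. The following is a minimal sketch that avoids a Hadoop dependency. It uses a hypothetical in-memory class, `NutchCounterSketch`, which is not part of Nutch, to illustrate two points from the table. The first is how the dynamic `UpdateHostDb` counter name is built from `numFailures()`. The second is one possible way to condense the `IndexerStatus` counters into a deletion ratio, in the spirit of the CleaningJob usage comment. That ratio is an assumption of this sketch rather than a Nutch metric, and the sample values in `main` are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper, not part of Nutch: an in-memory stand-in for Hadoop
 * job counters, keyed by metric group and metric name as in the table.
 */
class NutchCounterSketch {
    // group -> (name -> value), mirroring Hadoop's group/name addressing
    private final Map<String, Map<String, Long>> counters = new HashMap<>();

    /** Adds delta to a counter, creating it at zero if it does not exist yet. */
    public void increment(String group, String name, long delta) {
        counters.computeIfAbsent(group, g -> new HashMap<>()).merge(name, delta, Long::sum);
    }

    /** Returns a counter's value, or 0 if it was never incremented. */
    public long get(String group, String name) {
        return counters.getOrDefault(group, Map.of()).getOrDefault(name, 0L);
    }

    /** Builds the dynamic UpdateHostDb name, following the expression in the table. */
    public static String timesFailedName(long numFailures) {
        return Long.toString(numFailures) + "_times_failed";
    }

    /**
     * Assumed metric for this sketch: share of IndexerStatus documents counted
     * under any "deleted (...)" name versus "indexed (add/update)".
     * "skipped" and "errors" counters are ignored.
     */
    public double indexerDeletionRatio() {
        long deleted = 0;
        for (Map.Entry<String, Long> e : counters.getOrDefault("IndexerStatus", Map.of()).entrySet()) {
            if (e.getKey().startsWith("deleted")) {
                deleted += e.getValue();
            }
        }
        long indexed = get("IndexerStatus", "indexed (add/update)");
        long total = deleted + indexed;
        return total == 0 ? 0.0 : (double) deleted / total;
    }

    public static void main(String[] args) {
        NutchCounterSketch c = new NutchCounterSketch();
        // Illustrative values only, not output from a real crawl.
        c.increment("IndexerStatus", "indexed (add/update)", 900);
        c.increment("IndexerStatus", "deleted (gone)", 60);
        c.increment("IndexerStatus", "deleted (duplicates)", 40);
        c.increment("UpdateHostDb", timesFailedName(3), 1);
        System.out.println(c.indexerDeletionRatio()); // 0.1
        System.out.println(c.get("UpdateHostDb", "3_times_failed")); // 1
    }
}
```

A persistently high ratio would be one signal, under this sketch's assumptions, that gone or duplicate documents are reaching the indexing step and could be filtered further upstream.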