...
Metrics matter because they provide vital information about any given Nutch (and, by extension, MapReduce) process. They give accurate measurements of how the process is functioning and a basis for suggesting improvements to your administration and operations work.
Metrics provide a data-driven mechanism for intelligence gathering within Nutch operations and administration.
Audience
The page is intended for
...
Building Metrics on MapReduce Task Context
As Nutch is a native MapReduce application, the Mapper and Reducer functions of each NutchTool implementation, i.e. CommonCrawlDataDumper, CrawlDb, DeduplicationJob, Fetcher, Generator, IndexingJob, Injector, LinkDb and ParseSegment, utilize MapContexts and ReduceContexts.
This is relevant because these contexts inherit certain methods from the interface org.apache.hadoop.mapreduce.TaskAttemptContext, specifically implementations of [...] are the entry point for [...]. These contexts are passed to the Mapper and Reducer initially during setup, but they are also used throughout the task lifecycle.
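The way a single context object travels through a task can be modelled in plain Java without a Hadoop dependency. The following is only a sketch under stated assumptions: `SketchContext`, `SketchMapper`, `runTask` and the counter names are hypothetical stand-ins invented for illustration, not Hadoop or Nutch classes. It mirrors the setup, map, cleanup order that Hadoop's Mapper follows. In a real job the framework supplies the context, and TaskAttemptContext declares, for example, `getCounter(String groupName, String counterName)`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the task lifecycle; none of these types are Hadoop classes.
class ContextLifecycleSketch {

  /** Hypothetical stand-in for a task context that owns named counters. */
  static class SketchContext {
    private final Map<String, Long> counters = new TreeMap<>();

    /** Adds delta to the counter identified by group and name. */
    void increment(String group, String name, long delta) {
      counters.merge(group + "." + name, delta, Long::sum);
    }

    /** Returns the current value, or 0 if the counter was never touched. */
    long value(String group, String name) {
      return counters.getOrDefault(group + "." + name, 0L);
    }
  }

  /** Hypothetical mapper whose three hooks all receive the same context. */
  static class SketchMapper {
    void setup(SketchContext context) {
      context.increment("sketch", "tasks_started", 1);
    }

    void map(String record, SketchContext context) {
      context.increment("sketch", "records_seen", 1);
      if (record.isEmpty()) {
        context.increment("sketch", "records_empty", 1);
      }
    }

    void cleanup(SketchContext context) {
      context.increment("sketch", "tasks_finished", 1);
    }
  }

  /** Drives one task attempt: setup once, map per record, cleanup once. */
  static SketchContext runTask(List<String> records) {
    SketchContext context = new SketchContext();
    SketchMapper mapper = new SketchMapper();
    mapper.setup(context);
    for (String record : records) {
      mapper.map(record, context);
    }
    mapper.cleanup(context);
    return context;
  }

  public static void main(String[] args) {
    List<String> records = Arrays.asList("http://example.org/", "", "http://example.com/");
    SketchContext context = runTask(records);
    System.out.println("records_seen = " + context.value("sketch", "records_seen"));
    System.out.println("records_empty = " + context.value("sketch", "records_empty"));
  }
}
```

Because every hook receives the same context instance, measurements recorded during setup, per-record processing and cleanup all accumulate in one place and can be read once the task finishes.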
Info: The canonical Hadoop documentation for Mapper and Reducer provides much more detail about the involvement of contexts in each task lifecycle.
For