...
At this point Pig actually launches a MapReduce job on the cluster. On the cluster, HCatBaseInputFormat.createRecordReader is called with an HCatSplit, the wrapper we created earlier that contains the actual input split along with the partition information needed to deserialize its records. An HCatRecordReader containing a storage handler is returned to the framework; the storage handler holds the information necessary to read data from the underlying storage and convert it into usable records.
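The shape of that flow can be sketched with stripped-down stand-ins for the real classes (HCatSplit, StorageHandler, and so on here are simplified models for illustration, not the actual HCatalog types):

```java
import java.util.List;
import java.util.Map;

// Stand-in for HCatSplit: wraps the underlying split plus the
// partition metadata needed to deserialize that partition's records.
class HCatSplit {
    final String baseSplitPath;              // stand-in for the wrapped InputSplit
    final Map<String, String> partitionInfo; // e.g. which SerDe to use
    HCatSplit(String path, Map<String, String> info) {
        this.baseSplitPath = path;
        this.partitionInfo = info;
    }
}

// Stand-in for the storage handler: knows how to read the underlying
// storage and turn raw data into records.
class StorageHandler {
    final String serde;
    StorageHandler(String serde) { this.serde = serde; }
    List<String> read(String path) {
        // Real code would open the underlying storage; here we fake two rows.
        return List.of(path + ":row1", path + ":row2");
    }
}

// The record reader handed back to the MapReduce framework.
class HCatRecordReader {
    private final List<String> records;
    private int pos = 0;
    HCatRecordReader(StorageHandler handler, HCatSplit split) {
        this.records = handler.read(split.baseSplitPath);
    }
    boolean nextKeyValue() { return pos < records.size(); }
    String getCurrentValue() { return records.get(pos++); }
}

class HCatBaseInputFormat {
    // Mirrors createRecordReader: build a storage handler from the
    // split's partition info, then wrap it in an HCatRecordReader.
    static HCatRecordReader createRecordReader(HCatSplit split) {
        StorageHandler handler =
                new StorageHandler(split.partitionInfo.get("serde"));
        return new HCatRecordReader(handler, split);
    }
}
```

The key design point is that the split itself carries the partition's serialization details, so a single job can read partitions stored in different formats, each through its own storage handler.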
With the RecordReader initialized, it's time to get some actual records! Pig calls HCatBaseLoader.getNext, which gets an HCatRecord from the HCatRecordReader we just initialized, converts it to a Pig tuple, and hands it off to Pig for processing.
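That hand-off can be sketched as follows (again with simplified stand-ins; HCatRecord, Tuple, and the iterator-backed reader are illustrative models, not the real HCatalog or Pig classes):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in for HCatRecord: a positional list of field values.
class HCatRecord {
    final List<Object> fields;
    HCatRecord(List<Object> fields) { this.fields = fields; }
}

// Minimal stand-in for Pig's Tuple.
class Tuple {
    final List<Object> values = new ArrayList<>();
}

class HCatBaseLoader {
    private final Iterator<HCatRecord> reader; // stand-in for HCatRecordReader

    HCatBaseLoader(Iterator<HCatRecord> reader) { this.reader = reader; }

    // Mirrors getNext(): pull one record from the reader, convert it
    // to a tuple, and return it; null signals end of input to Pig.
    Tuple getNext() {
        if (!reader.hasNext()) return null;
        HCatRecord rec = reader.next();
        Tuple t = new Tuple();
        t.values.addAll(rec.fields); // real code maps HCat types to Pig types
        return t;
    }
}
```

Pig drives this loop itself, calling getNext repeatedly until it returns null, so the loader only has to translate one record at a time.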