...
Summary
MR Submission Engine
MR Execution Engine
2. Even though we describe Sqoop as a map-only job, is that how it always works? What happens when numLoaders is non-zero?
The current semantics are:
Number of extractors | Number of loaders | Outcome |
---|---|---|
Default | Default | Map-only job with 10 map tasks |
Number X | Default | Map-only job with X map tasks |
Number X | Number Y | Map-reduce job with X map tasks and Y reduce tasks |
Default | Number Y | Map-reduce job with 10 map tasks and Y reduce tasks |
The purpose has been to let the user throttle the number of loaders and the number of extractors independently (e.g. to have a different number of loaders than extractors), and to have default values that won't run a reduce phase when it isn't necessary.
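The rules in the table can be sketched as a small function. This is only an illustration, not Sqoop code: the class `JobShape`, its `plan` method, and the constant `DEFAULT_EXTRACTORS` are hypothetical names, and an unset value is modelled here as an empty `OptionalInt`. The sketch relies on the fact that a Hadoop job with zero reduce tasks runs as a map-only job.

```java
import java.util.OptionalInt;

/** Hypothetical helper that mirrors the table above; not part of Sqoop. */
final class JobShape {
    /** Default number of map tasks, taken from the table above. */
    static final int DEFAULT_EXTRACTORS = 10;

    final int mapTasks;
    final int reduceTasks; // 0 means no reduce phase

    JobShape(int mapTasks, int reduceTasks) {
        this.mapTasks = mapTasks;
        this.reduceTasks = reduceTasks;
    }

    /** A job with no reduce tasks is a map-only job. */
    boolean isMapOnly() {
        return reduceTasks == 0;
    }

    /**
     * Extractors map to map tasks and loaders map to reduce tasks.
     * An unset extractor count falls back to the default; an unset
     * loader count means the reduce phase is skipped entirely.
     */
    static JobShape plan(OptionalInt extractors, OptionalInt loaders) {
        int maps = extractors.orElse(DEFAULT_EXTRACTORS);
        int reduces = loaders.orElse(0);
        return new JobShape(maps, reduces);
    }
}
```

For example, `JobShape.plan(OptionalInt.of(4), OptionalInt.empty())` gives a map-only job with 4 map tasks, while `JobShape.plan(OptionalInt.empty(), OptionalInt.of(3))` gives 10 map tasks and 3 reduce tasks.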
SqoopMapper
Sqoop Writable
The Hadoop framework requires a Writable class. We use the current one as a wrapper for IntermediateDataFormat, which we can't use directly in MR because Hadoop doesn't support that (to the best of my knowledge). We don't use a concrete implementation such as Text, so that we don't have to convert every record to a String to transfer data between mappers and reducers.
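The wrapper idea can be illustrated with a minimal, self-contained sketch. To keep it runnable without Hadoop on the classpath, `WritableLike` is a local stand-in that copies the two methods of Hadoop's `Writable` interface (`write(DataOutput)` and `readFields(DataInput)`). Everything else is an assumption made for illustration: `RecordFormat` and `PairFormat` are simplified, hypothetical stand-ins for an intermediate data format, and `WrapperWritable` is not the actual Sqoop class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Local stand-in with the same two methods as Hadoop's Writable. */
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

/** Hypothetical, simplified stand-in for an intermediate data format. */
interface RecordFormat {
    void write(DataOutput out) throws IOException;
    void read(DataInput in) throws IOException;
}

/** Toy format: one record made of a numeric id and a name. */
final class PairFormat implements RecordFormat {
    long id;
    String name = "";

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(id);   // the id stays binary, it is not turned into text
        out.writeUTF(name);
    }

    @Override
    public void read(DataInput in) throws IOException {
        id = in.readLong();
        name = in.readUTF();
    }
}

/** Wrapper that only delegates serialization to the wrapped format. */
final class WrapperWritable implements WritableLike {
    private final RecordFormat format;

    WrapperWritable(RecordFormat format) {
        this.format = format;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        format.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        format.read(in);
    }
}

final class WritableDemo {
    /** Serializes on one side and deserializes on the other, as a shuffle would. */
    static PairFormat roundTrip(PairFormat source) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new WrapperWritable(source).write(new DataOutputStream(bytes));

            PairFormat target = new PairFormat();
            new WrapperWritable(target).readFields(
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The wrapper holds no record logic of its own, so the data format decides how a record is laid out on the wire and the framework only sees the Writable contract.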
SqoopInputFormat
SqoopNullOutputFormat
...