
Work In progress

JIRA : SQOOP-1938

This document describes how the Sqoop MR Execution Engine works, its major components, and the internals of the implementation.

Summary

MR Submission Engine

 

MR Execution Engine

 

2. Even though we call Sqoop map-only, is that how it always works? What happens when numLoaders is non-zero?

The current semantics are:

# Extractors | # Loaders | Outcome
Default      | Default   | Map only job with 10 map tasks
Number X     | Default   | Map only job with X map tasks
Number X     | Number Y  | Map-reduce job with X map tasks and Y reduce tasks
Default      | Number Y  | Map-reduce job with 10 map tasks and Y reduce tasks

The purpose is to let the user throttle the number of loaders and extractors independently (e.g. use a different number of loaders than extractors), and to provide default values that do not run the reduce phase when it is not necessary.
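The table above can be read as a small decision rule. The following is only a sketch of that rule, not the actual Sqoop code: the class JobShape, its plan method, and the use of null to mean "Default" are all hypothetical names and conventions chosen for illustration. The only values taken from this page are the default of 10 map tasks and the rule that no reduce phase runs unless a number of loaders is given.

```java
/**
 * Hypothetical helper (not part of Sqoop) that mirrors the
 * extractor/loader table: it decides how many map and reduce
 * tasks a job would get.
 */
final class JobShape {
    // Default number of map tasks, as stated in the table.
    static final int DEFAULT_MAP_TASKS = 10;

    final int mapTasks;
    final int reduceTasks; // 0 means the job is map only

    private JobShape(int mapTasks, int reduceTasks) {
        this.mapTasks = mapTasks;
        this.reduceTasks = reduceTasks;
    }

    /**
     * Assumption: null stands for "Default" in either argument.
     * Input validation is left out to keep the sketch short.
     */
    static JobShape plan(Integer extractors, Integer loaders) {
        int maps = (extractors != null) ? extractors : DEFAULT_MAP_TASKS;
        // Without an explicit loader count there is no reduce phase.
        int reduces = (loaders != null) ? loaders : 0;
        return new JobShape(maps, reduces);
    }

    boolean isMapOnly() {
        return reduceTasks == 0;
    }
}
```

Under these assumptions, JobShape.plan(null, null) describes a map only job with 10 map tasks, and JobShape.plan(null, 3) describes a map-reduce job with 10 map tasks and 3 reduce tasks.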

SqoopMapper

 

Sqoop Writable

The Hadoop framework requires a Writable class. We use the current one as a wrapper for IntermediateDataFormat, which we can't use directly in MR because Hadoop doesn't support that (to the best of my knowledge). We're not using a concrete implementation such as Text, so that we don't have to convert all records to String to transfer data between mappers and reducers.
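The wrapper idea can be illustrated with a dependency-free sketch. It is not the real SqoopWritable. RecordFormat, IdAmountFormat, RecordWritable and RoundTripDemo are hypothetical names, and RecordFormat only stands in for IntermediateDataFormat, whose actual interface is not shown on this page. The wrapper exposes write(DataOutput) and readFields(DataInput), the two methods that Hadoop's Writable interface declares, but does not implement that interface so that the example compiles without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a data format that knows its own binary layout. */
interface RecordFormat {
    void write(DataOutput out) throws IOException;
    void read(DataInput in) throws IOException;
}

/** Toy format with two typed fields; nothing is converted to String. */
final class IdAmountFormat implements RecordFormat {
    int id;
    long amount;

    IdAmountFormat() {}

    IdAmountFormat(int id, long amount) {
        this.id = id;
        this.amount = amount;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeLong(amount);
    }

    @Override
    public void read(DataInput in) throws IOException {
        id = in.readInt();
        amount = in.readLong();
    }
}

/** Thin wrapper: serialization is delegated entirely to the wrapped format. */
final class RecordWritable {
    private final RecordFormat format;

    RecordWritable(RecordFormat format) {
        this.format = format;
    }

    public void write(DataOutput out) throws IOException {
        format.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        format.read(in);
    }
}

/** Serializes a record to bytes and reads it back through the wrapper. */
final class RoundTripDemo {
    static IdAmountFormat roundTrip(IdAmountFormat original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new RecordWritable(original).write(new DataOutputStream(bytes));

            IdAmountFormat copy = new IdAmountFormat();
            new RecordWritable(copy).readFields(
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the fields keep their binary types across the round trip, which is the property the paragraph above gives as the reason for not using Text.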

SqoopInputFormat

 

SqoopNullOutputFormat

 

SqoopReducer
