Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3

...

SqoopInputFormat

InputFormat defines: How these  how the data in FROM are split up and read. Provides a factory for RecordReader objects that read the file

...

The (key, value) pairs provided by the mapper are passed on the Loader for the TO part. The way they are written is governed by the OutputFormat. SqoopNullOutputFormat extends the OutputFormat class. The real goal of this hadoops hadoop's NullOutputFormat : generates no output files  on HDFS since HDFS may not always be the destination. In our case too HDFS is not always the destination, so we use SqoopNullOutputFormata custom class to to delegate writing to the Loader specified in the sqoop job, it relies on the SqoopOutputFormatLoadExecutor to pass the data to the Loader via the SqoopRecordWriter. Much like how the SqoopInputFormat actually reads individual records through the SqoopRecordReader implementation, the SqoopNullOutputFormat class is a factory for SqoopRecordWriter objects; these are used to write the individual records to the final destination ( in our case the TO part of the sqoop job). Notice the key to the SqoopNullOutputFormat is actually the 

...