...

Connectors have FROM and TO parts, and a Sqoop job represents a data transfer between the FROM and the TO. The IDF (Intermediate Data Format) API defines how data is represented as it flows from the FROM to the TO through Sqoop. Connectors represent different data sources, and each data source can have its own custom or native format. For instance, MongoDB might use JSON as its optimal native format, HDFS can use plain CSV text, and S3 can use its own custom format. Put simply, every data source has one thing in common: it is a collection of rows, and each row is a collection of fields (columns). Most, if not all, data sources have a strict schema that specifies the type of each field.

The IDF API provides three main ways to represent data that flows within Sqoop:

  1. Native format - each row in the data source is a native object. For instance, in JSONIDF an entire row and its fields are represented in Sqoop as a JSON object, and in AvroIDF an entire row and its fields are represented as an Avro record.
  2. CSV text format - each row and its fields are represented as CSV text.
  3. Object array format - each field in the row is an element of an object array, so a row in the data source is represented as an object array.
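
To make the three representations concrete, here is a minimal Java sketch that renders one row in each form. It does not use the Sqoop IDF classes: the class `RowFormatsSketch` and its methods are hypothetical names chosen for illustration, and the CSV and JSON encodings are simplified assumptions (strings quoted, nulls written as NULL in CSV), not Sqoop's exact rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration only; these are not Sqoop IDF classes.
public class RowFormatsSketch {

    // Object array format: one array element per field, in column order.
    static Object[] toObjectArray(Map<String, Object> row) {
        return row.values().toArray();
    }

    // CSV text format (simplified assumption): fields joined by commas,
    // strings wrapped in single quotes, null written as NULL.
    static String toCsvText(Object[] fields) {
        StringJoiner csv = new StringJoiner(",");
        for (Object field : fields) {
            if (field == null) {
                csv.add("NULL");
            } else if (field instanceof String) {
                csv.add("'" + ((String) field).replace("'", "\\'") + "'");
            } else {
                csv.add(String.valueOf(field));
            }
        }
        return csv.toString();
    }

    // Native-style format, using JSON as the example: the whole row
    // becomes one JSON object keyed by column name.
    static String toJsonText(Map<String, Object> row) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> entry : row.entrySet()) {
            Object value = entry.getValue();
            String rendered = (value instanceof String)
                    ? "\"" + ((String) value).replace("\\", "\\\\").replace("\"", "\\\"") + "\""
                    : String.valueOf(value); // numbers, booleans, null
            json.add("\"" + entry.getKey() + "\":" + rendered);
        }
        return json.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in insertion order.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("name", "alice");
        row.put("active", true);

        Object[] fields = toObjectArray(row);
        System.out.println(toJsonText(row));        // native-style (JSON)
        System.out.println(toCsvText(fields));      // CSV text
        System.out.println(fields.length + " fields in the object array");
    }
}
```

For the sample row, the JSON form is `{"id":1,"name":"alice","active":true}`, the CSV form is `1,'alice',true`, and the object array holds three elements, one per field.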

...