...

  • A Sqoop schema is mandatory, since we need a schema to construct an Avro record.

    Code Block
    // convert the Sqoop schema to an Avro schema
    public AvroIntermediateDataFormat(org.apache.sqoop.schema.Schema schema) {
      super.setSchema(schema);
    }
  • Implement a method to convert CSV text to an Avro GenericRecord
    •   private GenericRecord toAvro(String csv) {..}

  • Implement a method to convert the object array to an Avro GenericRecord
    •   private GenericRecord toAvro(Object[] data) {..}

  • Conversely, implement a method to lazily construct the CSV text from the Avro GenericRecord when invoked
    •   private String toCSV(GenericRecord record) {..}

  • Implement a method to lazily construct the object array from the Avro GenericRecord when invoked
    •   private Object[] toObject(GenericRecord data) {..}

  • Implement methods to serialize and deserialize the Avro record to and from the wire format

    Code Block
    /**
     * {@inheritDoc}
     */
    @Override
    public void write(DataOutput out) throws IOException {
      // todo
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public void read(DataInput in) throws IOException {
      // todo
    }
    


  • Mappings from Sqoop to Avro types:

Column Type              Object Format                           Avro Format / Field Type
-----------------------  --------------------------------------  ----------------------------------------------------
NULL value in a field    java null                               UNION with Schema.Type.NULL, for any nullable field
ARRAY                    java Object[]                           Schema.Type.ARRAY
BINARY                   java byte[]                             Schema.Type.BYTES
BIT                      java boolean                            Schema.Type.BOOLEAN
DATE                     org.joda.time.LocalDate                 Schema.Type.LONG
DATE_TIME                org.joda.time.DateTime or               Schema.Type.LONG
                         org.joda.time.LocalDateTime
                         (depends on the timezone attribute)
DECIMAL                  java BigDecimal                         Schema.Type.FIXED (still to be confirmed)
ENUM                     java String                             Schema.Type.ENUM
FIXED_POINT              java Integer or java Long               Schema.Type.INT or Schema.Type.LONG (rule 1)
                         (depends on the byteSize attribute)
FLOATING_POINT           java Double or java Float               Schema.Type.FLOAT or Schema.Type.DOUBLE (rule 2)
                         (depends on the byteSize attribute)
MAP                      java.util.Map<Object, Object>           Schema.Type.MAP
SET                      java Object[]                           Schema.Type.ARRAY
TEXT                     java String                             Schema.Type.STRING
TIME                     org.joda.time.LocalTime (no timezone)   Schema.Type.LONG
UNKNOWN                  java byte[] (same as BINARY)            Schema.Type.BYTES

Rule 1 (FIXED_POINT): if ((org.apache.sqoop.schema.type.FixedPoint) column).getByteSize() <= Integer.SIZE / Byte.SIZE, return Schema.Type.INT; otherwise return Schema.Type.LONG. getByteSize() is a size in bytes while Integer.SIZE is in bits, so the bit count is divided by Byte.SIZE (giving 4 bytes).

Rule 2 (FLOATING_POINT): if ((org.apache.sqoop.schema.type.FloatingPoint) column).getByteSize() <= Float.SIZE / Byte.SIZE, return Schema.Type.FLOAT; otherwise return Schema.Type.DOUBLE.
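The table, together with rules 1 and 2, can be read as one switch over column types. The sketch below is a hypothetical, dependency-free illustration: `SqoopColumnType` and `AvroType` are stand-in enums (not Sqoop or Avro classes) whose constants mirror the names in the table and in `org.apache.avro.Schema.Type`, and `toAvroType` is an illustrative name. It maps only the non-null branch; a nullable column would additionally be wrapped in a UNION with NULL. Treating a missing byteSize as the wider type is an assumption.

```java
import java.util.Objects;

public class SqoopToAvroTypeMapping {

  // Hypothetical stand-in for the Sqoop column types listed in the table.
  enum SqoopColumnType {
    ARRAY, BINARY, BIT, DATE, DATE_TIME, DECIMAL, ENUM, FIXED_POINT,
    FLOATING_POINT, MAP, SET, TEXT, TIME, UNKNOWN
  }

  // Hypothetical stand-in whose constants mirror org.apache.avro.Schema.Type names,
  // so the sketch compiles without the Avro jar.
  enum AvroType {
    ARRAY, BYTES, BOOLEAN, LONG, FIXED, ENUM, INT, FLOAT, DOUBLE, MAP, STRING
  }

  // Maps the non-null branch of a Sqoop column type to an Avro type.
  // byteSize (in bytes) is only consulted for FIXED_POINT and FLOATING_POINT.
  static AvroType toAvroType(SqoopColumnType type, Long byteSize) {
    Objects.requireNonNull(type, "type");
    switch (type) {
      case ARRAY:
      case SET:
        return AvroType.ARRAY;
      case BINARY:
      case UNKNOWN:
        return AvroType.BYTES;
      case BIT:
        return AvroType.BOOLEAN;
      case DATE:
      case DATE_TIME:
      case TIME:
        return AvroType.LONG;
      case DECIMAL:
        return AvroType.FIXED; // tentative, as in the table
      case ENUM:
        return AvroType.ENUM;
      case FIXED_POINT:
        // Assumption: a missing byteSize falls back to the wider LONG.
        return (byteSize != null && byteSize <= Integer.SIZE / Byte.SIZE)
            ? AvroType.INT : AvroType.LONG;
      case FLOATING_POINT:
        // Assumption: a missing byteSize falls back to the wider DOUBLE.
        return (byteSize != null && byteSize <= Float.SIZE / Byte.SIZE)
            ? AvroType.FLOAT : AvroType.DOUBLE;
      case MAP:
        return AvroType.MAP;
      case TEXT:
        return AvroType.STRING;
      default:
        throw new IllegalArgumentException("Unsupported column type: " + type);
    }
  }
}
```

For example, `toAvroType(FIXED_POINT, 4L)` yields `INT`, while `toAvroType(FIXED_POINT, 8L)` yields `LONG`; with a bit-based threshold (`<= Integer.SIZE`), both would wrongly map to `INT`.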

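One possible way to fill in the write/read stubs is to frame each record as a length-prefixed byte array on the DataOutput/DataInput streams. The sketch below assumes the record has already been encoded to bytes (for example with Avro's GenericDatumWriter and a binary encoder obtained from EncoderFactory, not shown here). The class name, method names and the int length prefix are hypothetical choices, not the Sqoop implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper: frames an already-encoded record as <int length><bytes>.
public class LengthPrefixedRecordWire {

  // Writes the encoded record (e.g. Avro binary bytes) preceded by its length.
  static void writeRecord(DataOutput out, byte[] encodedRecord) throws IOException {
    out.writeInt(encodedRecord.length);
    out.write(encodedRecord);
  }

  // Reads back exactly one record written by writeRecord.
  static byte[] readRecord(DataInput in) throws IOException {
    int length = in.readInt();
    if (length < 0) {
      throw new IOException("Negative record length: " + length);
    }
    byte[] encodedRecord = new byte[length];
    in.readFully(encodedRecord); // throws EOFException on a truncated stream
    return encodedRecord;
  }

  // Round-trips a payload through an in-memory stream, for quick checks.
  static byte[] roundTrip(byte[] payload) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      writeRecord(out, payload);
    }
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
      return readRecord(in);
    }
  }
}
```

Because DataInput.readFully either fills the whole buffer or throws EOFException, a truncated stream fails loudly instead of producing a partial record.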
 

External JAR dependencies added?

...