...
The Sqoop schema is mandatory, since a schema is needed to construct an Avro record:
```java
// Convert the Sqoop schema to an Avro schema
public AvroIntermediateDataFormat(org.apache.sqoop.schema.Schema schema) {
  super.setSchema(schema);
}
```
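A minimal sketch of the schema conversion, assuming the Avro record schema is produced as JSON text: each Sqoop column becomes one record field, and a nullable column becomes a union with `"null"`. `ColumnSketch` and `AvroSchemaSketch` are hypothetical stand-ins; real code would read the columns from `org.apache.sqoop.schema.Schema` and would likely build an `org.apache.avro.Schema` object directly. The Avro type name for each column is assumed to come from the type mapping table on this page.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical stand-in for a Sqoop column: its name, the Avro primitive type name
// chosen from the mapping table, and whether the column is nullable.
final class ColumnSketch {
  final String name;
  final String avroType;
  final boolean nullable;

  ColumnSketch(String name, String avroType, boolean nullable) {
    this.name = name;
    this.avroType = avroType;
    this.nullable = nullable;
  }
}

final class AvroSchemaSketch {
  // Builds the JSON text of an Avro record schema with one field per column.
  // Assumes column names are already valid Avro names, so no escaping is done.
  // Nullable columns become the union ["null", <type>].
  static String toAvroSchemaJson(String recordName, List<ColumnSketch> columns) {
    StringJoiner fields = new StringJoiner(",", "[", "]");
    for (ColumnSketch c : columns) {
      String type = c.nullable
          ? "[\"null\",\"" + c.avroType + "\"]"
          : "\"" + c.avroType + "\"";
      fields.add("{\"name\":\"" + c.name + "\",\"type\":" + type + "}");
    }
    return "{\"type\":\"record\",\"name\":\"" + recordName + "\",\"fields\":" + fields + "}";
  }

  public static void main(String[] args) {
    String json = toAvroSchemaJson("employee", List.of(
        new ColumnSketch("id", "long", false),
        new ColumnSketch("name", "string", true)));
    System.out.println(json);
  }
}
```

Listing `"null"` first in each union is deliberate: an Avro field default must match the first branch of a union, so a `null` default is only possible when `"null"` comes first.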
- Implement a method to convert CSV text to an Avro GenericRecord:
  `private GenericRecord toAvro(String csv) { ... }`
- Implement a method to convert an object array to an Avro GenericRecord:
  `private GenericRecord toAvro(Object[] data) { ... }`
- Conversely, implement a method to lazily construct the CSV text from an Avro GenericRecord when invoked:
  `private String toCSV(GenericRecord record) { ... }`
- Implement a method to lazily construct the object array from an Avro GenericRecord when invoked:
  `private Object[] toObject(GenericRecord data) { ... }`
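The lazy conversions above can be sketched as a cache tied to the current record: the record is the single source of truth, the CSV and object-array views are built on first request, and replacing the record clears both caches. This is a hypothetical sketch: a plain `Object[]` stands in for Avro's `GenericRecord`, and the CSV rendering is deliberately simplified (no quoting or escaping), so it is not Sqoop's actual CSV text format.

```java
import java.util.Arrays;
import java.util.StringJoiner;

// Hypothetical sketch of lazy view construction. A plain Object[] of field values
// stands in for the Avro GenericRecord that the real format would hold.
final class LazyRecordSketch {
  private Object[] record;        // stand-in for GenericRecord (source of truth)
  private String csv;             // cached CSV view, null until first requested
  private Object[] objectArray;   // cached object-array view, null until first requested
  int csvBuilds = 0;              // number of times the CSV view was built

  LazyRecordSketch(Object[] record) {
    setRecord(record);
  }

  // Replacing the record invalidates both cached views.
  void setRecord(Object[] newRecord) {
    record = newRecord.clone();
    csv = null;
    objectArray = null;
  }

  // Simplified CSV: comma-separated, null rendered as NULL, no quoting or escaping.
  String toCSV() {
    if (csv == null) {
      StringJoiner joiner = new StringJoiner(",");
      for (Object value : record) {
        joiner.add(value == null ? "NULL" : value.toString());
      }
      csv = joiner.toString();
      csvBuilds++;
    }
    return csv;
  }

  // Returns the cached array; callers must not mutate it.
  Object[] toObject() {
    if (objectArray == null) {
      objectArray = Arrays.copyOf(record, record.length);
    }
    return objectArray;
  }
}
```

Calling `toCSV()` twice on the same record builds the string only once; after `setRecord`, the next call rebuilds it from the new record.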
Implement methods to serialize and deserialize the Avro record to and from the wire format:
```java
/**
 * {@inheritDoc}
 */
@Override
public void write(DataOutput out) throws IOException {
  // TODO
}

/**
 * {@inheritDoc}
 */
@Override
public void read(DataInput in) throws IOException {
  // TODO
}
```
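One possible way to fill in `write` and `read`, sketched under the assumption that the Avro record has already been encoded to a `byte[]` (for example with Avro's binary encoding, which this sketch does not implement): write a 4-byte length prefix followed by the bytes, so that `read` knows exactly how many bytes belong to this record. `FramedRecordSketch`, its accessors, and `roundTrip` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: assumes the Avro record is already encoded to bytes elsewhere.
// write/read only frame those bytes with a length prefix.
final class FramedRecordSketch {
  private byte[] encodedRecord = new byte[0];

  void setEncodedRecord(byte[] bytes) { encodedRecord = bytes.clone(); }

  byte[] getEncodedRecord() { return encodedRecord.clone(); }

  public void write(DataOutput out) throws IOException {
    out.writeInt(encodedRecord.length);   // 4-byte big-endian length prefix
    out.write(encodedRecord);             // the encoded record itself
  }

  public void read(DataInput in) throws IOException {
    int length = in.readInt();
    if (length < 0) {
      throw new IOException("Negative record length: " + length);
    }
    byte[] bytes = new byte[length];
    in.readFully(bytes);                  // throws EOFException on a truncated stream
    encodedRecord = bytes;
  }

  // Round-trips a payload through write() and read() using in-memory streams.
  static byte[] roundTrip(byte[] payload) {
    try {
      FramedRecordSketch writer = new FramedRecordSketch();
      writer.setEncodedRecord(payload);
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      writer.write(new DataOutputStream(buffer));
      FramedRecordSketch reader = new FramedRecordSketch();
      reader.read(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
      return reader.getEncodedRecord();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because `DataInput.readFully` either fills the whole buffer or throws `EOFException`, a truncated stream fails loudly instead of yielding a partial record.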
Mappings from Sqoop to Avro types:
Sqoop Column Type | Object Format | Avro Field Type |
---|---|---|
NULL value in the field | java null | Schema.Type.UNION of Schema.Type.NULL and the field's type, for any nullable field |
ARRAY | java Object[] | Schema.Type.ARRAY |
BINARY | java byte[] | Schema.Type.BYTES |
BIT | java boolean | Schema.Type.BOOLEAN |
DATE | org.joda.time.LocalDate | Schema.Type.LONG |
DATE_TIME | org.joda.time.DateTime or org.joda.time.LocalDateTime (depends on the timezone attribute) | Schema.Type.LONG |
DECIMAL | java BigDecimal | Schema.Type.FIXED (open question) |
ENUM | java String | Schema.Type.ENUM |
FIXED_POINT | java Integer or java Long (depends on the byteSize attribute) | Schema.Type.INT if `((org.apache.sqoop.schema.type.FixedPoint) column).getByteSize() <= Integer.SIZE / Byte.SIZE`, else Schema.Type.LONG |
FLOATING_POINT | java Float or java Double (depends on the byteSize attribute) | Schema.Type.FLOAT if `((org.apache.sqoop.schema.type.FloatingPoint) column).getByteSize() <= Float.SIZE / Byte.SIZE`, else Schema.Type.DOUBLE |
MAP | java.util.Map<Object, Object> | Schema.Type.MAP |
SET | java Object[] | Schema.Type.ARRAY |
TEXT | java String | Schema.Type.STRING |
TIME | org.joda.time.LocalTime (no timezone) | Schema.Type.LONG |
UNKNOWN | same as java byte[] | Schema.Type.BYTES |
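The mapping table can be sketched as a single lookup that returns Avro type names. `TypeMappingSketch.avroTypeFor` is a hypothetical helper that takes the Sqoop type name as a string. It treats `byteSize` as a size in bytes; since `Integer.SIZE` and `Float.SIZE` are bit counts, both are divided by `Byte.SIZE` before comparing. Falling back to the wider Avro type when `byteSize` is unknown is an assumption, not something the table specifies.

```java
// Hypothetical sketch of the mapping table as a lookup from a Sqoop column type name
// to an Avro schema type name. Nullable columns would additionally be wrapped in a
// union with "null" (not shown).
final class TypeMappingSketch {
  static String avroTypeFor(String sqoopType, Long byteSize) {
    switch (sqoopType) {
      case "ARRAY":
      case "SET":
        return "array";
      case "BINARY":
      case "UNKNOWN":
        return "bytes";
      case "BIT":
        return "boolean";
      case "DATE":
      case "DATE_TIME":
      case "TIME":
        return "long";
      case "DECIMAL":
        return "fixed";   // still an open question in the table
      case "ENUM":
        return "enum";
      case "FIXED_POINT":
        // byteSize is in bytes; Integer.SIZE / Byte.SIZE == 4. Unknown size -> long (assumption).
        return byteSize != null && byteSize <= Integer.SIZE / Byte.SIZE ? "int" : "long";
      case "FLOATING_POINT":
        // byteSize is in bytes; Float.SIZE / Byte.SIZE == 4. Unknown size -> double (assumption).
        return byteSize != null && byteSize <= Float.SIZE / Byte.SIZE ? "float" : "double";
      case "MAP":
        return "map";
      case "TEXT":
        return "string";
      default:
        throw new IllegalArgumentException("Unmapped Sqoop type: " + sqoopType);
    }
  }
}
```

For example, a FIXED_POINT column with a byteSize of 8 (a 64-bit integer) maps to `long`, while one with a byteSize of 4 maps to `int`.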
External Jar Dependencies added?
...