
Intermediate representation

In Sqoop 2, connectors will supply their own map phase that imports data into HDFS. Because this code will be maintained entirely by the connector, we need to agree on a common intermediate (map output) format for all connectors and all cases. The goal of this page is to compare different intermediate representations so that we can pick the appropriate one for Sqoop 2.

Goals 

  • Simple
  • Fast (no unnecessary parsing, encoding, ...)

Ideas

List of ideas that we've explored.

mysqldump format 

A comma-separated list of values held in a single Text instance. String and binary values are wrapped in single quotes. For example:

0,'Hello world','Jarcec\'s notes'
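
To illustrate how a consumer might read such a record, here is a minimal sketch of a field splitter. It assumes the record has already been converted from Text to a Java String, and the class and method names (RecordSplitter, split) are hypothetical, not part of Sqoop. Commas inside quotes and backslash-escaped quotes, as in the example above, do not end a field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Sqoop: splits one record into its raw fields.
final class RecordSplitter {

    // Returns the fields exactly as written (still quoted and escaped).
    static List<String> split(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes && c == '\\' && i + 1 < record.length()) {
                // Escape sequence: copy both characters so that an
                // escaped quote does not terminate the field.
                current.append(c).append(record.charAt(++i));
            } else if (c == '\'') {
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == ',' && !inQuotes) {
                // Unquoted comma: field boundary.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

Applied to the example record, this sketch yields three fields: 0, 'Hello world' and 'Jarcec\'s notes'.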

Inside string and binary fields, all bytes are written as-is, except for the following bytes, each of which is encoded as a two-byte escape sequence:

Byte    Written as
0x00    \0
0x0A    \n
0x0D    \r
0x1A    \Z
0x22    \"
0x27    \'
0x5C    \\
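
The escaping rules listed above can be sketched as a small byte-level encoder. This is only an illustration under stated assumptions: the class and method names (MysqldumpEscaper, quote, quoteUtf8) are hypothetical and not part of Sqoop, and the String helper assumes UTF-8 text.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Sqoop: quotes and escapes one field.
final class MysqldumpEscaper {

    // Wraps raw field bytes in single quotes, escaping the special bytes.
    static byte[] quote(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(raw.length + 2);
        out.write('\'');
        for (byte b : raw) {
            switch (b) {
                case 0x00: out.write('\\'); out.write('0');  break;
                case 0x0A: out.write('\\'); out.write('n');  break;
                case 0x0D: out.write('\\'); out.write('r');  break;
                case 0x1A: out.write('\\'); out.write('Z');  break;
                case 0x22: out.write('\\'); out.write('"');  break;
                case 0x27: out.write('\\'); out.write('\''); break;
                case 0x5C: out.write('\\'); out.write('\\'); break;
                default:   out.write(b); // every other byte is copied as-is
            }
        }
        out.write('\'');
        return out.toByteArray();
    }

    // Convenience wrapper for text; assumes UTF-8. None of the escaped
    // bytes can occur inside a multi-byte UTF-8 sequence, so the result
    // is still valid UTF-8.
    static String quoteUtf8(String value) {
        byte[] quoted = quote(value.getBytes(StandardCharsets.UTF_8));
        return new String(quoted, StandardCharsets.UTF_8);
    }
}
```

For example, quoteUtf8("Jarcec's notes") produces 'Jarcec\'s notes', which matches the third field of the sample record. Working on bytes rather than characters keeps the same code path usable for binary columns.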

List<Text> 
avro 