Intermediate representation
In Sqoop 2, connectors will supply their own map phase that imports data into HDFS. Because this code will be maintained entirely by the connector, we need to agree on a common intermediate (map output) format for all connectors and all cases. The goal of this page is to compare different intermediate representations so that we can pick the appropriate one for Sqoop 2.
Goals
- Simple
- Fast (no unnecessary parsing, encoding, ...)
Ideas
List of ideas that we've explored.
mysqldump format
Comma-separated list of values; strings and binary values are wrapped in single quotes.
For example: 0,'Hello world'
Inside string and binary constants, all bytes are written as their normal byte representation, with the following exceptions:
Byte | Written as |
---|---|
0x00 | \0 |
0x0A | \n |
0x0D | \r |
0x1A | \Z |
0x22 | \" |
0x27 | \' |
0x5C | \\ |
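The escaping rules above can be sketched in a few lines of code. This is only an illustration, not Sqoop 2 code: the function names (`escape_bytes`, `unescape_bytes`, `encode_record`) are hypothetical, strings are assumed to be encoded as UTF-8 before escaping, and non-string values are assumed to be written using their plain decimal text form.

```python
# Escape table from this section: raw byte -> escaped two-byte form.
ESCAPES = {
    0x00: b"\\0",
    0x0A: b"\\n",
    0x0D: b"\\r",
    0x1A: b"\\Z",
    0x22: b'\\"',
    0x27: b"\\'",
    0x5C: b"\\\\",
}
# Reverse mapping, keyed by the byte that follows the backslash.
UNESCAPES = {esc[1]: raw for raw, esc in ESCAPES.items()}


def escape_bytes(data: bytes) -> bytes:
    """Escape the seven special bytes; copy every other byte unchanged."""
    out = bytearray()
    for b in data:
        out += ESCAPES.get(b, bytes([b]))
    return bytes(out)


def unescape_bytes(data: bytes) -> bytes:
    """Invert escape_bytes; reject escape sequences not in the table."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0x5C:  # backslash starts an escape sequence
            if i + 1 >= len(data) or data[i + 1] not in UNESCAPES:
                raise ValueError("invalid escape sequence at offset %d" % i)
            out.append(UNESCAPES[data[i + 1]])
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def encode_record(values) -> bytes:
    """Join values with commas; quote and escape strings and binary values."""
    fields = []
    for v in values:
        if isinstance(v, str):
            v = v.encode("utf-8")  # assumption: strings are UTF-8 encoded
        if isinstance(v, (bytes, bytearray)):
            fields.append(b"'" + escape_bytes(bytes(v)) + b"'")
        else:
            # assumption: other values use their plain text form
            fields.append(str(v).encode("ascii"))
    return b",".join(fields)
```

With these assumptions, `encode_record([0, "Hello world"])` produces the bytes `0,'Hello world'` from the example above. Because a quote or backslash inside a value is always escaped, a reader can find field boundaries with a single left-to-right scan.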