Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Add M$ SQL bcp utility description

...

Data type

Serialized as

INT (and all variants)

Direct value (666)

NUMERIC

Direct value (66.60)

REAL(and all variants)

Direct value (66.5999985, 55e55) or string constant for special cases  ('Infinity', '-Infinity', 'NaN')

VARCHAR(text, ...)

String

CHAR

String, unused positions at the end are filled with spaces

TIMESTAMP(date, time, ...)

String in format YYYY-MM-DD HH:MM:SS.ZZZZZZ (Date and hour part)

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="09fb43e05006ebce-77dca93d-4b244678-b524bf10-67aaa94886d742c45e056632"><ac:plain-text-body><![CDATA[

TIMESTAMP with time zone (and others)

String in format YYY-MM-DD HH:MM:SS.ZZZZZZ[+-]XX ('2012-07-03 14:07:11.876239+02') 

]]></ac:plain-text-body></ac:structured-macro>

BOOLEAN

Constants true and false (not quoted as a String)

ENUM

String

ARRAY

String that contains special structure - '{ITEM1, ITEM2, ITEM3}', ITEMX itself might be in separate quotes if needed.

...

Code Block
666, 66.60, 'jarcec', 'Jarcec''s notes', '2012-07-03 14:07:11.876239', true, '{1,2,3}', NULL, '\x4d7953514c'

Microsoft SQL Server's bcp utility

SQL Server bcp utility is by default producing binary output that would be hard to interpret. Fortunately it can be forced to produce character output using command line switch "-c". In this case bcp utility will produce CSV file -- rowdelimiter can be specified by another command line switch "-t" (tab by default), similarly using "-r" switch user can specify row delimiter (new line by default).

Encoding overview of various data types follows:

Data type

Serialized as

INT (and it's variants)

Direct value (666)

NUMERIC (decimal, ...)

Direct value (666.66)

FLOAT(real, ...)

Direct value(33.299999999999997)

DATE

Constant in format YYYY-MM-DD (2012-01-01)

DATETIME

Constant in format YYYY-MM-DD HH:MM:DD.ZZZ (2012-06-06 01:01:01.000)

TIME

Constant in format HH:MM:DD.ZZZZZZ (01:01:01.0000000)

VARCHAR(nvarchar, char,... )

Directly as it without any encoding, empty string is represented as zero byte

NULL

Empty or missing value (for example when using comma as separator - ,,)

NOTE: String columns are not allowed to contain row or column delimiters, which is actually quite tricky.

Encoded example:

Code Block

35,15.20,33.299999999999997,2012-06-06,2012-06-06 01:01:01.000,01:01:01.0000000,jarcec's comment with, comma

AVRO

We can also utilize AVRO for intermediate format. Avro do have support only for limited number of data types, so that we would have to encode some types (for example we might encode Date as a string with same format as above). List of all supported types can be found on following wiki page. Example of mapper code in the Connector itself would be:

...