Status

Current state: Accepted

Discussion thread: KIP-481 Discussion

...

Introduce a new configuration to the JsonConverter named decimal.format to control whether source converters will serialize decimals in numeric or binary formats. The value will be case insensitive and can be either "BASE64" (default, to maintain compatibility) or "NUMERIC".
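As a sketch, the new property would be set through the usual converter-prefix mechanism in a worker or connector configuration (the prefix shown assumes the converter is applied to record values):

```properties
# Worker-level converter configuration (sketch): use the JSON converter
# for record values and serialize DECIMAL logical types as JSON numbers.
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.decimal.format=NUMERIC
```

Leaving decimal.format unset (or setting it to BASE64) preserves today's base64-encoded binary output.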

Proposed Changes

We propose the following changes:

  1. Define a new decimal.format configuration property on the JsonConverter that specifies the serialization format for Connect DECIMAL logical type values, with two allowed values:
    • (default) BASE64 specifies the existing behavior of serializing DECIMAL logical types as base64 encoded binary data (e.g. "D3J5" in the example above); and
    • NUMERIC will serialize Connect DECIMAL logical type values in JSON as a number representing that decimal (e.g. 10.2345 in the example above)
  2. The JsonConverter deserialization method currently expects only a BinaryNode, but will be changed to also handle NumericNode by calling NumericNode.decimalValue().
  3. JsonDeserializer will now default floating point deserialization to BigDecimal to avoid losing precision. This may impact performance when deserializing doubles: a JMH microbenchmark (run on my local MBP) estimated roughly a 3x degradation when deserializing JSON floating points. If the Connect schema is not the DECIMAL logical type, the JsonConverter will convert the BigDecimal value into the corresponding floating point Java object.
  4. Configure the JsonConverter for internal topics with `decimal.format=NUMERIC` so that DECIMAL types will be serialized in their more natural representation. This is safe because Connect internal topics do not currently use any decimal types.
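The precision concern in step 3 can be illustrated with plain java.math.BigDecimal, independent of Connect (a standalone sketch; the literal below is a hypothetical high-precision value):

```java
import java.math.BigDecimal;

public class DecimalPrecision {
    public static void main(String[] args) {
        // A decimal literal with more significant digits than a double can hold
        String json = "1.23456789012345678901234567890";

        // Parsing into BigDecimal (as the JsonDeserializer will now do by
        // default) keeps every digit of the original JSON number.
        BigDecimal asDecimal = new BigDecimal(json);

        // Parsing the same literal as a double silently rounds it to the
        // nearest representable binary value.
        double asDouble = Double.parseDouble(json);

        System.out.println(asDecimal);                // all digits preserved
        System.out.println(new BigDecimal(asDouble)); // the rounded binary value
    }
}
```

Converting the BigDecimal down to a double (step 3's fallback for non-DECIMAL schemas) performs the same rounding, so legacy FLOAT64 behavior is unchanged.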

To understand behavior of this configuration with and without schemas, refer to the table below.

Source Schema      | JsonConverter (Schemas Enabled) | JsonConverter (Schemas Disabled) | JsonDeserializer Behavior
DECIMAL (BASE64)   | returns DECIMAL logical type    | throws DataException             | stores BinaryNode (byte[] data)
DECIMAL (NUMERIC)  | returns DECIMAL logical type    | returns FLOAT64 (lossy)*         | stores NumericNode (BigDecimal data)
FLOAT64 (Legacy)   | returns FLOAT64                 | returns FLOAT64                  | stores NumericNode (Double data)
FLOAT64 (KIP-481)  | returns FLOAT64                 | returns FLOAT64                  | stores NumericNode (BigDecimal data)**

* Previously it was impossible for a sink converter to read decimal data encoded in BASE64 without schemas enabled (a DataException was raised when decoding BINARY data without a schema). After KIP-481, decimal data produced by a source converter with NUMERIC encoding may be deserialized as a float by a sink converter without schemas enabled; data read this way is lossy. This is acceptable because all existing applications will function exactly the same.

** With this behavior, users will be provided with a different NumericNode subtype than previously (DecimalNode vs. DoubleNode). Since Jackson requires calling a strongly typed method (e.g. floatValue vs. decimalValue) to extract data from a NumericNode, applications relying on JsonDeserializer will not be affected by this change.
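The footnote above can be illustrated with Jackson directly (a sketch assuming Jackson 2.x is on the classpath, as it is for Kafka Connect):

```java
import com.fasterxml.jackson.databind.node.DecimalNode;
import com.fasterxml.jackson.databind.node.DoubleNode;
import com.fasterxml.jackson.databind.node.NumericNode;
import java.math.BigDecimal;

public class TypedAccessors {
    public static void main(String[] args) {
        // Before KIP-481 the deserializer produced DoubleNode for JSON
        // floats; after, it produces DecimalNode. Both extend NumericNode.
        NumericNode legacy = DoubleNode.valueOf(10.2345);
        NumericNode kip481 = DecimalNode.valueOf(new BigDecimal("10.2345"));

        // Callers always go through a typed accessor, so the same code
        // works regardless of which concrete node type it receives.
        System.out.println(legacy.doubleValue());
        System.out.println(kip481.doubleValue());
        System.out.println(kip481.decimalValue());
    }
}
```

Because existing callers already invoke doubleValue() (or another typed accessor) rather than downcasting to a concrete node class, swapping the underlying node type is transparent to them.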

Compatibility, Deprecation, and Migration Plan

  • The JsonConverter's serialization behavior is identical to legacy behavior when using the default configuration (decimal.format=BASE64)
  • Upgrading source and sink converters to the new code version in any combination is backwards compatible if decimal.format is left unset or is set to "BASE64" in the source converters.
  • To set decimal.format to "NUMERIC" in source converters, all sink converters reading the data (and any other downstream consumers of the source converter output topic) must first upgrade code to include the code change from this KIP.
  • Note that the Kafka topic will have messages of mixed serialization format after this change. Rolling back sink converters or any other downstream Kafka consumers to code that cannot handle both formats will be unsafe after upgrading source converters to use NUMERIC serialization formats.

...