...

The proposed changes are:

  • Define a new decimal.format configuration property on JsonConverter to specify the serialization format for Connect DECIMAL logical type values. The property accepts two literals:
    • (default) BASE64 specifies the existing behavior of serializing DECIMAL logical types as base64 encoded binary data (e.g. "D3J5" in the example above); and
    • NUMERIC will serialize Connect DECIMAL logical type values in JSON as a number representing that decimal (e.g. 10.2345 instead of "D3J5" in the example above).
  • The JsonConverter deserialization method currently expects only a BinaryNode, but will be changed to also handle NumericNode, so that it automatically handles either serialization format given a Decimal logical type schema. If the value is a BinaryNode, it will construct a Java BigDecimal from the binaryValue() (which is a byte[]). If the value is a NumericNode, it will simply pass through the decimalValue() deserialized by the JsonDeserializer.
  • JsonDeserializer will now default floating point deserialization to BigDecimal to avoid losing precision. This may impact performance when deserializing doubles: a JMH microbenchmark on my local MBP estimated about a 3x degradation for deserializing JSON floating points. If the Connect schema is not the decimal logical type, the JsonConverter will convert this BigDecimal value into the corresponding floating point Java object.
  • Configure the JsonConverter for internal topics with `decimal.format=NUMERIC` so that any DECIMAL types will be serialized in a more natural representation. This is safe since Connect internal topics do not currently use any decimal types.
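
The two formats and the lenient deserialization described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the JsonConverter implementation: the class `DecimalFormatSketch`, its `Format` enum and its methods are hypothetical names, a `String` stands in for a Jackson BinaryNode, and a `BigDecimal` stands in for a NumericNode. The scale is assumed to come from the Decimal schema, as it does for Connect's Decimal logical type.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Base64;

// Hypothetical stand-in for the converter logic; no Kafka or Jackson classes are used.
final class DecimalFormatSketch {

    // Mirrors the two literals of the proposed decimal.format property.
    enum Format { BASE64, NUMERIC }

    // Serialize a decimal according to the configured format.
    static Object serialize(BigDecimal value, Format format) {
        if (format == Format.NUMERIC) {
            return value; // would be written as a plain JSON number
        }
        // BASE64: encode the two's-complement bytes of the unscaled value;
        // the scale is not part of the payload and is taken from the schema.
        return Base64.getEncoder().encodeToString(value.unscaledValue().toByteArray());
    }

    // Deserialize either representation, given the scale from the Decimal schema.
    static BigDecimal deserialize(Object node, int scale) {
        if (node instanceof BigDecimal) {
            return (BigDecimal) node; // NUMERIC: pass the parsed number through
        }
        if (node instanceof String) {
            byte[] bytes = Base64.getDecoder().decode((String) node);
            return new BigDecimal(new BigInteger(bytes), scale); // BASE64
        }
        throw new IllegalArgumentException("Unsupported node type: " + node);
    }

    // When the schema is a floating point type rather than Decimal,
    // the BigDecimal is narrowed to the corresponding Java type.
    static double toFloat64(BigDecimal parsed) {
        return parsed.doubleValue();
    }

    public static void main(String[] args) {
        BigDecimal original = new BigDecimal("10.2345"); // scale 4
        Object asBase64 = serialize(original, Format.BASE64);
        Object asNumeric = serialize(original, Format.NUMERIC);
        // Both representations round-trip to the same decimal.
        System.out.println(deserialize(asBase64, 4).equals(original));
        System.out.println(deserialize(asNumeric, 4).equals(original));
    }
}
```

Because the deserializer branches on the type of the incoming node rather than on configuration, a consumer written this way can read topics containing a mix of both formats, which is what makes the staged migration below possible.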

Compatibility, Deprecation, and Migration Plan

...

  1. Upgrade all sink converters to version "C" or higher and restart the sink converters. No configuration change is needed for the sink converters.
  2. Upgrade the source converters to version "C" or higher.
  3. If the Connect worker uses the JsonConverter for the key and/or value converters, optionally set `decimal.format=NUMERIC` for the key and/or value converter in the top-level worker config and restart the workers to pick up the new code and configuration changes.
  4. If desired, update any individual source connector configs that use the JsonConverter for key and/or value converters to use `decimal.format=NUMERIC`.
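
As a rough illustration of the configuration steps above, the sketch below parses a worker-style properties fragment and extracts the settings that would reach the value converter. The `value.converter.` prefix follows the usual Connect convention for passing settings to a converter. The class `ConverterConfigSketch` and its helper methods are hypothetical and only mimic the prefix handling; they are not Connect's own code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper class; only JDK classes are used.
final class ConverterConfigSketch {

    // Parse properties-file text into a Properties object.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Collect the settings under a converter prefix, with the prefix stripped.
    static Map<String, String> withPrefix(Properties props, String prefix) {
        Map<String, String> result = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(prefix)) {
                result.put(name.substring(prefix.length()), props.getProperty(name));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example worker (or connector) settings for the value converter.
        String config = String.join("\n",
            "value.converter=org.apache.kafka.connect.json.JsonConverter",
            "value.converter.decimal.format=NUMERIC");
        // Only decimal.format is under the prefix; the class name entry is not.
        System.out.println(withPrefix(parse(config), "value.converter."));
    }
}
```

The same two property lines apply whether they are placed in the worker config or in an individual connector config; the connector-level setting is the narrower of the two choices.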

Rejected Alternatives

  • The original KIP suggested supporting an additional representation: base10 encoded text (e.g. `{"asText":"10.2345"}`). While it is possible to automatically differentiate NUMERIC from BASE10 and BASE64, it is not always possible to differentiate BASE10 from BASE64. Take, for example, the string "12": this is both a valid decimal (12) and a valid hex string which represents a decimal (1.8). It is therefore impossible to disambiguate between BASE10 and BASE64 without an additional config. Furthermore, this makes the migration from one to the other nearly impossible, because it would require either that all consumers stop consuming and all producers stop producing while the config is atomically updated on all of them after deploying the new code, or waiting for the full retention period to pass. Neither option is viable. The suggestion in the KIP is strictly an improvement over the existing behavior, even if it doesn't support all combinations.
  • Encoding the serialization format in the schema for the Decimal LogicalType. This is good because it means that the deserializer will be able to decode based on the schema, and one converter can handle different topics encoded differently as long as the schema is in line. The problem is that this is specific only to JSON, and the LogicalType is not the right place for such a change.
  • Creating a new JsonConverter (e.g. JsonConverterV2) that handles decimals using numeric values. This would cause an undue code maintenance burden and may also cause confusion when configuring connectors (e.g. what's the difference between the two converters and why are there two?). Naming them properly is also a challenge.
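
The ambiguity behind the first rejected alternative can be made concrete with a small worked example. The sketch below reads the same text "12" in two ways: as base-10 text, and as hex-encoded bytes of an unscaled value. The scale of 1 is an assumption chosen to reproduce the 1.8 figure quoted above, and `AmbiguousTextSketch` with its methods is a hypothetical name used only for illustration.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical illustration of why a text payload alone cannot identify its encoding.
final class AmbiguousTextSketch {

    // Reading 1: the text is the decimal itself, written in base 10.
    static BigDecimal asBase10(String text) {
        return new BigDecimal(text);
    }

    // Reading 2: the text is a hex encoding of the unscaled value,
    // with the scale supplied separately (assumed to come from the schema).
    static BigDecimal asHexBytes(String text, int scale) {
        return new BigDecimal(new BigInteger(text, 16), scale);
    }

    public static void main(String[] args) {
        String payload = "12";
        System.out.println(asBase10(payload));      // 12
        System.out.println(asHexBytes(payload, 1)); // 1.8 (0x12 = 18, scale 1)
    }
}
```

Both readings succeed without error, so a consumer has no way to tell from the payload which one the producer intended. A JSON number, by contrast, is distinguishable from any string by its node type.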