Current state: Under Discussion

Discussion thread: KIP-481 Discussion

JIRA: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

Most JSON data that uses precise decimal values represents them as a decimal number. Connect, on the other hand, only supports a binary encoding written as a base64 string (see the example below). This KIP intends to support the representations shown below so that Connect can better integrate with legacy systems (and so that the internal topic data is easier to read and debug):

Code Block
{
  "asHex": "D3J5",
  "asString": "10.12345",
  "asNumber": 10.12345
}
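
To make the binary form concrete, the following is a minimal sketch of how the string "D3J5" maps back to a decimal. It assumes the string is the base64 encoding of the unscaled value's bytes and that the scale (5 here) is supplied separately, as it would be by the schema. The class and method names are illustrative and are not part of Connect.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Base64;

// Sketch only: hypothetical helper, not Connect API.
final class DecimalEncodingSketch {

    // Decode base64 into the unscaled two's-complement integer, then apply
    // the scale (assumed here to come from the schema, not from the payload).
    static BigDecimal fromBase64(String encoded, int scale) {
        byte[] unscaledBytes = Base64.getDecoder().decode(encoded);
        return new BigDecimal(new BigInteger(unscaledBytes), scale);
    }

    public static void main(String[] args) {
        // "D3J5" -> bytes 0x0F 0x72 0x79 -> 1012345 -> 10.12345 at scale 5
        System.out.println(fromBase64("D3J5", 5)); // prints 10.12345
    }
}
```

Without the scale, the payload alone only yields the integer 1012345, which is part of why the binary form is hard to read when debugging.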

Public Interfaces

A new configuration for producers json.decimal.serialization.

...

This configuration will be introduced to the JsonConverter and will be used to determine the serialization format of decimals. As of this change, only BINARY, TEXT and NUMERIC values will be supported. The default value will be BINARY to maintain backwards compatibility.

json.decimal.deserialization.text.format

This configuration will be supported in the JsonConverter and will be used to disambiguate between base64 encoded binary and textual representations of decimal values. As of this change, BINARY (default) and TEXT will both be supported (numeric values will be automatically deserialized and will not be affected by this configuration).

The json.decimal.serialization.format configuration controls whether decimals are produced in numeric or binary format. The valid values will be "BINARY" (default, to maintain compatibility) and "NUMERIC".

Proposed Changes

The changes will be scoped almost entirely to the JsonConverter, which will be able to deserialize a NumericNode when the schema is defined as a decimal. Namely, the converter will no longer throw an exception if the incoming data is a numeric node but the schema specifies the Decimal logical type.

If json.decimal.serialization.format is set to BINARY, the serialization path will remain the same. If it is set to NUMERIC, the JSON value being produced will be a number instead of a text value. JsonConverter will be configurable with the new values. If the values are present, it will serialize and deserialize the input values according to the configuration values listed above.
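
The behaviour described above can be sketched roughly as follows. This is not the JsonConverter implementation: the class, enum, and method names are hypothetical, a parsed JSON value is modelled as a plain Object (Number or String) instead of a Jackson node, and the scale is assumed to come from the schema.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Base64;

// Sketch only: hypothetical names, not the actual JsonConverter code.
final class DecimalConverterSketch {

    enum DecimalFormat { BINARY, NUMERIC }

    // Serialization: returns the JSON fragment written for one decimal value.
    static String serialize(BigDecimal value, DecimalFormat format) {
        if (format == DecimalFormat.NUMERIC) {
            return value.toPlainString(); // bare JSON number
        }
        // BINARY: base64 of the unscaled value, written as a JSON string.
        byte[] unscaled = value.unscaledValue().toByteArray();
        return "\"" + Base64.getEncoder().encodeToString(unscaled) + "\"";
    }

    // Deserialization: dispatches on the JSON value type, so numeric input
    // is accepted regardless of how the producer was configured.
    static BigDecimal deserialize(Object jsonValue, int scale) {
        if (jsonValue instanceof Number) {
            // setScale throws ArithmeticException if the number carries more
            // fraction digits than the schema's scale allows.
            return new BigDecimal(jsonValue.toString()).setScale(scale);
        }
        if (jsonValue instanceof String) {
            byte[] unscaled = Base64.getDecoder().decode((String) jsonValue);
            return new BigDecimal(new BigInteger(unscaled), scale);
        }
        throw new IllegalArgumentException("Not a decimal: " + jsonValue);
    }
}
```

For example, `serialize(new BigDecimal("10.12345"), DecimalFormat.NUMERIC)` yields `10.12345`, while the same call with `DecimalFormat.BINARY` yields `"D3J5"`; `deserialize` maps both back to the same value at scale 5.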

Compatibility, Deprecation, and Migration Plan

This change is backwards compatible, and no functionality will be deprecated. Users must be careful when enabling the new serialization functionality to ensure that all downstream data consumers can read data serialized in the new format. Rolling upgrades from BINARY to TEXT will require five steps, and will be impossible in some scenarios (e.g. infinite retention topics):

  1. Upgrade all consumers to the new code, keeping the BINARY deserialization option
  2. Upgrade all producers to the new code, and use NUMERIC as the serialization option (consumers will be able to automatically deserialize numeric values)
  3. Wait for retention period on the topic to pass
  4. Change the consumer to use TEXT to deserialize strings
  5. Change the producer to use TEXT to serialize strings

...

The following combinations could occur during migration:

  • Legacy Producer, Upgraded Consumer: this scenario is okay, as the upgraded consumer will be able to read the implicit BINARY format
  • Upgraded Producer with NUMERIC serialization, Upgraded Consumer: this scenario is okay, as the upgraded consumer will be able to read the numeric serialization
  • Upgraded Producer with BINARY serialization, Legacy Consumer: this scenario is okay, as the legacy consumer will be able to read binary as it does today
  • Upgraded Producer with NUMERIC serialization, Legacy Consumer: this is the only scenario that is not okay and will cause issues, since legacy consumers cannot consume NUMERIC data.

Because of this, users must take care to first ensure that all consumers have upgraded to the new code before upgrading producers to make use of the NUMERIC serialization format.
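
The combinations above reduce to a single rule, sketched here with hypothetical names. A legacy producer implicitly writes BINARY, so only the format on the wire and the consumer's version matter.

```java
// Sketch only: encodes the compatibility combinations listed above.
final class MigrationCompatibilitySketch {

    enum Format { BINARY, NUMERIC }

    // Upgraded consumers read both formats; legacy consumers read BINARY only.
    static boolean canRead(Format producedFormat, boolean consumerUpgraded) {
        return consumerUpgraded || producedFormat == Format.BINARY;
    }

    public static void main(String[] args) {
        // The one failing combination: NUMERIC data, legacy consumer.
        System.out.println(canRead(Format.NUMERIC, false)); // prints false
    }
}
```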

There is also a concern about the data format changing in the middle of the stream:

  • Legacy → Upgraded BINARY: this will not cause any change in the data in the topic
  • Legacy → Upgraded NUMERIC: this will cause all new values to be serialized using the NUMERIC format and will cause issues unless consumers are upgraded
  • Upgraded BINARY → Upgraded NUMERIC: this is identical to above
  • Upgraded NUMERIC → Upgraded BINARY: this will not cause a new issue since if the numeric format was already working, all consumers would be able to read binary format as well
  • Upgraded NUMERIC → (Rollback) Legacy: this is identical to above

Rejected Alternatives

  • The original KIP suggested supporting an additional representation: base10 encoded text (e.g. `{"asText":"10.2345"}`). This causes issues because it is impossible to disambiguate between TEXT and BINARY without an additional config. Furthermore, it makes migrating from one to the other nearly impossible: either all consumers must stop consuming, all producers must stop producing, and the config must be updated atomically on all of them after deploying the new code, or users must wait for the full retention period to pass. Neither option is viable. The suggestion in this KIP is strictly an improvement over the existing behavior, even if it doesn't support all combinations.
  • Encoding the serialization format in the schema for the Decimal LogicalType. This is attractive because the deserializer could decode based on the schema, and one converter could handle topics encoded differently as long as the schema is in line. The problem is that this is specific to JSON, and the LogicalType is not the right place for a format-specific setting.
  • Automatically detecting the serialization format. While it is possible to automatically differentiate NUMERIC from TEXT and BINARY, it is not always possible to differentiate TEXT from BINARY. Take, for example, the string "12": this is both a valid decimal (12) and a valid hex string which represents a decimal (1.8).
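
The "12" ambiguity from the last bullet can be reproduced in a few lines. This sketch follows the hex reading used in that bullet and assumes a scale of 1, which is what the value 1.8 implies. The names are illustrative only.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Sketch only: shows one string decoding to two different decimals.
final class AmbiguitySketch {

    // TEXT reading: the string is the base10 decimal itself.
    static BigDecimal asText(String s) {
        return new BigDecimal(s);
    }

    // BINARY reading (hex, as in the bullet above): the string encodes the
    // unscaled integer, and the scale comes from elsewhere.
    static BigDecimal asHexBinary(String s, int scale) {
        return new BigDecimal(new BigInteger(s, 16), scale);
    }

    public static void main(String[] args) {
        System.out.println(asText("12"));         // prints 12
        System.out.println(asHexBinary("12", 1)); // prints 1.8
    }
}
```

Both calls succeed on the same input, so a converter cannot tell from the payload alone which reading the producer intended.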