Status

Current state: Under Discussion

Discussion thread: TODO (link me)

JIRA: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Most JSON data that requires precise decimal values represents them as decimal strings. Connect, on the other hand, only supports a base64-encoded binary representation (see the example below). This KIP proposes supporting all three of the representations below so that Connect can integrate better with legacy systems (and so that internal topic data is easier to read and debug):

{
  "asBase64": "D3J5",
  "asString": "10.12345",
  "asNumber": 10.12345
}
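
To make the binary form concrete, the following Python sketch decodes the string "D3J5" from the example above. It assumes the bytes are a big-endian two's-complement unscaled integer and that the scale (5 here) is supplied separately, as it would be by the Decimal schema; the function name is hypothetical.

```python
import base64
from decimal import Decimal


def decode_binary_decimal(encoded: str, scale: int) -> Decimal:
    """Decode a base64 string holding a big-endian two's-complement unscaled integer."""
    raw = base64.b64decode(encoded)
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Shift the decimal point left by `scale` digits (scale is an assumption here).
    return Decimal(unscaled).scaleb(-scale)


value = decode_binary_decimal("D3J5", scale=5)
print(value)  # 10.12345
```

The three bytes decode to the unscaled integer 1012345, which with a scale of 5 is the same value as the string and number forms.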

Public Interfaces

json.decimal.serialization.format

This configuration will be supported in the JsonConverter and will determine the serialization format of decimals. The supported values will be BINARY (base64-encoded bytes), TEXT (decimal string) and NUMERIC (JSON number). The default value will be BINARY to maintain backwards compatibility.
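
As a rough illustration of what each value would produce, here is a Python sketch that renders one decimal as a JSON fragment under each format. It is not the JsonConverter implementation: the function name, the explicit scale argument and the byte-length calculation are assumptions made for the example.

```python
import base64
import json
from decimal import Decimal


def serialize_decimal(value: Decimal, scale: int, fmt: str = "BINARY") -> str:
    """Return the JSON fragment for `value` under the given format (illustrative only)."""
    if fmt == "BINARY":
        # Unscaled integer as big-endian two's-complement bytes, then base64.
        unscaled = int(value.scaleb(scale))
        length = unscaled.bit_length() // 8 + 1  # always leaves room for the sign bit
        raw = unscaled.to_bytes(length, byteorder="big", signed=True)
        return json.dumps(base64.b64encode(raw).decode("ascii"))
    if fmt == "TEXT":
        # Plain decimal string, quoted as a JSON string.
        return json.dumps(format(value, "f"))
    if fmt == "NUMERIC":
        # Bare JSON number, written without exponent notation.
        return format(value, "f")
    raise ValueError(f"unsupported format: {fmt}")


amount = Decimal("10.12345")
binary_json = serialize_decimal(amount, scale=5, fmt="BINARY")
text_json = serialize_decimal(amount, scale=5, fmt="TEXT")
numeric_json = serialize_decimal(amount, scale=5, fmt="NUMERIC")
```

Only the BINARY branch needs the scale; the TEXT and NUMERIC forms carry the decimal point in the value itself.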

json.decimal.deserialization.text.format

This configuration will be supported in the JsonConverter and will be used to disambiguate between base64-encoded binary and textual representations when a decimal arrives as a JSON string. As of this change, BINARY (default) and TEXT will both be supported. Numeric values will be deserialized automatically and are not affected by this configuration.
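
The deserialization rule described above can be sketched in Python as follows. This is a sketch under stated assumptions rather than the converter's code: the function name and the explicit scale argument are hypothetical, and JSON numbers are parsed straight into Decimal to avoid floating-point rounding.

```python
import base64
import json
from decimal import Decimal


def deserialize_decimal(fragment: str, scale: int, text_format: str = "BINARY") -> Decimal:
    """Parse one JSON fragment into a Decimal (illustrative only)."""
    node = json.loads(fragment, parse_float=Decimal, parse_int=Decimal)
    if isinstance(node, Decimal):
        # JSON numbers are accepted regardless of the configured text format.
        return node
    if isinstance(node, str):
        if text_format == "TEXT":
            return Decimal(node)
        if text_format == "BINARY":
            raw = base64.b64decode(node, validate=True)
            unscaled = int.from_bytes(raw, byteorder="big", signed=True)
            return Decimal(unscaled).scaleb(-scale)
        raise ValueError(f"unsupported text format: {text_format}")
    raise TypeError("expected a JSON number or string")


from_binary = deserialize_decimal('"D3J5"', scale=5, text_format="BINARY")
from_text = deserialize_decimal('"10.12345"', scale=5, text_format="TEXT")
from_number = deserialize_decimal('10.12345', scale=5, text_format="TEXT")
```

The configured format is consulted only for string nodes, which is why a consumer can read NUMERIC data under either setting.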

Proposed Changes

JsonConverter will accept the two new configuration options. When they are set, it will serialize decimals according to json.decimal.serialization.format and deserialize string-encoded decimals according to json.decimal.deserialization.text.format. When they are absent, the BINARY defaults apply.

Compatibility, Deprecation, and Migration Plan

This change is backwards compatible, and no functionality will be deprecated. Users must be careful when enabling the new serialization functionality to ensure that all downstream data consumers can read data serialized in the new format. A rolling upgrade from BINARY to TEXT will require five steps, and will be impossible in some scenarios (e.g. topics with infinite retention, where step 3 never completes):

  1. Upgrade all consumers to the new code, keeping the BINARY deserialization option
  2. Upgrade all producers to the new code, and use NUMERIC as the serialization option (consumers will be able to automatically deserialize numeric values)
  3. Wait for retention period on the topic to pass
  4. Change the consumer to use TEXT to deserialize strings
  5. Change the producer to use TEXT to serialize strings
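
The reasoning behind the ordering of the steps above can be checked with a small Python model. It is a simplification with assumed names: the topic is reduced to the set of formats still retained on it, and a consumer setting is reduced to the set of formats it can read.

```python
# Formats each consumer setting can read in this draft: numbers are always
# readable, strings are interpreted according to the configured text format.
READABLE = {
    "BINARY": {"BINARY", "NUMERIC"},
    "TEXT": {"TEXT", "NUMERIC"},
}


def can_read(consumer_setting: str, formats_on_topic: set) -> bool:
    """True if every format still retained on the topic is readable."""
    return formats_on_topic <= READABLE[consumer_setting]


def simulate_upgrade(wait_for_retention: bool = True) -> list:
    """Walk the five steps and record whether the consumer can read the topic."""
    results = []
    topic = {"BINARY"}                         # data written before the upgrade
    results.append(can_read("BINARY", topic))  # step 1: consumers upgraded, still BINARY
    topic = topic | {"NUMERIC"}                # step 2: producers now write NUMERIC
    results.append(can_read("BINARY", topic))
    if wait_for_retention:                     # step 3: old BINARY records expire
        topic = {"NUMERIC"}
    results.append(can_read("BINARY", topic))
    results.append(can_read("TEXT", topic))    # step 4: consumers switch to TEXT
    topic = topic | {"TEXT"}                   # step 5: producers switch to TEXT
    results.append(can_read("TEXT", topic))
    return results


with_wait = simulate_upgrade(wait_for_retention=True)
without_wait = simulate_upgrade(wait_for_retention=False)
```

In this model every step succeeds when the retention wait is honoured, while skipping it leaves BINARY records on the topic that a TEXT consumer cannot interpret, which is also why the upgrade cannot complete on a topic that never expires data.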

Rejected Alternatives

  • Encoding the serialization format in the schema of the Decimal LogicalType. The advantage is that the deserializer could decode based on the schema, so a single converter could handle topics encoded differently as long as each schema is accurate. The problem is that the format is a JSON-specific concern, and the LogicalType is not the right place to record it.
  • Automatically detecting the serialization format. While it is possible to automatically differentiate NUMERIC from TEXT and BINARY, it is not always possible to differentiate between TEXT and BINARY. Take, for example, the string "1234": it is both a valid decimal (1234) and a valid base64 string whose decoded bytes represent an entirely different decimal.
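
The TEXT versus BINARY ambiguity can be reproduced with base64, the encoding Connect uses for binary decimals. In this Python sketch the string "1234" and the scale of 2 are chosen purely for illustration.

```python
import base64
from decimal import Decimal

candidate = "1234"

# Interpretation 1: the string is the decimal itself.
as_text = Decimal(candidate)

# Interpretation 2: the string is base64 for a two's-complement unscaled integer.
raw = base64.b64decode(candidate, validate=True)
unscaled = int.from_bytes(raw, byteorder="big", signed=True)
as_binary = Decimal(unscaled).scaleb(-2)  # scale of 2 is an assumption

print(as_text, as_binary)  # 1234 -26588.24
```

Both interpretations succeed without error, so a converter has no reliable way to pick one from the payload alone.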