...

This KIP aims to add support for nested structures to the existing SMTs.

...

However, dots are already allowed as part of element names in JSON (i.e. schemaless) records (e.g. {'nested.key': {'value': 42}}). Escaping dots with backslashes would lead to unfriendly JSON configurations. Instead, this KIP proposes to follow an approach similar to JSONata[2] and enclose field names that contain dots in backticks, e.g. `nested.key`.value.

Inside backticks, a doubled backtick (``) escapes a backtick that is part of the field name.
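A small path parser helps illustrate these escaping rules. The following is a minimal Java sketch and not the implementation proposed by this KIP. FieldPathParser and parse are hypothetical names. The sketch assumes that a backtick opens a quoted name only at the start of a segment, and that anywhere else it is kept as a literal character. Rejecting an unterminated quote, or a quoted name that is not followed by a dot, is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: FieldPathParser is an illustrative name, not a Kafka Connect class.
final class FieldPathParser {

    private FieldPathParser() {}

    /** Splits a nested path such as a.`b``.c` into its field-name segments. */
    static List<String> parse(String path) {
        List<String> segments = new ArrayList<>();
        int n = path.length();
        int i = 0;
        while (true) {
            StringBuilder name = new StringBuilder();
            if (i < n && path.charAt(i) == '`') {
                // Quoted segment: dots are literal and "``" stands for one backtick.
                i++;
                boolean closed = false;
                while (i < n && !closed) {
                    char c = path.charAt(i);
                    if (c != '`') {
                        name.append(c);
                        i++;
                    } else if (i + 1 < n && path.charAt(i + 1) == '`') {
                        name.append('`');
                        i += 2;
                    } else {
                        closed = true; // a single backtick closes the quoted name
                        i++;
                    }
                }
                if (!closed) {
                    throw new IllegalArgumentException("Unterminated backtick in: " + path);
                }
                if (i < n && path.charAt(i) != '.') {
                    throw new IllegalArgumentException("Expected '.' after quoted name in: " + path);
                }
            } else {
                // Unquoted segment: read up to the next dot; backticks are kept literally.
                while (i < n && path.charAt(i) != '.') {
                    name.append(path.charAt(i));
                    i++;
                }
            }
            segments.add(name.toString());
            if (i >= n) {
                return segments;
            }
            i++; // skip the '.' separator
        }
    }
}
```

For example, parse("a.`b``.c`") yields the two names a and b`.c. Parsing the path into a list of names once means the code that walks the record never has to deal with quoting.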

[1] jq manual, Basic filters: https://stedolan.github.io/jq/manual/#Basicfilters

[2] JSONata documentation: https://docs.jsonata.org/simple#examples

  > Field references containing whitespace or reserved tokens can be enclosed in backticks

Examples


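Each scenario lists a nested path and the record structure it refers to.

Scenario: Normal (no dots or backticks in field names)
Nested path: a.b.c
Record structure:

a:
  b:
    c: val

Scenario: Field names including dots
Nested path: a.`b.c`
Record structure:

a:
  b.c: val

Scenario: Field names including backticks
Nested path: a.b`.c
Record structure:

a:
  b`:
    c: val

Scenario: Field names including dots and backticks
Nested path: a.`b``.c`
Record structure:

a:
  b`.c: val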
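A connector author who knows the raw field names may need to go the other way and build the path string from them. The Java sketch below does this with a hypothetical FieldPathFormatter class, which is not part of Kafka Connect. It quotes a name only when the name contains a dot or starts with a backtick. That quoting rule is an assumption, chosen because it reproduces the four nested paths in the examples above.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: FieldPathFormatter is an illustrative name, not a Kafka Connect class.
final class FieldPathFormatter {

    private FieldPathFormatter() {}

    /** Joins raw field names into a nested path, quoting only where needed. */
    static String format(List<String> names) {
        return names.stream()
                .map(FieldPathFormatter::formatName)
                .collect(Collectors.joining("."));
    }

    private static String formatName(String name) {
        // A dot would split the name, and a leading backtick would be read as an
        // opening quote, so both cases are wrapped in backticks with every inner
        // backtick doubled.
        if (name.contains(".") || name.startsWith("`")) {
            return "`" + name.replace("`", "``") + "`";
        }
        // Any other name, including one with inner backticks, is written as-is.
        return name;
    }
}
```

Quoting only when necessary keeps common paths such as a.b.c free of backticks, which matches the "Normal" scenario.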

Affected SMTs

These SMTs will include support for nested structures:

  • Cast
  • ExtractField
  • HeaderFrom
  • MaskField
  • ReplaceField
  • TimestampConverter
  • ValueToKey
  • InsertField
  • HoistField

Non-affected SMTs

These SMTs do not require nested structure support:

  • DropHeaders: Drops one or more headers.
  • Filter: Drops the whole message based on a predicate.
  • InsertHeader: Inserts a header with a fixed (literal) value.
  • RegexRouter: Acts on the topic name.
  • SetSchemaMetadata: Acts on the root schema.
  • TimestampRouter: Acts on the topic name, using the record timestamp.
  • Flatten: Acts on the whole key or value.

Public Interfaces

The following existing SMTs are impacted by this change:

...

These flags will be added conditionally to some SMTs, as described below.

Affected SMTs

Cast

Changes:

  • Extend spec to support nested notation.

...

Scenarios (input, SMT configuration, output):
1. Nested field.


Input:

```json
{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"
    }
  }
}
```



SMT configuration:

```json
{
  "transforms": "smt1",
  "transforms.smt1.type": "org.apache.kafka.connect.transforms.HoistField$Value",
  "transforms.smt1.field.syntax.version": "v2",
  "transforms.smt1.hoisted": "parent.child.k2",
  "transforms.smt1.field": "other"
}
```



Output:

```json
{
  "k1": 123,
  "parent": {
    "child": {
      "other": {
        "k2": "123"
      }
    }
  }
}
```


2. Nested struct, when field names include dots


Input:

```json
{
  "k1": 123,
  "parent.child": {
    "k2": "123"
  }
}
```



SMT configuration:

```json
{
  "transforms": "smt1",
  "transforms.smt1.type": "org.apache.kafka.connect.transforms.HoistField$Value",
  "transforms.smt1.field.syntax.version": "v2",
  "transforms.smt1.hoisted": "`parent.child`",
  "transforms.smt1.field": "other"
}
```



Output:

```json
{
  "k1": 123,
  "other": {
    "parent.child": {
      "k2": "123"
    }
  }
}
```
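The following minimal Java sketch makes the two HoistField scenarios concrete. It runs the hoisting step on schemaless values represented as nested Maps. It is not the Kafka Connect implementation, and NestedHoistSketch and hoist are hypothetical names. The sketch assumes the path has already been split into field names, so once the backtick quoting is resolved, parent.child is a single name. Returning the value unchanged when the path does not exist is also an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of nested hoisting on schemaless values; not the Kafka Connect implementation.
final class NestedHoistSketch {

    private NestedHoistSketch() {}

    /**
     * Returns a copy of root in which the entry at path is replaced by
     * field -> {lastName: originalValue}.
     */
    @SuppressWarnings("unchecked")
    static Map<String, Object> hoist(Map<String, Object> root, List<String> path, String field) {
        if (path.isEmpty()) {
            throw new IllegalArgumentException("path must contain at least one field name");
        }
        Map<String, Object> result = deepCopy(root);
        Map<String, Object> parent = result;
        // Walk down to the map that directly contains the last name in the path.
        for (String name : path.subList(0, path.size() - 1)) {
            Object child = parent.get(name);
            if (!(child instanceof Map)) {
                return result; // assumption: a missing path leaves the value unchanged
            }
            parent = (Map<String, Object>) child;
        }
        String last = path.get(path.size() - 1);
        if (!parent.containsKey(last)) {
            return result; // same assumption for a missing last field
        }
        // Move the original entry under a new struct stored at the hoisted field name.
        Map<String, Object> wrapper = new LinkedHashMap<>();
        wrapper.put(last, parent.remove(last));
        parent.put(field, wrapper);
        return result;
    }

    // Copies nested maps so the caller's (possibly immutable) input is never modified.
    @SuppressWarnings("unchecked")
    private static Map<String, Object> deepCopy(Map<String, Object> source) {
        Map<String, Object> copy = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            Object value = entry.getValue();
            copy.put(entry.getKey(), value instanceof Map ? deepCopy((Map<String, Object>) value) : value);
        }
        return copy;
    }
}
```

Calling hoist with the names [parent, child, k2] and the field other produces the first scenario's output. Calling it with the single name [parent.child] produces the second.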

Non-affected SMTs

These SMTs do not require nested structure support:

...



Compatibility, Deprecation, and Migration Plan

...