Status

Current state: "Under Discussion"

Discussion thread: here

JIRA: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

Therefore, this KIP aims to add support for nested structures to the existing SMTs where this makes sense, and to include the abstractions needed to reuse this support in future SMTs.

...

Of the existing SMTs, the following are impacted by this change:

New configuration flags

Add a new configuration flag, field.style, to enable nested (and potentially other) styles of accessing fields when SMTs iterate over schemas and apply their changes.

Accepted values:

  • plain (default): SMTs will access fields as they do today, with no lookup for nested fields.
  • nested: if dotted notation is used, SMTs will look up nested fields.
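The difference between the two values can be sketched for schemaless (Map) records as follows. This is only an illustration under stated assumptions: the class FieldStyleLookup, the enum FieldStyle, and the method lookup are hypothetical names invented here, not part of Kafka Connect or of this KIP, and escaping of dots is ignored.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Kafka Connect. */
final class FieldStyleLookup {

    /** Mirrors the two accepted values of the proposed field.style flag. */
    enum FieldStyle { PLAIN, NESTED }

    /** Returns the value at {@code field}, or null if any step is missing. */
    static Object lookup(Map<String, Object> value, String field, FieldStyle style) {
        if (style == FieldStyle.PLAIN) {
            // plain: the whole string is treated as a single top-level field name
            return value.get(field);
        }
        // nested: each dot-separated step descends one level (no escaping handled here)
        Object current = value;
        for (String step : field.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(step);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> nested = new HashMap<>();
        nested.put("val", 42);
        Map<String, Object> value = new HashMap<>();
        value.put("nested", nested);

        System.out.println(lookup(value, "nested.val", FieldStyle.PLAIN));  // null
        System.out.println(lookup(value, "nested.val", FieldStyle.NESTED)); // 42
    }
}
```

With plain, "nested.val" only matches a top-level field literally named "nested.val"; with nested, the same string is read as a path.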

SMTs affected

Extending field configuration to support dotted notation:

...

The following SMTs will require additional configurations:

  • HoistField: add a hoisted config to point to a specific path to hoist.
    • For example:

         hoisted = nested.val
         field = line

         value (before):
         {
           "nested": {
             "val": 42,
             "other val": 96
           }
         }

         value (after):
         {
           "nested": {
             "line": {
               "val": 42
             },
             "other val": 96
           }
         }

  • Flatten: add a field config to point to a specific struct to flatten.
    • For example: field=content/name will transform: { content: { id: 42, name: { first: jorge } } } into { content: { id: 42, name.first: jorge } }
    • Switch the default delimiter to _ and warn when dots are used as separators, as they may clash with nested field names further along the chain of transformers.
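The HoistField example above (hoisting the path nested.val under a new field named line) can be sketched for schemaless records as follows. This is a minimal illustration, not the proposed implementation: the class HoistSketch and its hoist method are hypothetical, the path is passed as an already-split parent/child pair, and only one level of nesting is handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Kafka Connect. */
final class HoistSketch {

    /**
     * Returns a copy of {@code value} in which the field {@code child} under
     * {@code parent} is wrapped in a new map named {@code wrapper}.
     * The input map is left untouched.
     */
    static Map<String, Object> hoist(Map<String, Object> value,
                                     String parent, String child, String wrapper) {
        Object parentValue = value.get(parent);
        if (!(parentValue instanceof Map) || !((Map<?, ?>) parentValue).containsKey(child)) {
            return value; // nothing to hoist
        }
        Map<String, Object> newParent = new LinkedHashMap<>();
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) parentValue).entrySet()) {
            String key = String.valueOf(entry.getKey());
            if (key.equals(child)) {
                // wrap the hoisted field: child -> wrapper: { child }
                Map<String, Object> hoisted = new LinkedHashMap<>();
                hoisted.put(child, entry.getValue());
                newParent.put(wrapper, hoisted);
            } else {
                newParent.put(key, entry.getValue()); // siblings stay where they are
            }
        }
        Map<String, Object> result = new LinkedHashMap<>(value);
        result.put(parent, newParent);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("val", 42);
        nested.put("other val", 96);
        Map<String, Object> value = new LinkedHashMap<>();
        value.put("nested", nested);

        // prints {nested={line={val=42}, other val=96}}
        System.out.println(hoist(value, "nested", "val", "line"));
    }
}
```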



These SMTs do not require nested structure support:

  • Drop: Drop the whole key or value.
  • Filter: Drops the whole message based on a predicate.
  • InsertHeader: Inserts a specific value into the record headers.
  • RegexRouter: Acts on the topic name.
  • SetSchemaMetadata: Acts on root schema.
  • TimestampRouter: Acts on timestamp.
  • Flatten: Acts on the whole key or message. 


Proposed Changes

Nested notation

Dotted notation (e.g. nested.key) tends to be the most natural way to describe nested fields as part of the configuration.

However, schemaless (Map<String, Object>) records can include dots in their field names (e.g. { 'nested.key': { 'val': 42 } }).

As scenarios where dots appear in the field names of JSON messages are likely to be rare, this KIP proposes to stick with dots as separators.

For scenarios where dots are present in the field names of JSON messages, a backslash-escape approach is proposed:

  • "this.field" (which would refer to the nested field "field" under the top-level "this" field)

  • "this\.field" (which would refer to the field named "this.field")
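The escaping rule in the two bullets above can be sketched as a small parser that splits a configured path on unescaped dots only. This is an illustration under stated assumptions: DottedPath and parse are hypothetical names, and a backslash that is not followed by a dot is simply kept as a literal character, which the KIP text does not specify.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Kafka Connect. */
final class DottedPath {

    /** Splits on unescaped dots; "\." yields a literal dot inside a step. */
    static List<String> parse(String path) {
        List<String> steps = new ArrayList<>();
        StringBuilder step = new StringBuilder();
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '\\' && i + 1 < path.length() && path.charAt(i + 1) == '.') {
                step.append('.'); // escaped dot: part of the field name
                i++;              // skip the dot that was just consumed
            } else if (c == '.') {
                steps.add(step.toString()); // unescaped dot: end of the current step
                step.setLength(0);
            } else {
                step.append(c); // any other character, including a lone backslash
            }
        }
        steps.add(step.toString());
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(parse("this.field"));   // [this, field]
        System.out.println(parse("this\\.field")); // [this.field]
    }
}
```

Note that in a Java string literal the backslash itself must be doubled, so the configured value this\.field is written "this\\.field" in the source.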

Compatibility, Deprecation, and Migration Plan

...

If further requests to support other values arrive, we should consider extending the configuration with a nested delimiter option, restricted to a small set of values.

An SMT that renames fields could also be used as a workaround to replace dot-named fields in JSON messages.

Rejected Alternatives

...

Keep ExtractField as it is and repeat it until reaching nested fields

This KIP proposes to simplify this configuration by replacing multiple invocations with one.


Potential KIPs

Future KIPs could extend this support for:

  • Recursive notation: name a field and apply the SMT to all fields across the schema matching that name.
  • Access to arrays: adding [] notation to represent access to arrays and applying SMTs to fields within an array.