Status

Current state: "Under Discussion"

Discussion thread: here

JIRA: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Single Message Transforms (SMTs), introduced in KIP-66, have greatly improved the usability of connectors by allowing input/output data to be processed without additional streaming applications. These benefits are constrained, however, because most SMT implementations can only operate on fields at the root of the record structure.

Therefore, this KIP aims to add support for nested structures to the existing SMTs (where this makes sense), and to add the abstractions needed to reuse this support in future SMTs.


Public Interfaces

Of the existing SMTs, the following are affected by this change.

These SMTs extend their field configurations to accept dot-separated nested notation:

  • Cast: extend spec to support nested notation.
  • ExtractField: extend field to support nested notation.
  • HeaderFrom: extend fields list to support nested notation.
  • MaskField: extend fields list to support nested notation.
  • ReplaceField: extend include and exclude lists to support nested notation.
  • TimestampConverter: extend field to support nested notation.
  • ValueToKey: extend fields list to support nested notation.
  • InsertField: extend field configs to support nested notation.

These SMTs require additional configurations:

  • HoistField: add a source config to point to a specific path to hoist.
    • For example: source=nested.val and field=line will transform: nested: { val: 42 } into nested: { line: { val: 42 } }
  • Flatten: add a field config to point to a specific struct to flatten.
    • For example: field=content.name will transform: { content: { id: 42, name: { first: jorge } } } into { content: { id: 42, name.first: jorge } }
    • Switch the default delimiter to _ and warn when dots are used as the delimiter, since flattened field names containing dots may be mistaken for nested field paths by later transforms in the chain.
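
The Flatten example above can be sketched on a schemaless record as follows. This is only an illustration under stated assumptions, not the Flatten SMT implementation: the class FlattenAtPathSketch and the method flattenAt are hypothetical names, the record is modeled as nested Map<String, Object> values, the path to the struct is split on dots, and the record is modified in place.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not part of Kafka Connect.
class FlattenAtPathSketch {

    // Flattens the nested map found at the dotted path into its parent map,
    // joining field names with the given delimiter. Leaves the record
    // unchanged if the path does not point to a map.
    @SuppressWarnings("unchecked")
    static void flattenAt(Map<String, Object> record, String path, String delimiter) {
        String[] segments = path.split("\\.");
        Map<String, Object> parent = record;
        for (int i = 0; i < segments.length - 1; i++) {
            Object next = parent.get(segments[i]);
            if (!(next instanceof Map)) {
                return; // intermediate segment missing or not a struct
            }
            parent = (Map<String, Object>) next;
        }
        String last = segments[segments.length - 1];
        Object target = parent.get(last);
        if (!(target instanceof Map)) {
            return; // nothing to flatten at this path
        }
        parent.remove(last);
        addFlattened(parent, last, (Map<String, Object>) target, delimiter);
    }

    // Recursively copies leaf values into 'out' under prefixed names.
    @SuppressWarnings("unchecked")
    private static void addFlattened(Map<String, Object> out, String prefix,
                                     Map<String, Object> struct, String delimiter) {
        for (Map.Entry<String, Object> e : struct.entrySet()) {
            String name = prefix + delimiter + e.getKey();
            if (e.getValue() instanceof Map) {
                addFlattened(out, name, (Map<String, Object>) e.getValue(), delimiter);
            } else {
                out.put(name, e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> name = new LinkedHashMap<>();
        name.put("first", "jorge");
        Map<String, Object> content = new LinkedHashMap<>();
        content.put("id", 42);
        content.put("name", name);
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("content", content);

        // Uses the proposed "_" default delimiter.
        flattenAt(record, "content.name", "_");
        System.out.println(record); // {content={id=42, name_first=jorge}}
    }
}
```

Passing "." as the delimiter instead yields the name.first field shown in the example, which a later transform could misread as a path into a struct called name. That ambiguity is the reason for preferring _ as the default.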

These SMTs do not require nested structure support:

  • Drop: Drops the whole key or value.
  • Filter: Drops the whole message based on a predicate.
  • InsertHeader: Inserts a literal value as a header.
  • RegexRouter: Acts on the topic name.
  • SetSchemaMetadata: Acts on root schema.
  • TimestampRouter: Acts on timestamp.


Proposed Changes

Nested notation

Dotted notation (nested.key) tends to be the most natural way to describe nested fields in configuration. However, schemaless (Map<String, Object>) records can contain dots in their field names (e.g. { 'nested.key': { 'val': 42 } }), which makes a dotted path ambiguous for such records.

Because field names containing dots are expected to be rare in JSON messages, this KIP proposes to keep dots as the separator.
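
To make the trade-off concrete, here is a minimal sketch of dotted-path resolution against a schemaless record. It is an illustration only: NestedPathSketch and get are hypothetical names rather than Kafka Connect APIs, and the sketch assumes the simplest possible rule, where the path is split on every dot and each segment descends one map level.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Kafka Connect.
class NestedPathSketch {

    // Splits the path on dots and walks one map level per segment.
    // Returns null if a segment is missing or a non-map value is reached early.
    static Object get(Map<String, Object> record, String dottedPath) {
        Object current = record;
        for (String segment : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put("val", 42);

        // Genuinely nested field: { nested: { val: 42 } }
        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("nested", inner);
        System.out.println(get(nested, "nested.val")); // 42

        // Field name that itself contains a dot: { 'nested.key': { val: 42 } }
        Map<String, Object> dotted = new LinkedHashMap<>();
        dotted.put("nested.key", inner);
        System.out.println(get(dotted, "nested.key")); // null
    }
}
```

Under this rule the second lookup fails, because nested.key is read as the field key inside a struct named nested rather than as a single field name. This is the case that records with dotted field names would hit.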


Compatibility, Deprecation, and Migration Plan

Existing SMT configurations should keep working unless they operate on schemaless JSON records whose field names contain dots. This will need to be assessed as part of the KIP discussion.

If requests arrive to support other separators, we should consider adding a configuration for the nested delimiter, restricted to a small set of allowed values.

An SMT that renames fields (such as ReplaceField) could also be used as a workaround to rename dot-named fields in JSON messages.

Rejected Alternatives

If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.
