...

Name Type Default Importance Documentation
transforms.<name>.field.style STRING plain HIGH

Permitted values: plain, nested. Defines how to traverse a record structure to apply a transformation. If set to "plain", the transformation only applies to elements located at the root of the message. If set to "nested", nested elements (accessed via "field.separator") are affected by the transformation as well.

transforms.<name>.field.separator STRING . LOW

Permitted values: "." (dot), "/" (slash). When defining the path to a field, this separator determines how the path is divided into parent and child elements. If set to ".", then a path "parent.complex.element" will access the "parent" struct/map, then the "complex" struct/map, and apply the transformation to "element". If the default value collides with the element names used in the record, it can be changed to one of the alternative values.

To access nested elements, dotted notation is used. If a field name already contains dots, a dot can be escaped by repeating it: a double dot represents a literal dot that is part of the field name. E.g. to access the elements of a struct/map named "same.field", the following format can be used: "same..field.element"
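The escaping rule above can be sketched as a small parser. This is only an illustration under stated assumptions, not the KIP's implementation: the class name `DottedPathSketch` and the method `parse` are hypothetical, and the sketch reads dots left to right, so a run of three dots is taken as an escaped dot followed by a separator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Kafka Connect.
public class DottedPathSketch {

    // Splits a path on single dots; a doubled dot ("..") is read as a literal dot.
    static List<String> parse(String path) {
        List<String> segments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < path.length()) {
            char c = path.charAt(i);
            if (c == '.') {
                if (i + 1 < path.length() && path.charAt(i + 1) == '.') {
                    // Escaped dot: keep one literal dot in the current segment.
                    current.append('.');
                    i += 2;
                    continue;
                }
                // Single dot: close the current segment.
                segments.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
            i++;
        }
        segments.add(current.toString());
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(parse("same..field.element"));    // [same.field, element]
        System.out.println(parse("parent.complex.element")); // [parent, complex, element]
    }
}
```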


Example:

In this Cast transform, to access an element "country" inside a struct/map named "address.personal":

Code Block
{
  "transforms": "cast",
  "transforms.cast.field.style": "nested",
  "transforms.cast.type": "...",
  "transforms.cast.spec": "address..personal.country:string"
}
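To make the effect of the "nested" style concrete, the following sketch walks a schemaless record (nested maps) along path segments that have already been parsed, then converts the value to a string. The class `NestedLookupSketch`, the method `lookup`, and the sample record are assumptions made for illustration; the actual Cast SMT may resolve and convert fields differently.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Kafka Connect.
public class NestedLookupSketch {

    // Follows the segments through nested maps; returns null if any step is missing.
    static Object lookup(Map<String, Object> record, List<String> segments) {
        Object current = record;
        for (String segment : segments) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    public static void main(String[] args) {
        // Made-up record: the struct/map name itself contains a dot.
        Map<String, Object> record = Map.of(
            "name", "Ada",
            "address.personal", Map.of("country", 56));

        // "address..personal.country" corresponds to these two segments.
        Object country = lookup(record, List.of("address.personal", "country"));
        System.out.println(String.valueOf(country)); // 56
    }
}
```

With the "plain" style, only root-level names such as "name" would be considered, so "country" would not be reached.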


SMTs affected

Extending the support for field configuration for dotted separation:

...

Using dots tends to be the most intuitive way to access nested record structures (e.g. jq already uses this notation [1]) and will cover most scenarios.

Dots are already allowed as part of element names in JSON (i.e. schemaless) records (e.g. {'nested.key': {'val':42}}). Instead of escaping them with backslashes, which leads to unfriendly JSON configurations, it's proposed to follow an approach similar to CSV [2], where a double quote is escaped by preceding it with the same character (a double quote in this case).
If users recognize that their field names include dots or other separators, they could alternatively define another separator to simplify their configuration.

Then, for transform configuration, double dots can be used to escape existing dots that are part of the field name.


[1] https://stedolan.github.io/jq/manual/#Basicfilters

[2] https://datatracker.ietf.org/doc/html/rfc4180, section 2, rule 7

> If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.

Compatibility, Deprecation, and Migration Plan

...

However, backslashes are also used by JSON. This could lead to unfriendly configurations like "this\\\\.is\\\\.not\\\\.very\\\\.readable"

Use

...

custom separators for edge cases

Using double dots to escape separators is another alternative, one that keeps dots as the only field separator.

...

With double dots:

Code Block
{
  "transforms": "cast",
  "transforms.cast.field.style": "nested",
  "transforms.cast.type": "...",
  "transforms.cast.spec": "address..personal.country:string"
}


With separator:

Code Block
{
  "transforms": "cast",
  "transforms.cast.field.style": "nested",
  "transforms.cast.field.separator": "/",
  "transforms.cast.type": "...",
  "transforms.cast.spec": "address.personal/country:string"
}
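For comparison, the custom-separator alternative amounts to a plain split on the configured character. The sketch below is an assumption-laden illustration (the class `SeparatorSplitSketch` and the method `split` are hypothetical, and the ":string" type suffix of the spec is assumed to have been stripped beforehand). It also shows why the default separator misreads a field name that contains a dot.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Kafka Connect.
public class SeparatorSplitSketch {

    // Splits the path on the configured separator, with no escaping support.
    static List<String> split(String path, String separator) {
        // Pattern.quote keeps "." from being read as a regex wildcard.
        return Arrays.asList(path.split(Pattern.quote(separator), -1));
    }

    public static void main(String[] args) {
        // With "/" as separator, the dot stays part of the struct/map name.
        System.out.println(split("address.personal/country", "/")); // [address.personal, country]

        // With the default ".", the same field name is broken into two segments.
        System.out.println(split("address.personal.country", ".")); // [address, personal, country]
    }
}
```

If a field name contained both "." and "/", neither separator would split it correctly, which is the limitation that repeating the dot avoids.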


Even though changing the separator represents yet another property to configure, it would be used in a minority of cases, and it could be easier to understand than escaping. It also resembles the "delimiter" in the Flatten SMT, which could make it more familiar for Connect users. However, even if custom separators represent a more explicit configuration, there is always the possibility that all the separators are already included as part of the field name, leading to issues and requests for changes.

To avoid this, this KIP proposes escaping a dot by preceding it with another dot.

Potential KIPs

Future KIPs could extend this support for:

...