...
Name | Type | Default | Importance | Documentation |
---|---|---|---|---|
transforms.<name>.field.style | STRING | plain | | Permitted values: |
transforms.<name>.field.separator | STRING | . | LOW | Permitted values: " |
To access nested elements, dotted notation is used. If dots are already part of a field name, each such dot is escaped by doubling it. E.g. to access the elements of a struct/map named "same.field", the following format can be used: "same..field.element" |
Example: in this Cast transform, to access an element "country" inside a struct/map named "address.personal":
Code Block |
---|
{ "transforms": "cast", "transforms.cast.field.style": "nested", "transforms.cast.type": "...", "transforms.cast.spec": "address..personal.country:string" } |
SMTs affected
Extending the support for field configuration for dotted separation:
...
Using dots tends to be the most intuitive way to access nested record structures (e.g. jq tooling already uses it [1]) and will cover most scenarios.
Dots are already allowed as part of element names in JSON (i.e. schemaless) records (e.g. {'nested.key': {'val': 42}}). Instead of escaping them with backslashes, which leads to unfriendly JSON configurations, it's proposed to offer a configuration to switch to another separator.
If users recognize that their field names include dots or other separators, they could define another one to simplify their configuration. Alternatively, the configuration can follow an approach similar to CSV, which escapes a double quote by preceding it with the same character (a double quote in that case) [2].
Then, for transform configuration, double dots can be used to escape existing dots that are part of the field name.
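The double-dot escaping described above can be sketched as a small path splitter. This is only an illustration under stated assumptions, not the Connect implementation: the function names `split_path` and `lookup` and the sample record (including the value "PE") are hypothetical.

```python
def split_path(path):
    """Split a dotted path into field names; '..' stands for one literal dot."""
    parts = []
    current = []
    i = 0
    while i < len(path):
        ch = path[i]
        if ch == ".":
            # A dot followed by another dot is an escaped, literal dot.
            if i + 1 < len(path) and path[i + 1] == ".":
                current.append(".")
                i += 2
                continue
            # A single dot ends the current field name.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def lookup(record, path):
    """Walk a nested dict (standing in for a struct/map) along the path."""
    value = record
    for name in split_path(path):
        value = value[name]
    return value


# Hypothetical schemaless record with a dot inside a field name.
record = {"address.personal": {"country": "PE"}}

print(split_path("address..personal.country"))  # ['address.personal', 'country']
print(lookup(record, "address..personal.country"))  # PE
```

In this sketch, runs of three or more dots are resolved left to right, so "a...b" is read as the fields "a." and "b". A real implementation would need to define that case explicitly.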
[1] https://stedolan.github.io/jq/manual/#Basicfilters
[2] https://datatracker.ietf.org/doc/html/rfc4180 (Section 2, item 7):
> If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.
Compatibility, Deprecation, and Migration Plan
...
However, backslashes are also used by JSON. This could lead to unfriendly configurations like "this\\\\.is\\\\.not\\\\.very\\\\.readable"
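To illustrate why backslash escaping reads poorly in JSON, the following sketch serializes a path that escapes dots with a single backslash. The backslash-escaped path syntax and the "field" key are assumptions made for this illustration only.

```python
import json

# Hypothetical path syntax: a literal dot is escaped with one backslash.
path = r"this\.is\.not\.very\.readable"

# JSON must escape each backslash, so every one is doubled on serialization.
encoded = json.dumps({"field": path})
print(encoded)  # {"field": "this\\.is\\.not\\.very\\.readable"}

# Decoding restores the original single-backslash path.
assert json.loads(encoded)["field"] == path
```

Each additional layer of escaping (for example, JSON embedded in another quoted string) doubles the backslashes again, which is one way the four-backslash form shown above can arise.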
Use
...
custom separators for edge cases
Using double dots to escape separators is another alternative, which sticks to dots as the only field separator.
...
With double dots | With separator |
---|---|
 | |
Even though changing the separator represents yet another property to configure, it would be used in a minority of cases and could be easier to understand than escaping. It is also similar to the "delimiter" in the Flatten SMT, which could make it more familiar for Connect users. However, even if custom separators represent a more explicit configuration, there is always the possibility that every available separator already appears in a field name, leading to issues and requests for changes.
To avoid this, this KIP proposes escaping a dot by preceding it with another dot, i.e. by repeating it.
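The trade-off between the two approaches can be sketched as follows. The "/" separator, the helper names, and the sample paths are assumptions for illustration. The double-dot helper also assumes that field names never contain a NUL character, which it uses as a temporary placeholder.

```python
def split_with_separator(path, separator):
    """Custom-separator approach: split on whatever separator is configured."""
    return path.split(separator)


def split_with_double_dots(path):
    """Double-dot approach: '..' is a literal dot, a single '.' separates fields."""
    placeholder = "\0"  # assumed never to appear in a field name
    protected = path.replace("..", placeholder)
    return [part.replace(placeholder, ".") for part in protected.split(".")]


# A custom separator works while it does not occur in any field name.
print(split_with_separator("address.personal/country", "/"))
# ['address.personal', 'country']

# A field literally named "a.b/c" contains both candidate separators,
# so splitting on "/" breaks the name apart.
print(split_with_separator("a.b/c/x", "/"))
# ['a.b', 'c', 'x']

# Doubling the dot keeps the name intact without a second separator.
print(split_with_double_dots("a..b/c.x"))
# ['a.b/c', 'x']
```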
Potential KIPs
Future KIPs could extend this support for:
...