Status

Current state: Voting

Discussion thread: here

JIRA: here

...

> If double quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.

Accessing multiple values

...

by deep-scan

There are scenarios where we want to target multiple fields with the same name at different levels, e.g. in arrays or in dynamic/unknown structures.

For these cases, an asterisk can be used to search all elements within a path:

a.*.b accesses a and then searches all of its child objects, including arrays, for the field b.

If deep-scan is used, it must have only one field after the asterisk level.

Deep scans are expected to return multiple values. The SMT has to define how to proceed when multiple fields are found.


scenario	data	path	fields
Accessing nested elements	{ "k1": { "b": "b1" }, "k2": { "b": "b2" }, "k3": { "b": "b3" } }	*.b	k1.b, k2.b, k3.b
Accessing nested objects and their elements	{ "k1": { "b": { "c": "c1" } }, "k2": { "b": { "c": "c2" } }, "k3": { "b": { "c": "c3" } } }	*.b.c	k1.b.c, k2.b.c, k3.b.c
Starting at an element	{ "a": { "k1": { "b": { "c": "c1" } }, "k2": { "b": { "c": "c2" } }, "k3": { "b": { "c": "c3" } } }, "a2": {} }	a.*.b	a.k1.b, a.k2.b, a.k3.b
Not allowed to finish with asterisk	{ "a": { "k1": { "b": { "c": "c1" } }, "k2": { "b": { "c": "c2" } }, "k3": { "b": { "c": "c3" } } }, "a2": {} }	a.*	Not allowed
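
As a rough illustration of the asterisk rules above, here is a minimal Python sketch; it is not the KIP's implementation. It assumes records are plain dicts and that an asterisk matches the direct children of the current element, as in the table examples. The names `find_paths` and `deep_scan` are hypothetical.

```python
def find_paths(data, steps, prefix=()):
    """Return every concrete path in `data` that matches `steps` ('*' = any key)."""
    if not steps:
        return [".".join(prefix)]
    if not isinstance(data, dict):
        return []
    head, rest = steps[0], steps[1:]
    if head == "*":
        keys = list(data)          # assumption: match direct children only
    else:
        keys = [head] if head in data else []
    found = []
    for key in keys:
        found.extend(find_paths(data[key], rest, prefix + (key,)))
    return found


def deep_scan(data, path):
    """Expand a path that may contain asterisks into the concrete paths it matches."""
    steps = path.split(".")
    if steps[-1] == "*":
        # Mirrors the "not allowed to finish with asterisk" rule.
        raise ValueError("a path must not end with an asterisk")
    return find_paths(data, steps)
```

Each SMT would then decide what to do with the returned list, for example cast every match or use only the first one.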

Accessing Arrays

Arrays can be accessed in different ways and at different levels.

Accessing the whole array: if a path points to an array and the SMT supports arrays as input, then a.b can be used, where b is an array.
Accessing all elements of the array: if a path points to an array whose elements are not objects (e.g. strings), then the SMT can access all the elements of the array at once using a.b, where b is an array.
Accessing child elements on all array objects: if a path points to an array whose elements are objects, all the objects can be accessed by providing a path to one of their child elements, e.g. a.b.c accesses array b and the element c in every item of the array.
Accessing a single item by index: if a path points to an array and then uses an index, it gets that specific element. If no additional child element is provided, it accesses the whole object/element, e.g. a.b.1 accesses the second item of the array.
Accessing elements within a single item by index: if the item of the array is an object, its elements can be accessed, e.g. a.b.1.c accesses the field c of the second item of the array.

scenario	data	path	fields
Accessing struct and root elements	{ "a": [ "a1", "a2", "a3" ] }	a	a, a.0, a.1, a.2
Accessing an item by index	{ "a": [ "a1", "a2", "a3" ] }	a.<index>, e.g. a.0	a.0
Accessing elements within objects	{ "a": [ { "b": "b1" }, { "b": "b2" } ] }	a.b	a.0.b, a.1.b
Accessing an item by index, and its elements within an object	{ "a": [ { "b": "b1" }, { "b": "b2" } ] }	a.0.b	a.0.b
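
The array rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: records are dicts and lists, a purely numeric step applied to a list is treated as an index, and any other step applied to a list is applied to every item. The function name `resolve` is hypothetical.

```python
def resolve(data, path):
    """Return the value(s) addressed by a dotted path, descending into arrays."""
    def walk(node, steps):
        if not steps:
            return node
        head, rest = steps[0], steps[1:]
        if isinstance(node, list):
            if head.isdigit():
                # a.b.1 -> a single item selected by index
                return walk(node[int(head)], rest)
            # a.b.c -> the child c of every item in the array
            return [walk(item, steps) for item in node]
        return walk(node[head], rest)

    return walk(data, path.split("."))
```

With the data from the table, `resolve({"a": ["a1", "a2", "a3"]}, "a.0")` selects the first item, while `resolve({"a": [{"b": "b1"}, {"b": "b2"}]}, "a.b")` collects b from every item.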

Public Interfaces

From the existing list of SMTs, there are the following to be impacted by this change:

New configuration flags

Name	Type	Default	Importance	Documentation
`field.syntax.version`	STRING	v1	HIGH	Permitted values: `v1`, `v2`. Defines the version of the syntax used to access fields. If set to "v1", field paths can only access elements at the root level of the struct or map. If set to "v2", the syntax supports accessing nested elements using dotted notation. If a field name itself contains dots, each of those dots is escaped by preceding it with another dot, e.g. to access the field "element" within a struct/map named "same.field", use the path "same..field.element". This configuration affects all the field paths used by the transform.
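
To make the escaping rule concrete, here is a minimal Python sketch of how a "v2" path could be split into steps. It is an assumption-laden illustration rather than the proposed implementation: a single dot separates steps, two consecutive dots stand for one literal dot, and the function name `split_path` is hypothetical.

```python
def split_path(path):
    """Split a dotted path into steps; '..' is read as an escaped, literal dot."""
    steps, current, i = [], "", 0
    while i < len(path):
        if path[i] == ".":
            if i + 1 < len(path) and path[i + 1] == ".":
                current += "."     # escaped dot stays inside the field name
                i += 2
                continue
            steps.append(current)  # unescaped dot ends the current step
            current = ""
        else:
            current += path[i]
        i += 1
    steps.append(current)
    return steps
```

For the path from the documentation above, `split_path("same..field.element")` yields the two steps `same.field` and `element`.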

These flags will be added conditionally to some SMTs, as described below.

Affected SMTs

Cast

Changes:

Extend spec to support nested notation.
Supports arrays and deep-scan to access multiple fields.
- If a returned path does not hold a type that the spec can convert, it is ignored.

Examples:

scenario

input

smt

output

1. Nested field.

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"    
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.Cast$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.spec": "k1:string,parent.child.k2:int64"
}

Code Block

language	js

{
  "k1": "123",
  "parent": {
    "child": {
      "k2": 123    
    }
  }
}

2. Nested field, when field names include dots

Code Block

language	js

{
  "k1": 123,
  "parent.child": {
    "k2": "123"
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.Cast$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.spec": "k1:string,parent..child.k2:int64"
}

Code Block

language	js

{
  "k1": "123",
  "parent.child": {
    "k2": 123
  }
}

3. Multiple paths found

Code Block
{ "k1": 123, "parent1": { "child": { "k2": "123" } }, "parent2": { "child": { "k2": "123" } } }

Code Block
{ "transforms": "smt1", "transforms.smt1.type": "org.apache.kafka.connect.transforms.Cast$Value", "transforms.smt1.field.syntax.version": "v2", "transforms.smt1.spec": "k1:string,*.child.k2:int64" }

Code Block
{ "k1": "123", "parent1": { "child": { "k2": 123 } }, "parent2": { "child": { "k2": 123 } } }

4. Multiple paths found, but some types do not match and are ignored

Code Block
{ "k1": 123, "parent1": { "child": { "k2": "123" } }, "parent2": { "child": { "k2": {} } } }

Code Block
{ "transforms": "smt1", "transforms.smt1.type": "org.apache.kafka.connect.transforms.Cast$Value", "transforms.smt1.field.syntax.version": "v2", "transforms.smt1.spec": "k1:string,*.child.k2:int64" }

Code Block
{ "k1": "123", "parent1": { "child": { "k2": 123 } }, "parent2": { "child": { "k2": {} } } }
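
The Cast scenarios can be approximated with the following Python sketch. It only illustrates the intended behaviour on plain dicts: the `CASTS` table is a hypothetical subset of the target types, paths are assumed to contain no escaped dots or asterisks, and values that are not primitives are left untouched, as in scenario 4.

```python
CASTS = {"string": str, "int64": int}  # hypothetical subset of the target types


def cast(record, spec):
    """Apply a 'path:type' spec to a nested dict, skipping values that cannot be cast."""
    for entry in spec.split(","):
        path, target = entry.rsplit(":", 1)
        steps = path.split(".")
        parent = record
        for step in steps[:-1]:
            parent = parent[step]
        value = parent[steps[-1]]
        # Only primitive values are converted; structs/maps are ignored.
        if isinstance(value, (str, int, float, bool)):
            parent[steps[-1]] = CASTS[target](value)
    return record
```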

ExtractField

Changes:

Extend field to support nested notation.
Supports arrays and deep-scan to access multiple fields.
- If multiple paths are found, then it creates an array.

Example:

scenario

input

smt

output

1. Nested field.

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"    
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.field": "parent.child.k2"
}

Code Block

language	js

"123"

2. Nested field, when field names include dots

Code Block

language	js

{
  "k1": 123,
  "parent.child": {
    "k2": "123"
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.field": "parent..child.k2"
}

Code Block

language	js

"123"

3. Nested field, an object returned.

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"    
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.field": "parent.child"
}

Code Block

language	js

{ "k2": "123" }

4. Nested field, an array returned.

Code Block

language	js

{
  "k1": 123,
  "parent1": {
    "child": {
      "k2": "123"    
    }
  },
  "parent2": {
    "child": {
      "k2": "234"    
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.field": "*.child.k2"
}

Code Block

language	js

[ "123", "234" ]
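
A possible reading of the ExtractField behaviour, sketched in Python on plain dicts. The assumptions are that an asterisk matches all direct children, that a single match is returned as-is, and that several matches are returned as an array; the function name `extract` is hypothetical.

```python
def extract(record, path):
    """Return the value at `path`, or a list of values when several paths match."""
    matches = [record]
    for step in path.split("."):
        next_matches = []
        for node in matches:
            if not isinstance(node, dict):
                continue                      # primitives have no children
            if step == "*":
                next_matches.extend(node.values())
            elif step in node:
                next_matches.append(node[step])
        matches = next_matches
    if not matches:
        return None
    return matches[0] if len(matches) == 1 else matches
```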

HeaderFrom

Changes:

Extend fields to support nested notation.
As this SMT affects only existing fields, additional configurations will not be required.
Does not support multiple values (e.g. deep scan or array). If multiple paths are found, only the first one is used.

Example:

scenario

input

smt

output

1. Nested field.

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"    
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.HeaderFrom$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.fields": "k1,parent.child.k2",
"transforms.smt1.headers": "k1,k2"
}

Code Block

language	js

headers:
- k1=123
- k2="123"

2. Nested field, when field names include dots

Code Block

language	js

{
  "k1": 123,
  "parent.child": {
    "k2": "123"
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.HeaderFrom$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.fields": "k1,parent..child.k2",
"transforms.smt1.headers": "k1,k2"
}

Code Block

language	js

headers:
- k1=123
- k2="123"

3. Multiple paths found, only the first one is used.

Code Block

language	js

{
  "k1": 123,
  "parent1": {
    "child": {
      "k2": "123"
    }
  },
  "parent2": {
    "child": {
      "k2": "234"
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.HeaderFrom$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.fields": "k1,*.child.k2",
"transforms.smt1.headers": "k1,k2"
}

Code Block

language	js

headers:
- k1=123
- k2="123"

MaskField

Changes:

Extend fields to support nested notation.
Supports arrays and deep-scan to access multiple fields.

Example:

scenario

input

smt

output

1. Nested field.

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.MaskField$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.fields": "parent.child.k2"
}

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": ""
    }
  }
}

2. Nested field, when field names include dots

Code Block

language	js

{
  "k1": 123,
  "parent.child": {
    "k2": "123"
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.MaskField$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.fields": "parent..child.k2"
}

Code Block

language	js

{
  "k1": 123,
  "parent.child": {
    "k2": ""
  }
}

3. Multiple paths found

Code Block

language	js

{
  "k1": 123,
  "parent1": {
    "child": {
      "k2": "123"
    }
  },
  "parent2": {
    "child": {
      "k2": "234"
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.MaskField$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.fields": "*.child.k2"
}

Code Block

language	js

{
  "k1": 123,
  "parent1": {
    "child": {
      "k2": ""
    }
  },
  "parent2": {
    "child": {
      "k2": ""
    }
  }
}
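
Masking across several matching paths could look like the Python sketch below. It is illustrative only: records are plain dicts, an asterisk matches direct children, strings are masked with an empty string and numbers with zero, and the function name `mask` is hypothetical.

```python
def mask(node, steps):
    """Mask, in place, every value in `node` reached by `steps` ('*' = any key)."""
    head, rest = steps[0], steps[1:]
    keys = list(node) if head == "*" else [head]
    for key in keys:
        child = node.get(key)
        if rest:
            if isinstance(child, dict):
                mask(child, rest)      # keep descending along the path
        elif isinstance(child, str):
            node[key] = ""             # assumed mask for strings
        elif isinstance(child, (int, float)):
            node[key] = 0              # assumed mask for numbers
    return node
```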

ReplaceField

Changes:

Extend the include and exclude lists to support nested notation.
Supports arrays and deep-scan to access multiple fields.

Example:

scenario

input

smt

output

1. Nested field. Drop field

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.exclude": "parent.child.k2"
}

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
    }
  }
}

2. Nested field. Drop struct

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.exclude": "parent.child"
}

Code Block

language	js

{
  "k1": 123,
  "parent": {
  }
}

3. Nested field. Include field

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123",
      "k3": "234"
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.include": "parent.child.k2"
}

Code Block

language	js

{
  "parent": {
    "child": {
      "k2": "123"
    }
  }
}

4. Nested field. Include struct

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123",
      "k3": "234"
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.include": "parent.child"
}

Code Block

language	js

{
  "parent": {
    "child": {
      "k2": "123",
      "k3": "234"
    }
  }
}

5. Nested field, when field names include dots

Code Block

language	js

{
  "k1": 123,
  "parent.child": {
    "k2": "123"
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.renames": "parent..child.k2:field2"
}

Code Block

language	js

{
  "k1": 123,
  "parent.child": {
    "field2": "123"
  }
}

6. Multiple fields

Code Block

language	js

{
  "k1": 123,
  "parent1": {
    "child": {
      "k2": "123"
    }
  },
  "parent2": {
    "child": {
      "k2": "234"
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.renames": "*.child.k2:field2"
}

Code Block

language	js

{
  "k1": 123,
  "parent1": {
    "child": {
      "field2": "123"
    }
  },
  "parent2": {
    "child": {
      "field2": "234"
    }
  }
}
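
The exclude and renames cases for nested paths can be sketched in Python as follows. This is a simplified illustration on plain dicts, assuming paths without escaped dots or asterisks; the helper names `_parent_of`, `exclude` and `rename` are hypothetical.

```python
def _parent_of(record, steps):
    """Walk to the struct/map that holds the last step of the path."""
    node = record
    for step in steps[:-1]:
        node = node[step]
    return node


def exclude(record, path):
    """Drop the field (or whole struct) found at `path`."""
    steps = path.split(".")
    _parent_of(record, steps).pop(steps[-1], None)
    return record


def rename(record, mapping):
    """Rename a nested field given a 'path:newName' mapping."""
    path, new_name = mapping.split(":")
    steps = path.split(".")
    parent = _parent_of(record, steps)
    parent[new_name] = parent.pop(steps[-1])
    return record
```

Note that the renamed field stays inside its original parent; only the last step of the path changes.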

TimestampConverter

Changes:

Extend fields to support nested notation.
Supports arrays and deep-scan to access multiple fields.

Example:

scenario	input	smt	output
1. Nested field.

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": 1556204536000
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.field": "parent.child.k2",
"transforms.smt1.format": "yyyy-MM-dd",
"transforms.smt1.target.type": "string"
}

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "2019-04-25"
    }
  }
}

2. Nested field, when field names include dots

Code Block

language	js

{
  "k1": 123,
  "parent.child": {
    "k2": 1556204536000
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.field": "parent..child.k2",
"transforms.smt1.format": "yyyy-MM-dd",
"transforms.smt1.target.type": "string"
}

Code Block

language	js

{
  "k1": 123,
  "parent.child": {
    "k2": "2019-04-25"
  }
}
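
As a quick sanity check of the conversion in these scenarios, the Python sketch below formats an epoch-milliseconds value as a date string. It assumes the value is interpreted in UTC and that the pattern "yyyy-MM-dd" corresponds to `%Y-%m-%d`; the function name `millis_to_date` is hypothetical.

```python
from datetime import datetime, timezone


def millis_to_date(millis, fmt="%Y-%m-%d"):
    """Format an epoch-milliseconds timestamp as a date string in UTC."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).strftime(fmt)
```

For the example value, `millis_to_date(1556204536000)` gives "2019-04-25".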

ValueToKey

Changes:

Extend fields to support nested notation.
Supports arrays and deep-scan to access multiple fields.
- If multiple paths are found, then it creates an array.

Example:

scenario

input

smt

output

1. Nested field.

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.fields": "parent.child.k2"
}

Code Block

language	js

"123"

2. Nested struct to Key.

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.fields": "parent.child"
}

Code Block

language	js

{ "k2": "123" }

3. Nested field, when field names include dots

Code Block

language	js

{
  "k1": 123,
  "parent.child": {
    "k2": "123"
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.fields": "parent..child.k2"
}

Code Block

language	js

"123"

4. Multiple values to key

Code Block

language	js

{
  "k1": 123,
  "parent1": {
    "child": {
      "k2": "123"
    }
  },
  "parent2": {
    "child": {
      "k2": "234"
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.fields": "*.child.k2"
}

Code Block

language	js

[ "123", "234" ]

InsertField

Changes:

Extend *.field to support nested notation.
Does not support multiple values (e.g. deep scan or array)

...

Name	Type	Default	Importance	Documentation
`field.on.missing.parent`	STRING	create	`MEDIUM`	Permitted values: `create`, `ignore`. Defines how to react when the parent of the field to act on does not exist and "field.syntax.version" is "v2". If set to "create", the SMT creates the parent struct/map when it does not exist. If set to "ignore", the SMT has no effect.
`field.on.existing.field`	STRING	overwrite	`MEDIUM`	Permitted values: `overwrite`, `ignore`. Defines how to react when the field to act on already exists. If set to "overwrite", the SMT is applied to the existing field. If set to "ignore", the SMT has no effect.

Example:

scenario

input

smt

output

1. Nested field.

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"    
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.InsertField$Value", "transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.static.field": "parent.child.k3",
"transforms.smt1.static.value": "v3" 
}

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123",
      "k3": "v3"   
    }
  }
}

2. Nested field, when field names include dots

Code Block

language	js

{
  "k1": 123,
  "parent.child": {
    "k2": "123"
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.InsertField$Value",  "transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.static.field": "parent..child.k3",
"transforms.smt1.static.value": "v3" 
}

Code Block

language	js

{
  "k1": 123,
  "parent.child": {
    "k2": "123",
    "k3": "v3"
  }
}

3. Nested field with the parent missing

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"    
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.InsertField$Value", "transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.static.field": "parent.other.k3",
"transforms.smt1.static.value": "v3" 
}

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"  
    },
    "other": {
      "k3": "v3"  
    }
  }
}

4. Nested field with the parent missing, and ignore is set

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"    
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.InsertField$Value", "transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.static.field": "parent.other.k3",
"transforms.smt1.static.value": "v3",
"transforms.smt1.field.on.missing.parent": "ignore"
}

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"  
    }
  }
}

5. Nested field that already exists

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"    
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.InsertField$Value",  "transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.static.field": "parent.child.k2",
"transforms.smt1.static.value": "456"
}

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "456"  
    }
  }
}

6. Nested field that already exists, and ignore is set

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"    
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.InsertField$Value",  "transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.static.field": "parent.child.k2",
"transforms.smt1.static.value": "456",
"transforms.smt1.field.on.existing.field": "ignore"
}

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"  
    }
  }
}
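
The interplay of the two new flags can be sketched in Python. This is a minimal illustration on plain dicts, assuming paths without escaped dots; the function name `insert_field` and its keyword arguments are hypothetical stand-ins for `field.on.missing.parent` and `field.on.existing.field`.

```python
def insert_field(record, path, value,
                 on_missing_parent="create", on_existing_field="overwrite"):
    """Insert `value` at a nested path, honouring the two proposed flags."""
    steps = path.split(".")
    parent = record
    for step in steps[:-1]:
        if step not in parent:
            if on_missing_parent == "ignore":
                return record          # parent is missing: leave the record as-is
            parent[step] = {}          # "create": add the missing parent
        parent = parent[step]
    if steps[-1] in parent and on_existing_field == "ignore":
        return record                  # field exists: leave the record as-is
    parent[steps[-1]] = value
    return record
```

With the defaults, a missing parent is created and an existing field is overwritten, which matches the default values listed in the table above.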

HoistField

Changes:

Add a hoisted config to point to a specific path to hoist.
Does not support multiple values (e.g. deep scan or array)

...

Name	Type	Default	Importance	Documentation
`hoisted`	`STRING`	<empty>	`MEDIUM`	Path to the element to be hoisted. If empty, the root struct/map is hoisted.

Examples:

scenario

input

smt

output

1. Nested field.

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "k2": "123"    
    }
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.HoistField$Value", "transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.hoisted": "parent.child.k2",
"transforms.smt1.field": "other"
}

Code Block

language	js

{
  "k1": 123,
  "parent": {
    "child": {
      "other": {
        "k2": "123"
      }    
    }
  }
}

2. Nested struct, when field names include dots

Code Block

language	js

{
  "k1": 123,
  "parent.child": {
    "k2": "123"
  }
}

Code Block

language	js

{
"transforms": "smt1",
"transforms.smt1.type": "org.apache.kafka.connect.transforms.HoistField$Value", "transforms.smt1.field.syntax.version": "v2",
"transforms.smt1.hoisted": "parent..child",
"transforms.smt1.field": "other"
}

Code Block

language	js

{
  "k1": 123,
  "other": {
    "parent.child": {
      "k2": "123"
    }
  }
}
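
The two HoistField scenarios can be reproduced with the Python sketch below. It is illustrative only: the path is passed as a list of already-split steps (so a name such as "parent.child" is a single step), records are plain dicts, and the function name `hoist` is hypothetical.

```python
def hoist(record, hoisted_steps, field):
    """Wrap the element found at `hoisted_steps` in a new struct named `field`."""
    parent = record
    for step in hoisted_steps[:-1]:
        parent = parent[step]
    name = hoisted_steps[-1]
    # The hoisted element moves under the new field, in the same parent.
    parent[field] = {name: parent.pop(name)}
    return record
```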

Non-affected SMTs

These SMTs do not require nested structure support:

...

Using double dots to escape separators is an alternative that sticks to dots as the only field separator.

Comparing:

With double dots

Code Block
{ "transforms": "cast", "transforms.cast.field.syntax.version": "v2", "transforms.cast.type": "...", "transforms.cast.spec": "address..personal.country:string" }

With separator

Code Block
{ "transforms": "cast", "transforms.cast.field.syntax.version": "v2", "transforms.cast.field.separator": "/", "transforms.cast.type": "...", "transforms.cast.spec": "address.personal/country:string" }

Even though custom separators make the configuration more explicit, there is always the possibility that the chosen separator already appears in a field name, leading to issues and requests for changes.

To avoid this, this KIP proposes escaping a dot by preceding it with another dot.


Potential KIPs

Future KIPs could extend this support for:

...

JIRA: KAFKA-10640

...


Use JSONPath notation to access nested elements

JSONPath [1] was proposed as an alternative to the nested notation. A draft version of the KIP with examples using that notation is outlined here: [DRAFT] KIP-821: Connect Transforms support for nested structures (JsonPath-based draft)

The following limitations were found:

The JSONPath spec is too extensive for the use cases included in this KIP.
A subset of JSONPath was proposed, but the custom spec ends up being more complex than the notation proposed here.
- A subset implies not using existing dependencies. Adding an existing dependency, on the other hand, would reduce the chance of the KIP being accepted, as the risk of external vulnerabilities increases.
- A subset requires users to learn JSONPath first, and then what is and is not covered by the custom implementation.

Given these drawbacks, the KIP prefers the dotted notation.

[1] https://github.com/json-path/JsonPath

Use named styles instead of syntax versions

A configuration that names the style used to target fields was considered:

field.style with valid values: "plain", "nested".

Even though this configuration is self-describing, it limits the semantics of the values.

Instead, the KIP proposes a versioned configuration, which avoids affecting current behavior and is easier to extend by including compatible changes in the same version.
