...
If the result of any Transformation.apply() in a chain is null, that record is discarded (not written to Kafka in the case of a source connector, or not provided to the sink connector).
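As a rough illustration of the null-discard rule, the sketch below models a transformation chain with dependency-free stand-in types. `Rec`, `Transformation`, and `applyChain` are hypothetical names invented for this example, not the actual Connect interfaces, and stopping at the first null result is an assumption about how a runtime might implement the rule.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, dependency-free model of a transformation chain (not the Connect API).
final class TransformChainSketch {

    // Minimal stand-in for a record: only a topic and a value.
    static final class Rec {
        final String topic;
        final Object value;

        Rec(String topic, Object value) {
            this.topic = topic;
            this.value = value;
        }
    }

    // Stand-in for Transformation.apply(): returns a record, or null to drop it.
    interface Transformation {
        Rec apply(Rec record);
    }

    // Applies each transformation in order and returns null as soon as one returns null.
    static Rec applyChain(List<Transformation> chain, Rec record) {
        Rec current = record;
        for (Transformation t : chain) {
            current = t.apply(current);
            if (current == null) {
                // Discarded: a source would not write it to Kafka,
                // and a sink connector would never see it.
                return null;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        Transformation dropEmpty = r -> (r.value == null) ? null : r;
        Transformation prefixTopic = r -> new Rec("prefix-" + r.topic, r.value);
        List<Transformation> chain = Arrays.asList(dropEmpty, prefixTopic);

        Rec kept = applyChain(chain, new Rec("orders", "payload"));
        System.out.println(kept.topic); // prints "prefix-orders"

        Rec dropped = applyChain(chain, new Rec("orders", null));
        System.out.println(dropped == null); // prints "true"
    }
}
```

In this sketch a caller only needs a single null check on the chain's result to decide whether the record continues downstream.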
Bundled transformations
Criteria: SMTs that are shipped with Kafka Connect should be general enough to apply to many data sources and serialization formats. They should also be simple enough that they do not introduce any additional library dependencies.
Beyond those initially included with this KIP, transformations can be adopted for inclusion in the future, with JIRA/ML discussion to weigh the tradeoffs.
WIP. Subject to discussion – not a final list.
Name | Functionality | Rationale | Configuration |
---|---|---|---|
Mask{Key,Value} | Mask or replace the specified primitive fields, assuming there is a top-level Struct. | Obscure sensitive info like credit card numbers. | |
InsertIn{Key,Value} | Insert specified fields with given name, assuming there is a top-level Struct. | Widely applicable to insert certain record metadata. | |
TimestampRouter | Timestamp-based routing. | Useful for temporal data, e.g. application log data being indexed to a search system with a sink connector can be routed to a daily index. | |
RegexRouter | Regex-based routing. | There are too many inconsistent configs to route in different connectors. | See http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#replaceFirst(java.lang.String) |
ValueToKey | Create or replace record key with data from record value. | Useful when a source connector does not populate the record key but only the value with a Struct. | |
Flatten | Flatten nested Structs. | Useful for sink connectors that can only deal with flat Structs. | |
Replace | Filter and rename fields. | Useful for lightweight data munging. | |
NumericCasts | Cast a numeric field to a specified numeric type. | Useful in conjunction with source connectors that don't have enough information and use an unnecessarily wide data type. | |
TimestampConverter | Convert the datatype of a timestamp field. | Timestamps are represented in many different ways; this provides a transformation for converting between strings, epoch times as longs, and Connect date/time types. | |
Hoist{Key,Value}ToStruct | Wrap data in a Struct. | Useful when a transformation or sink connector expects Struct but the data is a primitive type. | |
Extract{Key,Value}FromStruct | Extract a specific field from a Struct. | The inverse of Hoist{Key,Value}ToStruct. | |
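
To make the two routing rows in the table more concrete, here is a minimal sketch of how regex-based and timestamp-based topic rewriting could work. The class and method names (`RoutingSketch`, `regexRoute`, `dailyTopic`) are hypothetical, as are the whole-topic match requirement and the `topic-yyyyMMdd` naming in UTC; none of these are the configuration proposed in this KIP. The only detail taken from the table is the use of `Matcher.replaceFirst`.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of topic routing; not the bundled transformations themselves.
final class RoutingSketch {

    // Assumed daily suffix format, evaluated in UTC.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Regex-based routing: if the whole topic matches, rewrite it with
    // Matcher.replaceFirst; otherwise leave the topic unchanged (an assumption).
    static String regexRoute(Pattern regex, String replacement, String topic) {
        Matcher m = regex.matcher(topic);
        return m.matches() ? m.replaceFirst(replacement) : topic;
    }

    // Timestamp-based routing: append the record's day so that a sink
    // can write each day's data to its own index.
    static String dailyTopic(String topic, long timestampMillis) {
        return topic + "-" + DAY.format(Instant.ofEpochMilli(timestampMillis));
    }

    public static void main(String[] args) {
        Pattern rawSuffix = Pattern.compile("(.*)-raw");
        System.out.println(regexRoute(rawSuffix, "$1-clean", "orders-raw")); // prints "orders-clean"
        System.out.println(dailyTopic("app-logs", 0L)); // prints "app-logs-19700101"
    }
}
```

Both helpers only compute a new topic name; in a real transformation the result would be used to build a copy of the record with that topic.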
...
However, the surface area for such a change is much larger - we would need additional REST APIs for creating, updating and validating transformation chain configs. The current proposal does not prevent taking this direction down the line.
Not including any transformations with Connect
In the interest of providing a better out-of-the-box experience and avoiding duplication of effort in the ecosystem, we will be bundling certain transformations with Connect.
One concern here is that we should have well-defined criteria for what belongs in Connect versus external dependencies; this has been addressed by the criteria stated under Bundled transformations.