Status
Current state: WIP Under Discussion
Discussion thread: here
JIRA: KAFKA-3209
...
Name | Functionality | Rationale | Configuration |
---|---|---|---|
Mask{Key,Value} | Mask or replace the specified primitive fields, assuming there is a top-level Struct. | Obscure sensitive info like credit card numbers. | |
InsertIn{Key,Value} | Insert specified fields with given name, assuming there is a top-level Struct. | Widely applicable to insert certain record metadata. | |
TimestampRouter | Timestamp-based routing. | Useful for temporal data, e.g. application log data being indexed to a search system by a sink connector can be routed to a daily index. | |
RegexRouter | Regex-based routing. | Different connectors currently expose too many inconsistent routing configs. | See http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#replaceFirst(java.lang.String) |
ValueToKey | Create or replace record key with data from record value. | Useful when a source connector does not populate the record key but only the value with a Struct. | |
Flatten | Flatten nested Structs. | Useful for sink connectors that can only deal with flat Structs. | TODO: specify escaping |
Replace | Filter and rename fields. | Useful for lightweight data munging. | |
NumericCasts | Cast a numeric field to a specified numeric type. | Useful in conjunction with source connectors that lack enough type information and use an unnecessarily wide data type. | |
TimestampConverter | Convert datatype of a timestamp field. | Timestamps are represented in many different ways; this provides a transformation for converting between strings, epoch times as longs, and Connect date/time types. | |
Hoist{Key,Value}ToStruct | Wrap data in a Struct. | | |
Extract{Key,Value}FromStruct | Extract a specific field from a Struct. | | |
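
As an illustration of the two routing transformations in the table, the following is a minimal sketch using only the JDK. The class TopicRouting, its method names, and the date pattern parameter are hypothetical and not part of this proposal's API. Regex routing is assumed to delegate to Matcher.replaceFirst as referenced in the table, and timestamp routing is assumed to append a UTC date suffix to the topic name.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating the two routing ideas; not a Kafka Connect class. */
final class TopicRouting {
    private TopicRouting() {}

    /** Regex-based routing: rewrite the first match of regex in the topic name. */
    static String regexRoute(String topic, String regex, String replacement) {
        // Matcher.replaceFirst returns the input unchanged when nothing matches.
        return Pattern.compile(regex).matcher(topic).replaceFirst(replacement);
    }

    /** Timestamp-based routing (assumed form): append a UTC date suffix to the topic. */
    static String timestampRoute(String topic, long timestampMillis, String datePattern) {
        DateTimeFormatter formatter =
                DateTimeFormatter.ofPattern(datePattern).withZone(ZoneOffset.UTC);
        return topic + "-" + formatter.format(Instant.ofEpochMilli(timestampMillis));
    }
}
```

Under these assumptions, regexRoute("app-logs", "^app-(.*)$", "prod-$1") yields prod-logs, and a daily pattern such as yyyyMMdd routes each day's records to its own topic or index.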
...
Data transformations could be applicable to the key or the value of the record. We will have *Key and *Value variants for these transformations that reuse the common functionality from a shared base class.

Some common utilities for data transformations will shape up:
- Caching of the changes they make to Schema objects, possibly only preserving the last-seen one, as the likelihood of the source data Schema changing is low.
- Copying of Schema objects with the possible exclusion of some fields, which they are modifying. Likewise, copying of a Struct object to another Struct having a different Schema, with the exception of some fields, which they are modifying.
- Where fields are being added and a field name is specified in configuration, we will want a consistent way to convey if it should be created as an optional field. We can use a leading '?' character. TODO: specify escaping
- ConfigDef does not provide a Type.MAP, but for the time being we can piggyback on top of Type.LIST and represent maps as a list of key-value pairs separated by ':'. TODO: specify escaping.
- Where field names are expected, in some cases we should allow for getting at nested fields by allowing a dotted syntax which is common in such usage (and accordingly, will need some utilities around accessing a field that may be nested). TODO: specify escaping
- There are escaping considerations to several such configs, so we will need utilities that assume a consistent escaping style (e.g. backslashes).
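
The configuration conventions listed above can be sketched with plain JDK code. This is only an illustration under stated assumptions: the class TransformConfigs and all of its method names are hypothetical, backslash escaping is assumed because the escaping rules are still marked TODO, and escaping a literal leading '?' is deliberately not attempted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical config helpers; names and escaping rules are assumptions, not a Connect API. */
final class TransformConfigs {
    private TransformConfigs() {}

    /** Splits on unescaped separators; a backslash makes the next character literal. */
    static List<String> splitUnescaped(String s, char separator) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (char c : s.toCharArray()) {
            if (escaped) {
                current.append(c);          // escaped character is taken literally
                escaped = false;
            } else if (c == '\\') {
                escaped = true;             // drop the backslash, remember the escape
            } else if (c == separator) {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (escaped) {
            current.append('\\');           // a trailing lone backslash is kept as-is
        }
        parts.add(current.toString());
        return parts;
    }

    /** Interprets list entries of the form key:value (as from Type.LIST) as an ordered map. */
    static Map<String, String> parseMap(List<String> entries) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : entries) {
            List<String> kv = splitUnescaped(entry, ':');
            if (kv.size() != 2) {
                throw new IllegalArgumentException("Expected key:value but got: " + entry);
            }
            result.put(kv.get(0), kv.get(1));
        }
        return result;
    }

    /** Splits a dotted field name into path segments for nested field access. */
    static List<String> fieldPath(String dotted) {
        return splitUnescaped(dotted, '.');
    }

    /** A leading '?' marks a field that should be created as optional. */
    static boolean isOptional(String fieldSpec) {
        return fieldSpec.startsWith("?");
    }

    /** Returns the field name with the optional marker, if any, removed. */
    static String fieldName(String fieldSpec) {
        return isOptional(fieldSpec) ? fieldSpec.substring(1) : fieldSpec;
    }
}
```

Each helper resolves escapes for a single level only, so a value that must pass through more than one of these conventions would need a more careful escaping specification than this sketch provides.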
Compatibility, Deprecation, and Migration Plan
...