Status
Current state: Under Discussion
...
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
The TimestampConverter transform converts only one field per use of the transform (selected by the field configuration parameter). In a real environment, however, an event often carries multiple timestamps (such as Created On, Last Updated On, Approved On, etc.), and if one of them needs to be converted using TimestampConverter, then probably more than one (if not all of them) needs to be converted. For large messages that may already pass through several other transforms, performance drops noticeably when more than a few TimestampConverter transforms are chained just to cover all of the different fields.
At the same time, when parsing strings into timestamps, it is not always possible in "real" environments to strictly control timestamp formats if multiple different services produce messages to the same topic. For example, some may specify a time zone and some may not, some may include milliseconds and some may not, etc. All of these variations can be "valid" under the ISO 8601 standard, yet any event whose timestamp differs even slightly from the exact specified format pattern will cause TimestampConverter to fail. It would therefore be better to allow an input pattern that accepts different variations when parsing a string into a proper Date/Time type.
Public Interfaces
From the perspective of using this transform in Connect, the following things will be changed:
- Change the configuration parameter field to be called fields, since it will now support multiple comma-separated field names (the old name can remain supported for backward compatibility for some time).
- Add new configuration parameters format.input, to allow for a pattern format which supports multiple variations when parsing a string, and format.output, to specify the exact string format to output when converting from a Date/Time to a string.
- The configuration parameter format could be deprecated (but remain available for backward compatibility), or could also be used to specify both format.input and format.output at the same time (assuming you just have a single string input format).
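The fallback between the old and new parameter names described in the list above could be resolved roughly as follows. This is only a sketch under stated assumptions: the class TimestampConverterConfigSketch and its methods are hypothetical and not part of Kafka Connect, and the precedence order (new key first, legacy key second) is one possible choice rather than a decided behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Kafka Connect. It illustrates one possible
// fallback order between the legacy keys and the proposed ones.
class TimestampConverterConfigSketch {

    // Prefer the proposed "fields" key; fall back to the legacy single "field" key.
    static List<String> resolveFields(Map<String, String> props) {
        String raw = props.containsKey("fields")
                ? props.get("fields")
                : props.getOrDefault("field", "");
        // Split the comma-separated value, trimming whitespace and dropping empty entries.
        return Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // Prefer the specific key ("format.input" or "format.output");
    // fall back to the legacy "format" key when the specific key is absent.
    static String resolveFormat(Map<String, String> props, String specificKey) {
        String specific = props.get(specificKey);
        return specific != null ? specific : props.get("format");
    }
}
```

With this ordering, an existing configuration that sets only field and format keeps working unchanged, while a configuration that sets format.input alongside format uses format only for output.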
Proposed Changes
Supporting Multiple Fields
To support multiple fields, we can create a new configuration parameter called fields, which is of type ConfigDef.Type.LIST.
...
```java
// Convert every configured field in a single pass over the record's entries.
for (Map.Entry<String, Object> field : value.entrySet()) {
    if (config.fields.contains(field.getKey()))
        updatedValue.put(field.getKey(), convertTimestamp(field.getValue()));
}
```
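To make the single-pass idea concrete, here is a self-contained sketch for a schemaless (Map-based) record value. The class MultiFieldSketch is hypothetical, and its convertTimestamp is only a stand-in that turns epoch milliseconds into a Date; the real transform's conversion logic depends on target.type and is not reproduced here.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not the transform's actual implementation.
class MultiFieldSketch {

    // Stand-in conversion (an assumption): epoch millis become a Date,
    // anything else is passed through unchanged.
    static Object convertTimestamp(Object value) {
        return value instanceof Long ? new Date((Long) value) : value;
    }

    // Copy the record value and convert every configured field in one pass,
    // instead of chaining one transform per field.
    static Map<String, Object> apply(Map<String, Object> value, Set<String> fields) {
        Map<String, Object> updatedValue = new LinkedHashMap<>(value);
        for (Map.Entry<String, Object> field : value.entrySet()) {
            if (fields.contains(field.getKey())) {
                updatedValue.put(field.getKey(), convertTimestamp(field.getValue()));
            }
        }
        return updatedValue;
    }
}
```

Fields that are not listed are copied through untouched, so a record is traversed once regardless of how many timestamp fields are configured.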
Supporting Multiple String Input Formats
When a Date/Time field is output as a string, the format must be exact. We therefore need to split the format configuration parameter into two: one parameter giving the exact format for output to strings, and one parameter giving the input format for strings to be parsed into the target.type, which can be a pattern that supports different variations of string-based dates or timestamps.
...
```java
// Try the most specific result first: a zoned date-time, then a plain date.
TemporalAccessor temporalAccessor = config.inputFormat.parseBest((String) orig, ZonedDateTime::from, LocalDate::from);
if (temporalAccessor instanceof ZonedDateTime)
    return Date.from(((ZonedDateTime) temporalAccessor).toInstant());
else if (temporalAccessor instanceof LocalDate)
    // A date without a time is interpreted as midnight UTC.
    return Date.from(((LocalDate) temporalAccessor).atStartOfDay(ZoneOffset.UTC).toInstant());
```
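As an illustration of how a single input pattern can accept several variations, the sketch below uses the optional-section syntax (square brackets) of java.time.format.DateTimeFormatter together with parseBest. The class name FlexibleTimestampParser and the pattern itself are assumptions chosen for the example, not values defined by this proposal. The sketch also adds a LocalDateTime branch, which is not in the snippet above, so that a timestamp without a zone keeps its time of day (interpreted as UTC, again an assumption).

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.util.Date;

// Hypothetical illustration, not the transform's actual implementation.
class FlexibleTimestampParser {

    // Assumed example pattern: the time, the milliseconds, and the offset
    // are each optional, so several input variations match one pattern.
    static final DateTimeFormatter INPUT_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd['T'HH:mm:ss[.SSS][XXX]]");

    static Date parse(String text) {
        // parseBest tries each query in order and returns the first type that fits.
        TemporalAccessor parsed = INPUT_FORMAT.parseBest(
                text, ZonedDateTime::from, LocalDateTime::from, LocalDate::from);
        if (parsed instanceof ZonedDateTime) {
            return Date.from(((ZonedDateTime) parsed).toInstant());
        } else if (parsed instanceof LocalDateTime) {
            // Assumption: a timestamp without an offset is treated as UTC.
            return Date.from(((LocalDateTime) parsed).toInstant(ZoneOffset.UTC));
        } else if (parsed instanceof LocalDate) {
            // A date without a time is interpreted as midnight UTC.
            return Date.from(((LocalDate) parsed).atStartOfDay(ZoneOffset.UTC).toInstant());
        }
        throw new IllegalStateException("Unexpected parse result: " + parsed);
    }
}
```

With this pattern, inputs such as 2019-03-01, 2019-03-01T10:15:30, 2019-03-01T10:15:30Z, and 2019-03-01T10:15:30.123+01:00 are all accepted, while a string that matches none of the variations still fails with a DateTimeParseException.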
Compatibility, Deprecation, and Migration Plan
- What impact (if any) will there be on existing users?
...
No migration tool should be necessary; users will just need to update their config files or send a PUT request to the Connect REST API to update the configuration of connectors that use the TimestampConverter transform.
- When will we remove the existing behavior?
We assume the standard deprecation period applies: "2 versions later".
Rejected Alternatives
One initial thought was to change the entire transform from using java.util.Date to using java.time classes. However, after a bit of investigation I found that Kafka and Connect have a large number of dependencies on dates and times being a java.util.Date, so the easiest thing to do would be to focus on the core problem: parsing strings into a Date in a smarter way with the help of something like DateTimeFormatter, and then continuing to return a Date for use by the rest of Connect.
...