...

Status

Current state: Under Discussion / Vote

Discussion thread: here

...

Currently there is a MaskField SMT, but it removes the value completely by replacing it with an equivalent null value. This creates an ambiguity: a password that goes through the mask transform becomes "", which could mean either that no password was present in the message or that it was masked. A hash transform removes this ambiguity, because an empty value and a non-empty value produce different digests. The proposed hash functions are MD5, SHA1, and SHA256, all of which are supported via MessageDigest.
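To illustrate the ambiguity argument, here is a minimal sketch that uses only java.security.MessageDigest. The class and method names are hypothetical and are not part of the proposed API. It shows that an empty value still hashes to a distinct, well-known digest, so a consumer can tell "empty" apart from "non-empty" without ever seeing the original value.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MaskVsHashDemo {
    // Lowercase hex encoding, two characters per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Hash a UTF-8 string with a JCA algorithm name such as "MD5", "SHA-1" or "SHA-256".
    static String hash(String algorithm, String value) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            return toHex(md.digest(value.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        // MaskField would turn a real password into "", making it indistinguishable from an empty one.
        String emptyPassword = "";
        String realPassword = "hunter2";
        // An empty input always yields the well-known empty-string digest.
        System.out.println(hash("SHA-256", emptyPassword)); // prints e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
        // A non-empty password yields a different digest.
        System.out.println(hash("SHA-256", realPassword));
    }
}
```

MessageDigest uses the algorithm names "MD5", "SHA-1" and "SHA-256"; every Java platform implementation is required to support all three.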

Public Interfaces

The proposal adds one new class, connect/transforms/src/main/java/org/apache/kafka/connect/transforms/Hash.java, and a helper class, connect/transforms/src/main/java/org/apache/kafka/connect/transforms/util/Hex.java. No modifications to existing interfaces are required.
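A helper like Hex.java is needed because MessageDigest returns raw bytes, and before Java 17 (which added java.util.HexFormat) the JDK had no public hex encoder. The following is a minimal sketch of what such a helper might contain; the encode method name and signature are assumptions, not the proposed API.

```java
// Hypothetical sketch of the proposed util/Hex.java helper; the real API may differ.
public final class Hex {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    private Hex() {}

    // Encode bytes as lowercase hex, two characters per byte.
    public static String encode(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xff;          // treat the byte as unsigned
            out[2 * i] = DIGITS[v >>> 4];     // high nibble
            out[2 * i + 1] = DIGITS[v & 0x0f]; // low nibble
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(encode(new byte[] {0x00, 0x0f, (byte) 0xab, (byte) 0xff})); // prints 000fabff
    }
}
```

A lookup table avoids the per-byte String.format call, which matters when every record passing through the transform is hashed.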

...

transforms=HashEmail
transforms.HashEmail.type=org.apache.kafka.connect.transforms.Hash$Value
transforms.HashEmail.field.name=email
transforms.HashEmail.function=sha1
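The configuration above could be applied roughly as follows. This is a minimal sketch that assumes a schemaless record value (a Map<String, Object>) and a hypothetical mapping from the lowercase function values (md5, sha1, sha256) to MessageDigest algorithm names. It deliberately does not use the Kafka Connect Transformation API, and the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class HashFieldSketch {
    // Hypothetical mapping from the proposed "function" config values to JCA MessageDigest names.
    static String algorithmFor(String function) {
        switch (function.toLowerCase(Locale.ROOT)) {
            case "md5": return "MD5";
            case "sha1": return "SHA-1";
            case "sha256": return "SHA-256";
            default: throw new IllegalArgumentException("Unsupported hash function: " + function);
        }
    }

    // Digest a UTF-8 string and return it as lowercase hex.
    static String hexDigest(String algorithm, String input) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) sb.append(String.format("%02x", b & 0xff));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Return a copy of a schemaless value with one top-level field replaced by its hex digest.
    static Map<String, Object> hashField(Map<String, Object> value, String fieldName, String function) {
        Map<String, Object> updated = new HashMap<>(value);
        Object original = updated.get(fieldName);
        if (original != null) { // assumption: null fields are left untouched
            updated.put(fieldName, hexDigest(algorithmFor(function), original.toString()));
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Object> value = new HashMap<>();
        value.put("email", "");
        value.put("name", "Ada");
        // Mirrors transforms.HashEmail.field.name=email and transforms.HashEmail.function=sha1
        System.out.println(hashField(value, "email", "sha1").get("email")); // prints da39a3ee5e6b4b0d3255bfef95601890afd80709
    }
}
```

Leaving a null field as null is a design assumption here: it keeps "field absent" distinguishable from "field hashed", which is the point of the transform.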

Based on feedback from Gunnar Morling (https://debezium.io/documentation/reference/connectors/mysql#mysql-property-column-mask-hash), I think this should also support:
1) an optional salt, set via transforms.HashEmail.salt
2) a comma-separated list of fields, where a period denotes a nested field
Given these suggestions, the configuration would look like this:

transforms=HashFields
transforms.HashFields.type=org.apache.kafka.connect.transforms.Hash$Value
transforms.HashFields.field.name=user.email,user.ssn,contact
transforms.HashFields.function=sha1
transforms.HashFields.salt=F4xJK03Ab
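A sketch of how the comma-separated, period-delimited field list and the salt might be handled, again on schemaless nested Maps. Assumptions: the salt bytes are fed into the digest before the value bytes (equivalent to hashing salt + value), paths that do not resolve are skipped, and the record is modified in place, whereas a real SMT would build a new value. All class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class SaltedNestedHashSketch {
    // Digest salt + value with a JCA algorithm name and return lowercase hex.
    static String saltedDigest(String algorithm, String salt, String input) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            // Assumption: the salt is fed to the digest before the value bytes.
            md.update(salt.getBytes(StandardCharsets.UTF_8));
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) sb.append(String.format("%02x", b & 0xff));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hash every path in a comma-separated list such as "user.email,user.ssn,contact", in place.
    @SuppressWarnings("unchecked")
    static void hashPaths(Map<String, Object> root, String fieldList, String algorithm, String salt) {
        for (String path : fieldList.split(",")) {
            String[] parts = path.trim().split("\\.");
            Map<String, Object> current = root;
            // Walk down to the map that holds the last path segment.
            for (int i = 0; i < parts.length - 1 && current != null; i++) {
                Object next = current.get(parts[i]);
                current = (next instanceof Map) ? (Map<String, Object>) next : null;
            }
            if (current == null) continue; // an intermediate segment is missing or not a struct
            String leaf = parts[parts.length - 1];
            Object original = current.get(leaf);
            if (original != null) {
                current.put(leaf, saltedDigest(algorithm, salt, original.toString()));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> user = new HashMap<>();
        user.put("email", "ada@example.com");
        user.put("ssn", "123-45-6789");
        Map<String, Object> value = new HashMap<>();
        value.put("user", user);
        value.put("contact", "555-0100");
        // Mirrors transforms.HashFields.field.name, .function=sha1 and .salt=F4xJK03Ab
        hashPaths(value, "user.email,user.ssn,contact", "SHA-1", "F4xJK03Ab");
        System.out.println(value);
    }
}
```

Because the salt is simply prepended here, saltedDigest("SHA-1", "F4xJK03Ab", "x") equals the unsalted SHA-1 of "F4xJK03Abx"; whether the final implementation prepends, appends, or uses an HMAC is still open.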

Compatibility, Deprecation, and Migration Plan

...