...

The 'distinct' operation is common in data processing, e.g.:

  • Java's java.util.stream.Stream has a distinct() method,
  • SQL has DISTINCT keyword.
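For reference, the Java case can be sketched as follows. The class and method names (StreamDistinctExample, distinctOf) are made up for this illustration; only Stream.distinct() itself comes from the JDK.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative helper class; the name is an assumption, not part of any proposal.
class StreamDistinctExample {
    // Removes duplicates (by equals/hashCode). For an ordered stream such as
    // one created from a List, the first occurrence of each element is kept.
    static List<String> distinctOf(List<String> input) {
        return input.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctOf(List.of("a", "b", "a", "c", "b"))); // prints [a, b, c]
    }
}
```

Note that Stream.distinct() works on a finite, in-memory stream; an unbounded stream of records needs some bound on how long ids are remembered.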

...

  • distinct DSLOperation on a KStream<K, V> DSLObject which returns another KStream<K, V> DSLObject,

  • DistinctParameters DSLParameter.

Using DistinctParameters, the user provides the following:

  1. KeyValueMapper<K, V, VR> idExtractor: extracts a unique identifier from a record, by which input records are de-duplicated. If it returns null, the record is not considered for de-duplication and is forwarded as-is. If not provided, it defaults to (key, value) -> KeyValue.pair(key, value), which means de-duplication is based on both the key and the value of the record.
  2. TimeWindows timeWindows: tumbling or hopping time-based window specification. Required parameter. Only the first message with a given id that falls into a window will be passed downstream.
  3. boolean isPersistent: whether the WindowStore used to detect duplicates should be persistent.
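The semantics described by these parameters can be modelled with a minimal, Kafka-free sketch. This is not the proposed implementation: WindowedDistinctSketch and accept are hypothetical names, a BiFunction stands in for the KeyValueMapper, an in-memory set stands in for the WindowStore, only tumbling windows are covered, timestamps are assumed non-negative, and old windows are never expired.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiFunction;

// Hypothetical model of windowed de-duplication; not part of the Kafka Streams API.
class WindowedDistinctSketch<K, V> {
    private final long windowSizeMs;                          // tumbling window size
    private final BiFunction<K, V, Object> idExtractor;       // stands in for KeyValueMapper
    private final Set<List<Object>> seen = new HashSet<>();   // (windowStart, id) pairs; stands in for the WindowStore

    WindowedDistinctSketch(long windowSizeMs, BiFunction<K, V, Object> idExtractor) {
        this.windowSizeMs = windowSizeMs;
        this.idExtractor = idExtractor;
    }

    // Returns true if the record should be forwarded downstream.
    boolean accept(K key, V value, long timestampMs) {
        Object id = idExtractor.apply(key, value);
        if (id == null) {
            return true; // null id: forward as-is, no de-duplication
        }
        long windowStart = timestampMs - (timestampMs % windowSizeMs);
        // Set.add returns false if this id was already seen in this window.
        return seen.add(List.<Object>of(windowStart, id));
    }

    public static void main(String[] args) {
        WindowedDistinctSketch<String, String> distinct =
                new WindowedDistinctSketch<>(1000L, (k, v) -> v);
        System.out.println(distinct.accept("k", "a", 0L));    // first "a" in window [0, 1000)
        System.out.println(distinct.accept("k", "a", 500L));  // duplicate in the same window
        System.out.println(distinct.accept("k", "a", 1200L)); // first "a" in window [1000, 2000)
    }
}
```

With a hopping window a record belongs to several overlapping windows, so the duplicate check would need to consider each of them; the sketch deliberately leaves that case out.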

...