...
'Distinct' operation is common in data processing, e. g.
- SQL
DISTINCT
keyword, java.util.stream.Stream
hasdistinct()
method,- .NET LINQ
Distinct
method,SQL has DISTINCT keyword, Apache Spark's
distinct()
Code Block language java firstline 1 KTable<Windowed<K>, V> distinct(final Named named); KTable<Windowed<K>, V> distinct(final Materialized<K, V, WindowStore<Bytes, byte[]>> materialized); KTable<Windowed<K>, V> distinct(final Named named, final Materialized<K, V, WindowStore<Bytes, byte[]>> materialized);
,
- Hazelcast Jet's
distinct()
, etc. etc.
Hence it is natural to expect the similar functionality from Kafka Streams.
...
distinct()
parameterless DSLOperation on- TimeWindowedKStream<K,V> DSLObject which returns KStream<Windowed<K>,V>
- SessionWindowedKStream<K,V> DSLObject which returns KStream<Windowed<K>,V>
The following methods are added to the corresponding interfaces:
The distinct operation returns only a first record that falls into a new window, and filters out all the other records that fall into an already existing window.
The records are considered to be duplicates iff serialized forms of their keys are equal.
Usage Examples
Consider the following example (record times are in seconds):
...