...
'Distinct' operation is common in data processing, e. g.
- SQL
DISTINCT
keyword, - In standard libraries for programming languages
- .NET LINQ
Distinct
method, - Java
- Stream
distinct()
- ,
Distinct
method,- Scala Seq
distinct()
,
- .NET LINQ
- In data processing frameworks:
Apache Spark's
distinct()
,- Apache Flink's
distinct()
, - Apache Beam's
Distinct()
, - Hazelcast Jet's
distinct()
, etc.
Hence it is natural to expect the similar functionality from Kafka Streams.
Although Kafka Streams Tutorials contains an example of how distinct
can be emulated, but this example is complicated: it involves low-level coding with local state store and a custom transformer. It might be much more convenient to have distinct
as a first-class DSL operation.
Due to 'infinite' nature of KStream, distinct
operation should be windowed, similar to windowed joins and aggregations for KStreams.
...