This page summarizes our past feature proposals and discussions in Kafka Streams. Promoted ideas will be proposed as KIPs.
Public API Improvements
The public API of Kafka Streams currently has a number of shortcomings. This is a summary of known issues; we want to collect user feedback to improve the API.
Issue | User Impact / Importance | Possible Solution | Solution User Impact |
---|---|---|---|
TopologyBuilder and KStreamBuilder | The concepts might be hard for users to understand. Users might be confused by a verbose API (and leaking methods) that they should never see. Importance: high | KIP-120 | medium |
Too many overloads for methods of KStreamBuilder, KStream, KGroupedStream, KTable, and KGroupedTable | Many methods have more than 6 overloads, and it is hard for users to understand which one to use. Furthermore, with the verbose generics, compiler errors might be confusing and unhelpful if a parameter is specified incorrectly (i.e., I want to use overload X: does the compiler pick the correct overload? If yes, which parameter did I get wrong? If no, which parameter do I need to change so that the compiler picks the correct overload?). As we add more features, this is getting more severe. Importance: high | Change to Builder Pattern | high |
Inconsistent overloads | Some APIs have inconsistent overloaded methods that might confuse users (why do I need to specify this for overload A but not for overload B? Why does overload X allow me to do this, but not overload Y?). Importance: medium | Relates to "Too many overloads"; could be resolved with a clean builder abstraction. | medium |
DSL limits access to records and/or record metadata | Record metadata (such as offset, timestamp, partition, and topic) is not accessible in some DSL interfaces. | Change interfaces, RichFunctions, use process/transform | low |
Missing public API | Some very helpful classes, that are currently in package Importance: low | Move classes to a different package. | low |
Window(s) API | Importance: low | low | |
Improve StreamsConfig API | The API is verbose and, with intermixed consumer and producer configs, hard to use correctly. Importance: low | Builder pattern | medium |
ProcessorContext too verbose | ProcessorContext gives access to methods that cannot be called. This is hard for users to reason about. Importance: low | Split ProcessorContext and extract RecordContext | low |
Low-level API integration into DSL | Currently, the low-level API is integrated into the DSL via process(), transform(), and transformValues(). Those abstractions are not perfectly defined and are confusing to users. Importance: medium | Major redesign | medium |
Low-level API in DSL vs. "advanced DSL" | Currently, the low-level API is used to empower the user to do anything within the DSL. This approach is questionable to some extent. For example, if a user wants to do a stateful 1:1 transformation of records, she must implement Importance: medium | Major redesign | medium |
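To make the "Change to Builder Pattern" idea from the table more concrete, here is a minimal sketch in plain Java. It does not use the Kafka Streams API: `JoinOptions`, `describeJoin`, and all parameter names are hypothetical and exist only for illustration. It assumes that the optional arguments which currently multiply overloads could be bundled into one immutable options object with named, chainable setters.

```java
import java.time.Duration;
import java.util.Objects;

// Hypothetical sketch: none of these types exist in Kafka Streams.
final class JoinOptionsSketch {

    /** Immutable bundle of optional parameters that would otherwise each require an overload. */
    static final class JoinOptions {
        final Duration window;
        final String storeName;        // null means "let the library generate a name"
        final boolean loggingEnabled;

        private JoinOptions(Duration window, String storeName, boolean loggingEnabled) {
            this.window = window;
            this.storeName = storeName;
            this.loggingEnabled = loggingEnabled;
        }

        /** The only mandatory parameter; everything else starts with a default. */
        static JoinOptions within(Duration window) {
            return new JoinOptions(Objects.requireNonNull(window, "window"), null, true);
        }

        JoinOptions withStoreName(String storeName) {
            return new JoinOptions(window, Objects.requireNonNull(storeName, "storeName"), loggingEnabled);
        }

        JoinOptions withLoggingDisabled() {
            return new JoinOptions(window, storeName, false);
        }
    }

    /** A single entry point instead of one overload per parameter combination. */
    static String describeJoin(String left, String right, JoinOptions options) {
        String store = options.storeName == null ? "<generated>" : options.storeName;
        return left + " JOIN " + right
                + " within " + options.window.getSeconds() + "s"
                + ", store=" + store
                + ", logging=" + options.loggingEnabled;
    }

    public static void main(String[] args) {
        JoinOptions options = JoinOptions.within(Duration.ofSeconds(30))
                .withStoreName("order-payment-store");
        System.out.println(describeJoin("orders", "payments", options));
    }
}
```

With this shape, a wrongly typed argument produces a compiler error on one named method call rather than on a whole overload set, which is the confusion the table describes. Whether this trade-off suits the DSL is exactly what a KIP would need to settle.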
Many of the above issues are related to each other and/or overlap. This is also reflected in a number of JIRAs that all relate to API changes:
- https://issues.apache.org/jira/browse/KAFKA-4125 (Rich Functions)
- https://issues.apache.org/jira/browse/KAFKA-3455 (valid?)
- https://issues.apache.org/jira/browse/KAFKA-4713 (ProcessorContext.init)
- https://issues.apache.org/jira/browse/KAFKA-4218 (add key to ValueTransformer, i.e., mapValues and transformValues)
- https://issues.apache.org/jira/browse/KAFKA-4217 (add flatTransform() and flatTransformValues(); seems invalid to me)
- https://issues.apache.org/jira/browse/KAFKA-4346 (add foreachValue to KStream)
- https://issues.apache.org/jira/browse/KAFKA-3745 (add key to ValueJoiner)
- https://issues.apache.org/jira/browse/KAFKA-4726 (add key to ValueMapper)
Thus, to tackle these issues, it seems a good idea to break them down into groups and do one KIP per group in order to arrive at an overall sound design.
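Several of the JIRAs listed above ask for read-only key access in value-only operations (ValueMapper, ValueJoiner, ValueTransformer). The following is a rough sketch of what such an interface could look like, written in plain Java without any Kafka dependency. The interface name `KeyAwareValueMapper` and the `mapValues` helper over a `Map` are assumptions made for illustration, not the API that any of these tickets proposes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a stand-in for a key-aware mapValues(), not Kafka Streams code.
final class KeyAwareMapperSketch {

    /** Maps a value to a new value; the key is passed in for reading only. */
    @FunctionalInterface
    interface KeyAwareValueMapper<K, V, VR> {
        VR apply(K readOnlyKey, V value);
    }

    /**
     * Applies the mapper to every record. Keys are copied unchanged, which is
     * the property that lets a value-only operation avoid repartitioning.
     */
    static <K, V, VR> Map<K, VR> mapValues(Map<K, V> records,
                                           KeyAwareValueMapper<K, V, VR> mapper) {
        Map<K, VR> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> entry : records.entrySet()) {
            result.put(entry.getKey(), mapper.apply(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("alice", 3);
        counts.put("bob", 5);
        // The mapper can read the key but has no way to return a new one.
        Map<String, String> labels = mapValues(counts, (key, value) -> key + ":" + value);
        System.out.println(labels);
    }
}
```

The design point is that the mapper's return type carries only the new value, so the key cannot be changed by construction. A real proposal would also have to decide how to prevent mutation of a mutable key object, which this sketch does not address.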