This page summarizes our past feature proposals and discussions in Kafka Streams. Promoted ideas will be proposed as KIPs.
Public API Improvements
Currently, the public API of Kafka Streams is not perfect. This is a summary of known issues, and we want to collect user feedback to improve the API.
Issue | User Impact / Importance | Possible Solution | Solution User Impact
---|---|---|---
TopologyBuilder and KStreamBuilder | Might be hard for users to understand the concept. Users might be confused by the verbose API (and leaking methods) they should never see. Importance: high | KIP-120 | medium
Too many overloads for methods of KStreamBuilder, KStream, KGroupedStream, KTable, and KGroupedTable | Many methods have more than 6 overloads, and it is hard for users to understand which one to use. Furthermore, with the verbose generics, compiler errors can be confusing and unhelpful if a parameter is specified incorrectly (i.e., I want to use overload X: does the compiler pick the correct overload? If yes, which parameter did I get wrong? If no, which parameter do I need to change so that the compiler picks the correct overload?). As we add more features, this gets more severe. | Change to builder pattern | high
Inconsistent overloads | Some APIs have inconsistent overloaded methods that can confuse users (why do I need to specify this for overload A but not for overload B? Why does overload X allow me to do this, but not overload Y?). | Relates to "Too many overloads"; could be resolved with a clean builder abstraction. | medium
DSL limits access to records and/or record metadata | Record metadata (like offset, timestamp, partition, topic) is not accessible in DSL interfaces. | Change interfaces, RichFunctions, use process()/transform() | low
Missing public API | Some very helpful classes are currently in an internal package. | Move classes to a different package. | low
Window(s) API | | | low
Improve StreamsConfig API | The API is verbose and, with intermixed consumer and producer configs, hard to use correctly. | Builder pattern | medium
ProcessorContext too verbose | ProcessorContext gives access to methods that cannot be called. This is hard for users to reason about. | Split ProcessorContext and extract RecordContext | low
Low-level API integration into DSL | Currently, the low-level API is integrated into the DSL via process(), transform(), and transformValues(). Those abstractions are not perfectly defined and are confusing to users. | Complete redesign | medium
Low-level API in DSL vs. "advanced DSL" | Currently, the low-level API is used to empower the user to do anything within the DSL. This approach is questionable to some extent. For example, if a user wants to do a stateful 1:1 transformation of records, she must implement | Major redesign | medium
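To make the "change to builder pattern" proposal concrete, here is a minimal sketch of the idea. All names (CountOptions, its methods, the count() entry point) are hypothetical illustrations, not the actual Kafka Streams API: instead of six overloads of count(), a single method takes one options object that bundles all optional parameters.

```java
// Hypothetical sketch of the builder-pattern proposal; CountOptions and
// count() are illustrative names, not part of the Kafka Streams API.
public class BuilderPatternSketch {

    // One options object replaces overloads such as count(), count(name),
    // count(serde), count(name, serde), and so on.
    static final class CountOptions {
        String storeName;
        boolean cachingEnabled = true;

        static CountOptions as(String storeName) {
            CountOptions o = new CountOptions();
            o.storeName = storeName;
            return o;
        }

        CountOptions withCaching(boolean enabled) {
            this.cachingEnabled = enabled;
            return this;
        }
    }

    // A single, non-overloaded entry point: a compiler error for a wrong
    // parameter now points at one builder call instead of being obscured
    // by overload resolution over verbose generics.
    static String count(CountOptions options) {
        return "count -> store=" + options.storeName
                + ", caching=" + options.cachingEnabled;
    }

    public static void main(String[] args) {
        System.out.println(count(CountOptions.as("counts-store").withCaching(false)));
    }
}
```

With this shape, adding a new optional parameter later means adding one `withX()` method rather than doubling the number of overloads.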
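Similarly, the "split ProcessorContext and extract RecordContext" idea can be sketched as an interface split. The interfaces and methods below are hypothetical illustrations, not the actual Kafka Streams API: record-level metadata moves into a narrow RecordContext, so code that only reads metadata can no longer see lifecycle methods it must not call.

```java
// Hypothetical sketch of the ProcessorContext/RecordContext split;
// all names are illustrative, not the actual Kafka Streams API.
public class ContextSplitSketch {

    // Narrow, read-only view of the current record's metadata.
    interface RecordContext {
        String topic();
        int partition();
        long offset();
        long timestamp();
    }

    // The full processor-facing context extends the narrow view with
    // processing-lifecycle methods that metadata consumers never need.
    interface ProcessorContext extends RecordContext {
        void commit();
        void schedule(long intervalMs);
    }

    // A helper that only needs metadata declares the narrow type, making
    // it impossible to call commit()/schedule() by accident.
    static String describe(RecordContext ctx) {
        return ctx.topic() + "-" + ctx.partition() + "@" + ctx.offset();
    }

    // Fixed sample implementation for demonstration purposes only.
    static RecordContext sampleRecord() {
        return new RecordContext() {
            public String topic() { return "input-topic"; }
            public int partition() { return 0; }
            public long offset() { return 42L; }
            public long timestamp() { return 1_000L; }
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(sampleRecord()));
    }
}
```

The point of the split is that the type system, rather than documentation, tells users which methods are safe to call in which context.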
Many of the above issues are related to each other and/or overlap. This is also reflected in a number of JIRAs that are all related to API changes:
- https://issues.apache.org/jira/browse/KAFKA-4125 (Rich Functions)
- https://issues.apache.org/jira/browse/KAFKA-3455 (valid?)
- https://issues.apache.org/jira/browse/KAFKA-4713 (ProcessorContext.init)
- https://issues.apache.org/jira/browse/KAFKA-4218 (add key to ValueTransformer – i.e., mapValues and transformValues)
- https://issues.apache.org/jira/browse/KAFKA-4217 (add flatTransform() and flatTransformValues() – seems invalid to me)
- https://issues.apache.org/jira/browse/KAFKA-4346 (add foreachValue to KStream)
- https://issues.apache.org/jira/browse/KAFKA-3745 (add key to ValueJoiner)
- https://issues.apache.org/jira/browse/KAFKA-4726 (add key to ValueMapper)
Thus, to tackle this, it seems to be a good idea to break it down into groups of related issues and do one KIP per group, in order to get an overall sound design.