Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

Status

Current stateUnder Discussion

...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Kafka Streams has grown to augment KTables (e.g., changelogs) with state stores. These state stores can be seen as a way to materialize the KTables. State stores can be queried through the Interactive Queries (IQ for short) APIs, as introduced in KIP-67. This led to some confusions and inconsistencies, e.g., :

...

  • KTables resulting from join() operators are not materialized, because the user does not have the option of materializing them. Thus they cannot be queried.

  • Certain operations on KTables, such as print() and foreach(), are often confusing and redundant, since users can already print the values of a materialized view. Furthermore, users also have the option to use the KStream equivalent functions after converting a KTable to a KStream. So they have 3 ways of printing and iterating through a KTable.

Proposal

We propose to clean up the KTable API and make the KTable semantics clearer and consistent through API improvements and associated JavaDoc improvements. In a nutshell, if a user specifies a state store name, then the user can also subsequently query that state store with that name. 

What is in scope

The main scope of this KIP is to address the inconsistency in which KTables are materialized and what KTables are notcan be queried and which KTables cannot. As well as which KTables are queryable and which are nothow a user goes about making that decision. As such, this KIP should be seen as an incremental update to the existing APIs, not a complete overhaul. 

What is in scope is the exact API for addressing the above inconsistency.

What is not in scope

  • Revisiting the interactive queries APIs is not in scope. Specifically, what is not in scope is re-defining the exact boundary between the DSL (i.e., the processing language) and the storage/interactive queries, and how we jump from one to the other. The boundary will remain as it is today, where to do Interactive Queries, the user needs a store name and receives a store to query based on that name. We can address that in a later KIP if required.
  • What is not in scope is rethinking the DSL itself. Specifically, specifying state stores in the API can be thought of as a type of hint to the DSL to indicate that materialization is required. There could be many such hints, and perhaps they could be described with methods such as .materialize(), or .cache(), or .log(). These methods might be getting us towards a less declarative API. Either way, it is not in the scope of this KIP to undertake a complete rethink of the DSL. This KIP stays consistent with the DSL we currently have.

Interfaces / API Changes (DSL only)

2 overloads for all APIs that create KTables: with and without store name

All API that create KTables will have 2 overloaded methods, one with the store name, and one without. Note that a null store name does not indicate that a KTable would not be materialized, but only that it will not be used for interactive queries. An internal name will be generated in cases materialization must happen.

...

  • KTable<K, Long> count(final String storeName);
  • KTable<K, V> reduce(final Reducer<V> reducer);
  • <VR> KTable<K, VR> aggregate(final Initializer<VR> initializer, final Aggregator<? super K, ? super V, VR> aggregator, final Serde<VR> aggValueSerde);

Remove unnecessary methods

  • Depreciate the following KTable methods: print(), writeAsText(), foreach() and any of their overloads.

Implementation plan

One implementation detail that is important is how the Kafka Streams internals decides whether to materialize a KTable. Note that the above APIs provide a way for Interactive Queries to query a state store. They do not dictate whether the state store itself if a real, materialized one, or a view on top of a real, materialized store. Going back to the first example in the motivation:

...

  • Will depreciate the KTable methods print(), writeAsText(), foreach() and any of their overloads. 
  • Code will be backwards compatible.

Rejected Alternatives

Change the name of KTable. We discussed potentially changing the name of a KTable to something like KChangelogStream, but this doesn’t solve the main problem this KIP addresses, which is API inconsistency.

...