Status

Current stateAccepted

Discussion thread: here 

JIRA: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Kafka Streams has grown to augment KTables (e.g., changelogs) with state stores. These state stores can be seen as a way to materialize the KTables. State stores can be queried through the Interactive Queries (IQ for short) APIs, as introduced in KIP-67. This led to some confusions and inconsistencies, e.g., :

     KTable table1 = builder.table(..., topic, stateStoreName); // materialized
     KTable table2 = table1.filter(...); // not materialized, and not queryable by IQ

Proposal

We propose to clean up the KTable API and make the KTable semantics clearer and consistent through API improvements and associated JavaDoc improvements.

In a nutshell the approach is as follows:

                                                               KTable table2 = table1.filter() <----- user does not provide a name, no guarantee table2 is queryable

                                                               KTable table2 = table1.filter("filterStoreName")      <----- user provides a name, we guarantee table2 is queryable based on that name. Internally we could be writing each filtered value to a RocksDb store, or computing the filter result on the fly each time the store is queried.


What is in scope

The main scope of this KIP is to address the inconsistency in which KTables can be queried and which KTables cannot. As well as how a user goes about making that decision. As such, this KIP should be seen as an incremental update to the existing APIs, not a complete overhaul. 

What is in scope is the exact API for addressing the above inconsistency.

What is not in scope

Interfaces / API Changes (DSL only)

3 overloads for all APIs that create KTables: with and without store name. The alternative to store name is a StateStoreSupplier.

All API that create KTables will have 3 overloaded methods, one with the store name or with a StateStoreSupplier, and one without. Note that providing a null store name is the same as using the API with no store name.

These APIs include the ones below and any of their existing overloads. We do not list the overloads here to keep the list uncluttered. Each API will have one version with no store name, and one version with a store name and one version with a StateStoreSupplier (that contains the state store name).

In KTable.java overload each of the following APIs by adding store name and StateStoreSupplier (when they don't exist already):

In KStreamBuilder.java overload each of the following APIs by adding store name and StateStoreSupplier (when they don't exist already):

In KGroupedTable.java overload each of the following APIs by adding store name and StateStoreSupplier  (when they don't exist already):

In KGroupedStream.java overload each of the following APIs by adding store name and StateStoreSupplier  (when they don't exist already):

Remove unnecessary methods

Interfaces / API Changes (PAPI only)

During implementation it became clear that a minor adjustment was needed and org.apache.kafka.streams.processor.addGlobalStore should take a 

Misc API cleanup

During implementation it became apparent that some APIs needed minor renaming or cleanup. That is listed here:

Implementation plan

One implementation detail that is important is how the Kafka Streams internals decides whether to materialize a KTable. Note that the above APIs provide a way for Interactive Queries to query a state store. They do not dictate whether the state store itself if a real, materialized one, or a view on top of a real, materialized store. Going back to the first example in the motivation:

KTable table1 = builder.table(..., topic, stateStoreName); // materialized
KTable table2 = table1.filter(final String stateStoreName2)

The store with name "stateStoreName2" could be a view on top of the store with name "stateStoreName", in which case each time we query stateStoreName2, we would, on the fly, compute the result of the filter on values actually stored in the first store. Alternatively, "stateStoreName2" could in itself be materialised, i.e., contain all the results obtained from the filtering. Materialising all Ktables with a state store name could be expensive, however it is a straightforward to implement and could be a good V1. A subsequent JIRA could do an optimization in V2.

Compatibility, Deprecation, and Migration Plan

Rejected Alternatives

Change the name of KTable. We discussed potentially changing the name of a KTable to something like KChangelogStream, but this doesn’t solve the main problem this KIP addresses, which is API inconsistency.

Use .materialize(String storeName) instead of overloads. In this proposal, we remove all state store parameters in KTable methods, and add a .materialize() method on the KTable. This gets us to a place where we are forced to make the DSL less declarative (see “What is not in scope” above) and forces us to have an orthogonal discussion to this KIP.

Have the KTable be the materialized view. In this alternative, the KTable would no longer be a changelog stream, it would be a materialized view. So we would collapse two things (the existing KTable and state store) into one abstraction. All KTable methods would need to take a state store name.