...

We propose to clean up the KTable API and make the KTable semantics clearer and more consistent through API improvements and associated JavaDoc improvements.

In a nutshell, if a user specifies a state store name, then the user can subsequently query that state store by that name. The approach is as follows:

  • Decouple the notion of materialization from the notion of querying. Materialization is an internal streams decision. Querying is a user-facing decision.
  • For each method that creates a KTable, we will add an overload that takes a store name. If the user provides that name, the user can subsequently query that store by the provided name.
  • The above guarantee says only that a user can query the store. It says nothing about how we implement that feature internally. We could materialize the store (e.g., back it with a RocksDB store), or we could provide a read-only view of the store (e.g., by computing the result on the fly). The filter example below illustrates the two options:

    KTable table2 = table1.filter()                     <----- user does not provide a name; no guarantee that table2 is queryable

    KTable table2 = table1.filter("filterStoreName")    <----- user provides a name; we guarantee that table2 is queryable by that name. Internally we could be writing each filtered value to a RocksDB store, or computing the filter result on the fly each time the store is queried.
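
To make the distinction between querying and materialization concrete, here is a minimal, self-contained sketch in plain Java. It does not use the Kafka Streams API. The names ReadOnlyStore, MaterializedFilterStore, FilterView and build are hypothetical and exist only for this illustration, and a HashMap stands in for a RocksDB store. Both options answer the same queries under the same store name; they differ only in when the filter is evaluated.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only; none of these types exist in Kafka Streams.
public class QueryableFilterSketch {

    /** The only thing a querying user sees: a read-only lookup by key. */
    public interface ReadOnlyStore<K, V> {
        V get(K key);
    }

    /** Option 1: materialize. Each parent update is filtered and written to a local map. */
    public static final class MaterializedFilterStore<K, V> implements ReadOnlyStore<K, V> {
        private final Map<K, V> store = new HashMap<>(); // stand-in for a RocksDB store
        private final Predicate<V> predicate;

        public MaterializedFilterStore(Predicate<V> predicate) {
            this.predicate = predicate;
        }

        public void onParentUpdate(K key, V value) {
            if (value != null && predicate.test(value)) {
                store.put(key, value);
            } else {
                store.remove(key); // the value no longer passes the filter
            }
        }

        @Override
        public V get(K key) {
            return store.get(key);
        }
    }

    /** Option 2: read-only view. Nothing is written; the filter runs at query time. */
    public static final class FilterView<K, V> implements ReadOnlyStore<K, V> {
        private final Map<K, V> parent;
        private final Predicate<V> predicate;

        public FilterView(Map<K, V> parent, Predicate<V> predicate) {
            this.parent = parent;
            this.predicate = predicate;
        }

        @Override
        public V get(K key) {
            V value = parent.get(key);
            return (value != null && predicate.test(value)) ? value : null;
        }
    }

    /** Applies the same updates to "table1" and returns the filtered table as a queryable store. */
    public static ReadOnlyStore<String, Integer> build(boolean materialize) {
        Predicate<Integer> greaterThanThree = v -> v > 3;
        Map<String, Integer> table1 = new HashMap<>();
        MaterializedFilterStore<String, Integer> materialized =
                new MaterializedFilterStore<>(greaterThanThree);

        String[] keys = {"a", "b", "c", "c"};
        int[] values = {1, 5, 10, 2}; // "c" first passes the filter, then stops passing it
        for (int i = 0; i < keys.length; i++) {
            table1.put(keys[i], values[i]);
            if (materialize) {
                materialized.onParentUpdate(keys[i], values[i]);
            }
        }
        if (materialize) {
            return materialized;
        }
        return new FilterView<>(table1, greaterThanThree);
    }

    public static void main(String[] args) {
        // The user only ever refers to the store by the name they supplied.
        Map<String, ReadOnlyStore<String, Integer>> queryableStores = new HashMap<>();
        queryableStores.put("filterStoreName", build(true));
        System.out.println(queryableStores.get("filterStoreName").get("b"));

        queryableStores.put("filterStoreName", build(false));
        System.out.println(queryableStores.get("filterStoreName").get("b"));
    }
}
```

In this sketch, swapping one implementation for the other does not change what a caller of get observes, which is the property the guarantee above relies on. The trade-off is the usual one: the materialized variant pays on every update, the view pays on every query.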


What is in scope

The main scope of this KIP is to address the inconsistency in which KTables can be queried and which cannot, as well as how a user goes about making that decision. As such, this KIP should be seen as an incremental update to the existing APIs, not a complete overhaul.

What is in scope is the exact API for addressing the above inconsistency.

What is not in scope

  • Revisiting the Interactive Queries APIs is not in scope. Specifically, we are not redefining the exact boundary between the DSL (i.e., the processing language) and storage/Interactive Queries, or how we jump from one to the other. The boundary will remain as it is today: to use Interactive Queries, the user needs a store name and receives a store to query based on that name. We can address that in a later KIP if required.
  • Rethinking the DSL itself is not in scope. Specifying state stores in the API can be thought of as a type of hint to the DSL that materialization is required. There could be many such hints, and perhaps they could be described with methods such as .materialize(), .cache(), or .log(). Such methods might move us towards a less declarative API. Either way, it is not in the scope of this KIP to undertake a complete rethink of the DSL. This KIP stays consistent with the DSL we currently have.

...