Status

Current state: Drafting

Discussion thread: TBD

JIRA: TBD

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Kafka Streams supports an interesting and innovative API for "peeking" into the internal state of running stateful stream processors from outside of the application, called Interactive Query (IQ). This functionality has proven invaluable to users over the years for everything from debugging running applications to serving low-latency queries straight from the Streams runtime.

However, the actual interfaces for IQ were designed in the very early days of Kafka Streams, before the project had gained significant adoption, and in the absence of much precedent for this kind of API in peer projects. With the benefit of hindsight, we can observe several problems with the original design that we hope to address in a revised framework that will serve Streams users well for many years to come.

Problems

1. IQ serves queries by presenting callers with a composite store interface, which encapsulates the fact that stores will in general be partitioned, and that a given instance may only host a subset of those partitions (if any).
  a. The cost of constructing this store interface is non-trivial, though many real-world use cases will only use the store to run one query and then discard it.
  b. The creation of the store is subject to any number of error conditions, so callers need to handle exceptions when calling KafkaStreams.store().
  c. Once you have an IQ store reference, it is still subject to any number of transient and permanent conditions (such as rebalances having moved partitions off the local instance, Streams changing state, etc.), so callers also need to handle exceptions while running queries on their store, and be prepared to rebuild the store if it becomes invalid.
  d. Users desiring to query custom state stores need to produce a fairly detailed implementation of QueryableStoreType that details how to compose results from multiple partitions into one.
    i. In particular, if you want to plug a store with special query capabilities into the Streams DSL (for example as the materialization of a KTable), the store must extend the appropriate store interface, for example KeyValueStore<Bytes,byte[]>. However, when it comes to exposing it in IQ, you will have to implement QueryableStoreType<YourStore>, which requires you to produce an instance of YourStore that delegates to a list of ReadOnlyKeyValueStore<Bytes,byte[]> returned by storeProvider.stores. The catch is that those delegate stores are not instances of your store type! They will be instances of the internal class MeteredTimestampedKeyValueStore, which wraps other internal store types, which after a few more layers of wrapping contain YourStore at the bottom. Your only choice will be to create a completely separate implementation of the YourStore interface that delegates all the ReadOnlyKeyValueStore methods to those wrapper layers, and then for the methods that are special to YourStore, you'll have to repeatedly cast and type-check the wrappers until you manage to get all the way down to the actual YourStore that can serve the query.
    ii. In other words, it's so complicated that it might as well be impossible, so it is not surprising that no one has actually tried. I suspect that if we made it easier to extend the storage framework, we would see a bunch of new use cases pop up building on IQ in the future.
2. IQ composes all locally present partitions into a unified response. For example, for queries that return an iterator, it builds a composite iterator that collates all locally available partitions' iterators into one.
  a. While this is useful for trivial use cases, it destroys useful information about the response:
    i. Callers don't know which partitions were included in the response.
    ii. After iterating for some time, callers can't tell when individual partitions' iterations are complete. This is important if we experience a failure: partitions that are already complete don't need to repeat the query.
  b. In practice, partitions' responses can succeed or fail independently, but the composite response couples all responses to any individual partition's failure.
3. Because IQ responses are simply the result type of whatever store method you invoke, it is not possible to attach extra metadata that is useful specifically in the IQ context.
  a. For example, we might want to add detailed trace information about which store layers or segments participated in the query, including execution time, cache hit/miss, etc. This kind of feature would be particularly useful when debugging performance, or when IQ is backing a service that uses distributed tracing, etc.
  b. For example, we might want to add information about the precise state of the view we served: what was the input offset we last wrote into the store? What is the "current stream time" of the view we served? What was the state of the StreamTask when we served the query? Etc.
  c. These are just examples from various conversations about potentially useful IQ response metadata. The point is to illustrate the fact that we are missing opportunities by restricting the IQ response to be the simple value returned by the store methods that serve the query.
4. Supporting new types of queries on the "standard" store interfaces is onerous, since it requires adding new methods to the store interfaces, which need to be overridden in dozens of utility implementations throughout the Streams codebase.
  a. Example: KIP-617: Allow Kafka Streams State Stores to be iterated backwards. This change involved four PRs (https://github.com/apache/kafka/pull/9137, https://github.com/apache/kafka/pull/9138, https://github.com/apache/kafka/pull/9139/files, https://github.com/apache/kafka/pull/9239), totaling 108 files and 6,000+ lines of code changed.
  b. Another example: KIP-614: Add Prefix Scan support for State Stores (which only edits the KeyValueStore). This change took two PRs (https://github.com/apache/kafka/pull/9508 and https://github.com/apache/kafka/pull/10052), totaling 19 files and 600+ lines of code changed.
5. IQ forces all queries to compose the contents of the Record Cache (the write buffer for Processors) with the underlying bytes stores.
  a. Despite its name, the Record Cache is a write buffer, not a traditional read cache. Perhaps not surprisingly, its performance is not very good for arbitrary queries, since its primary purpose is to ensure that Processors always read their own writes while delaying operations like sending writes to the changelog and the underlying stores.
  b. We could invest in optimizing the Record Cache for queries, but we would probably find that the better approach is to separate the read and write workloads.
  c. Regardless of potential future optimizations in the Record Cache, merging the buffered writes with the underlying stores (as in the case of a table scan) will always require extra work, and it would be good to have an option to skip the Record Cache for IQ users.
  d. In contrast to Processors, IQ interactions can never write, so they do not need any concept of "read your writes".
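
The information loss described for composite iterators can be illustrated with a small, self-contained sketch. It is not Kafka Streams code: `CompositeIteratorSketch`, `collate`, and `perPartition` are hypothetical names, and the partitions are plain in-memory lists. The first method chains partition iterators into one, which hides partition boundaries; the second keeps one iterator per partition so a caller can tell which partitions took part and which are already exhausted.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical illustration; none of these names are Kafka Streams APIs.
final class CompositeIteratorSketch {

    // Chains the per-partition iterators into one, the way a composite
    // iterator does. The caller can no longer see partition boundaries.
    static <V> Iterator<V> collate(Collection<Iterator<V>> parts) {
        final Iterator<Iterator<V>> outer = parts.iterator();
        return new Iterator<V>() {
            private Iterator<V> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Advance to the next partition once the current one is drained.
                while (!current.hasNext() && outer.hasNext()) {
                    current = outer.next();
                }
                return current.hasNext();
            }

            @Override
            public V next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    // Alternative: keep one iterator per partition, keyed by partition number.
    static <V> Map<Integer, Iterator<V>> perPartition(Map<Integer, List<V>> data) {
        Map<Integer, Iterator<V>> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<V>> entry : data.entrySet()) {
            result.put(entry.getKey(), entry.getValue().iterator());
        }
        return result;
    }
}
```

With the per-partition map, a caller that hits a failure midway can check `hasNext()` on each partition's iterator and repeat the query only for partitions that were not yet complete.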

Unintended Consequences

In a nutshell, all those issues result in a system that isn't ideal from anyone's perspective:

1. People using IQ:
  a. Have to call two methods to do one query (KafkaStreams#store and then the actual query method) and have to deal with exceptions from both of those methods.
  b. Lose important information about which partitions were included in the response, and when individual partitions complete during the process of consuming results.
  c. Get worse performance than necessary due to the overhead of building the intermediate store abstraction.
2. People adding new stores:
  a. Have to implement prohibitively complex logic to expose their store's capabilities through IQ (see Problem 1d).
3. People contributing to existing store interfaces:
  a. Have to jump through a bunch of hoops to add a new method to the store interfaces.
  b. Have no way to know they did everything right unless they test every combination of store configurations with both the PAPI and IQ.
4. People maintaining Streams:
  a. Have a significant burden reviewing KIPs and PRs because there are so many complexities involved in properly changing store interfaces.
  b. Have to deal with a long tail of bug reports that trickle in when some contribution inevitably overlooks some "minor" point like verifying a new method works via IQ or is properly handled in the cache, etc.

In conclusion, and to clarify: IQ itself has been extremely successful and valuable to many people. I only harp on the above points to demonstrate a number of flaws that I think we can improve on to make it even more valuable and successful in the future.

Goals

To address the above pain points while preserving the positive aspects of IQ, I'd propose the following goals for this KIP:

1. We should continue to offer a mechanism to peek into the internal state of Kafka Streams's stateful operations.
2. We should recognize that querying state via IQ is a different use case from using a store in a Processor, and a different interface should therefore be on the table.
3. Simple use cases should continue to be easy.
  a. Particularly, it should continue to be easy to just "query the store" and not worry about individual partitions, etc.
4. More complex use cases should be possible and not too hard.
  a. Particularly, it should be possible to pick and choose partitions to query, and to handle independent partitions' responses independently.
  b. It should also be possible to define new store types and queries with a manageable level of complexity.
  c. It should be possible to tune queries for maximum performance.
5. Contributing to and maintaining the code base should be relatively straightforward.

Proposed Changes

This KIP proposes a new framework for IQ, which we will refer to as "IQv2". It is outside the scope of this KIP to propose new mechanisms for every query that IQ supports today, as part of the purpose of this design is to be easily extensible (and we want to keep the KIP focused on the framework). However, we do propose to add a few specific queries, to flesh out the proposal and to give some purpose to the initial batch of implementation work.

The basic design of IQv2 is to add a mechanism for KafkaStreams (and TopologyTestDriver, which we'll omit for brevity in the discussion) to execute a "query" on the caller's behalf (as opposed to constructing a store for the caller to query).

This addresses Problem 1 (and Unintended Consequence 1a) because each time a user wants to query the store, they just call one method and have no store lifecycle to maintain.
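
To make the single-call shape concrete, here is a minimal sketch under stated assumptions. Every name in it (`SingleCallSketch`, `Query`, `QueryHandler`, `GetByKey`, `MapStore`, `ToyStreams`) is hypothetical and chosen only for illustration; it does not use or anticipate the interfaces this KIP will eventually specify. The point is only that the caller passes a query object to one method and never holds a store reference.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; these are not Kafka Streams interfaces.
final class SingleCallSketch {

    // Marker for a query whose result type is R. The runtime never inspects it.
    interface Query<R> { }

    // A store only has to know how to answer the queries it understands.
    interface QueryHandler {
        <R> R handle(Query<R> query);
    }

    // Example query type understood by the toy store below.
    static final class GetByKey implements Query<String> {
        final String key;

        GetByKey(String key) {
            this.key = key;
        }
    }

    // Toy in-memory store.
    static final class MapStore implements QueryHandler {
        private final Map<String, String> data = new HashMap<>();

        void put(String key, String value) {
            data.put(key, value);
        }

        @Override
        @SuppressWarnings("unchecked")
        public <R> R handle(Query<R> query) {
            if (query instanceof GetByKey) {
                return (R) data.get(((GetByKey) query).key);
            }
            throw new UnsupportedOperationException(
                "Unknown query: " + query.getClass().getSimpleName());
        }
    }

    // Stand-in for the runtime: one method, and no store handle is returned.
    static final class ToyStreams {
        private final Map<String, QueryHandler> stores = new HashMap<>();

        void register(String name, QueryHandler store) {
            stores.put(name, store);
        }

        <R> R query(String storeName, Query<R> query) {
            QueryHandler store = stores.get(storeName);
            if (store == null) {
                throw new IllegalArgumentException("No such store: " + storeName);
            }
            return store.handle(query);
        }
    }
}
```

In this sketch a caller would write something like `streams.query("my-store", new GetByKey("k"))` each time, so there is no intermediate object that can become invalid between queries.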

The query itself will be (almost) completely opaque to KafkaStreams, and will effectively be a protocol between the IQv2 caller and the underlying state store.

This is the key to addressing Problem 4, and it resolves Unintended Consequence 2 (because new stores don't need to do anything except handle queries to be integrated with IQ) and Unintended Consequence 3 (because the scope of a new capability is limited to adding a new Query type and adding handlers for it in the desired store). It also resolves Unintended Consequence 4 for the same reason as 3, since the scope of adding a new query is so much smaller.
This design also addresses Problem 5 because the Caching state store layers will have the opportunity to handle known queries before passing them down to lower store layers. So, if desired, we can define a well-known KeyQuery that has a flag controlling whether the cache should handle it or not, while a custom query type would naturally bypass the cache, since the cache doesn't have knowledge of the query type.
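
The layering described above can be sketched as follows. This is an illustration under assumptions, not the proposed implementation: `KeyQuery` is the name used in the text, but its `skipCache` flag, the `CountQuery` custom query, and the `Store`, `BaseStore`, and `CachingStore` types are all hypothetical. The caching layer answers a `KeyQuery` from its buffered writes unless the flag opts out, and passes anything it does not recognize straight down.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of layered query handling; not Kafka Streams code.
final class CacheLayerSketch {

    interface Query<R> { }

    interface Store {
        <R> R query(Query<R> query);
    }

    // A "well-known" query: the caching layer recognizes it and honors the flag.
    static final class KeyQuery implements Query<String> {
        final String key;
        final boolean skipCache; // hypothetical flag name

        KeyQuery(String key, boolean skipCache) {
            this.key = key;
            this.skipCache = skipCache;
        }
    }

    // A custom query type that the caching layer knows nothing about.
    static final class CountQuery implements Query<Integer> { }

    // Bottom layer: stands in for the underlying bytes store.
    static final class BaseStore implements Store {
        final Map<String, String> flushed = new HashMap<>();

        @Override
        @SuppressWarnings("unchecked")
        public <R> R query(Query<R> query) {
            if (query instanceof KeyQuery) {
                return (R) flushed.get(((KeyQuery) query).key);
            }
            if (query instanceof CountQuery) {
                return (R) Integer.valueOf(flushed.size());
            }
            throw new UnsupportedOperationException(query.getClass().getSimpleName());
        }
    }

    // Caching layer: holds buffered writes not yet flushed to the layer below.
    static final class CachingStore implements Store {
        final Map<String, String> buffered = new HashMap<>();
        private final Store inner;

        CachingStore(Store inner) {
            this.inner = inner;
        }

        @Override
        @SuppressWarnings("unchecked")
        public <R> R query(Query<R> query) {
            if (query instanceof KeyQuery) {
                KeyQuery keyQuery = (KeyQuery) query;
                if (!keyQuery.skipCache && buffered.containsKey(keyQuery.key)) {
                    return (R) buffered.get(keyQuery.key);
                }
            }
            // Unknown query types, and known ones that opt out, go straight down.
            return inner.query(query);
        }
    }
}
```

Note that `CountQuery` bypasses the buffered writes without any special handling: the caching layer simply has no branch for it.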

IQv2 will include "container" request and response objects, enabling refinements and controls to be added onto queries and also enabling additional metadata to accompany results.

This addresses Problem 3 because we can attach all the extra information we need "around" the core query and result.
It also creates a mechanism for future extensions to IQ.
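
As a rough sketch of what "container" objects could look like, consider the following. All names and fields (`ContainerSketch`, `Request`, `Response`, `collectTrace`, `lastInputOffset`, `trace`, `lookup`) are invented for illustration and are not the interfaces this KIP proposes. The metadata fields were picked from the examples under Problem 3, and the "query" is simply a key looked up in a map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; field and class names are invented for illustration.
final class ContainerSketch {

    // Request container: the core query plus controls that apply to any query.
    static final class Request<Q> {
        final Q query;
        final boolean collectTrace;

        private Request(Q query, boolean collectTrace) {
            this.query = query;
            this.collectTrace = collectTrace;
        }

        static <Q> Request<Q> of(Q query) {
            return new Request<>(query, false);
        }

        // Returns a copy of this request with tracing switched on.
        Request<Q> withTrace() {
            return new Request<>(query, true);
        }
    }

    // Response container: the core result plus metadata about how it was served.
    static final class Response<R> {
        final R result;
        final long lastInputOffset;
        final List<String> trace;

        Response(R result, long lastInputOffset, List<String> trace) {
            this.result = result;
            this.lastInputOffset = lastInputOffset;
            this.trace = Collections.unmodifiableList(new ArrayList<>(trace));
        }
    }

    // Toy executor: the "query" is just a key to look up in a map-backed store.
    static Response<String> lookup(Request<String> request,
                                   Map<String, String> store,
                                   long lastInputOffset) {
        List<String> trace = new ArrayList<>();
        long start = System.nanoTime();
        String value = store.get(request.query);
        if (request.collectTrace) {
            trace.add("map lookup took " + (System.nanoTime() - start) + " ns");
        }
        return new Response<>(value, lastInputOffset, trace);
    }
}
```

Because the controls and metadata live on the containers rather than on the query or the result, new ones can be added later without touching existing query types.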

The response object will not attempt to compose individual partitions' responses. Instead, the response object will provide an API to get the responses for each partition. Additionally, we will provide some utilities to compose partition responses.

This addresses Problem 2 because responses aren't required to be composable, and it also creates room for partitions to report success or failure independently.
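
A per-partition response of this kind might look roughly like the sketch below. It is an assumption-laden illustration: `PartitionResultsSketch`, `PartitionResult`, `Response`, `failedPartitions`, and `combineSuccesses` are hypothetical names, and failures are represented as plain messages. Each partition reports its own outcome, and composition is an optional helper rather than something the response does on the caller's behalf.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical sketch; these are not Kafka Streams classes.
final class PartitionResultsSketch {

    // Outcome for a single partition: a value on success, a message on failure.
    static final class PartitionResult<R> {
        final R value;
        final String failure;

        private PartitionResult(R value, String failure) {
            this.value = value;
            this.failure = failure;
        }

        static <R> PartitionResult<R> success(R value) {
            return new PartitionResult<>(value, null);
        }

        static <R> PartitionResult<R> failure(String message) {
            return new PartitionResult<>(null, Objects.requireNonNull(message));
        }

        boolean isSuccess() {
            return failure == null;
        }
    }

    // Response keyed by partition number; nothing is merged for the caller.
    static final class Response<R> {
        private final Map<Integer, PartitionResult<R>> results = new TreeMap<>();

        void add(int partition, PartitionResult<R> result) {
            results.put(partition, result);
        }

        Map<Integer, PartitionResult<R>> partitionResults() {
            return Collections.unmodifiableMap(results);
        }

        // Partitions whose query failed and might be retried on their own.
        List<Integer> failedPartitions() {
            List<Integer> failed = new ArrayList<>();
            for (Map.Entry<Integer, PartitionResult<R>> entry : results.entrySet()) {
                if (!entry.getValue().isSuccess()) {
                    failed.add(entry.getKey());
                }
            }
            return failed;
        }
    }

    // Optional helper for callers who want one combined list of successes.
    static <V> List<V> combineSuccesses(Response<List<V>> response) {
        List<V> combined = new ArrayList<>();
        for (PartitionResult<List<V>> result : response.partitionResults().values()) {
            if (result.isSuccess()) {
                combined.addAll(result.value);
            }
        }
        return combined;
    }
}
```

A caller that only wants "the answer" can use the helper, while a caller that cares about partial failure can inspect `failedPartitions()` and retry just those partitions.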

Public Interfaces

This KIP will add ...:

...

Compatibility, Deprecation, and Migration Plan

...

Rejected Alternatives

...

KIP-DRAFT: Interactive Query v2
