Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

Status

Current state: Under Discussion

Discussion thread: here

JIRA: here

Motivation

We have faced the following scenario/problem in a lot of situations with KStreams:
   - Huge incoming data being processed by numerous application instances
   - Need to aggregate different fields whose records span all topic partitions (something like “total amount spent by people aged > 30 yrs” when processing a topic partitioned by userid).

...

We started this discussion with Matthias J. Sax here: https://issues.apache.org/jira/browse/KAFKA-6953

Public Interfaces

     [existing class] org.apache.kafka.streams.StreamBuilder

     [added function] KTable<K, V> scheduledTable(String storeName, String scheduleExpression, boolean allInstances

Proposed Changes

Create a new DSL Source Type with the following characteristics:

...

Differently from current DSL Source Types, it is not a new incoming message from a Kafka Topic that sources a Graph. The Graph is sourced when time is elapsed, and the messages are generated based on an existing KTable. If the KTable is empty, no messages are sourced. If it has 100 elements, for example, everytime time is elapsed, the Graph is sourced with 100 elements, even if nothing has changed to the KTable.

Compatibility, Deprecation, and Migration Plan

  • There won’t be any compatibility issues because this is a brand new feature/interface

Rejected Alternatives

   1- Sink all instances’ local aggregation result to a Topic with a single partition so that we could have another Graph with a single instance that could aggregate all results
         - In this case, if I had 500 instances processing 1000 messages/s each (with no bottlenecks), I would have a single partition topic with 500k messages/s for my single aggregating instance to process that much messages (great bottleneck)

...