Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

Motivation

A common use case for Kafka is real-time processes that transform data from input topics to output topics. Today there are a couple of options available for users to process such data:

...

The second option, i.e. using a full stream processing framework can be a good solution but a couple of things tend to make it a bit heavy-weight:

1. These frameworks are poorly integrated with Kafka (different concepts, configuration, monitoring, terminology). For example, these frameworks only use Kafka as its stream data source / sink of the whole processing topology, while using their own in-memory format for storing intermediate data (RDD, Bolt memory map, etc). If users want to persist these intermediate results to Kafka as well, they need to break their processing into multiple topologies that need to be deployed separately, increasing operation and management costs.

2. These frameworks either duplicate or force the adoption of a packaging, deployment, and clustering solution. For example, in Storm you need to run a Storm cluster which is a separate thing that has to be monitored and operated. In an elastic environment like AWS, Mesos, YARN, etc this is sort of silly since then you have a Storm cluster inside the YARN cluster vs just directly running the jobs in Mesos; similarly Samza is tied up with YARN. 

3. These frameworks can't be integrated with existing services or applications. For example, you can't just embed a light transformation library inside an existing app, but rather the entire framework that runs as a service.

Proposed Changes

We want to propose another transformation-centric client besides the existing producer and consumer clients (we can call it the "processor" client)This client can be used as a standalone API for processing stream data stored in Kafka.

...

Code Block
languagejava
class MyKafkaProcessor implements KafkaProcessor {

  @overide
  void process(String /* topic */, K /* key */, V /* value */, Context) { .. }
}
 
// starting a new thread running the user defined processing logic
KafkaProcessor processor = new KafkaProcessor(configs);
KStreamThread thread = new KStreamThread(processor);
thread.start();
 
// stopping the stream thread
thread.shutdown();

 

 

 

Design Goals

The Processor client should include the following functionalities:

...

We don't have a full fledged proposal for all the implementation details yet; it will need some prototyping to work out. This KIP is meant as a stub to start the conversation, it will be updated as we have a more complete prototype to show.

 

 

Compatibility, Deprecation, and Migration Plan

This KIP only proposes additions. There should be no compatibility issues.

...