Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

     c. Check if user calls commit() during the processing of this records; if yes commit the offset / flush the local state / flush the producer.

 

Packaging Design

It would be best to package Processor / KStream as a separate jar, since it introduces extra external dependencies, such as RocksDB, etc. Under this model:

1. We will let users to create their own MyKStream.java class that depends on the kafka-stream.jar.

2. We will let users to write their own Main function as the entry point for starting their process instance.

 

Current class / package names can be found in [LINK]. A general summary:

1. All classes are defined in the "stream" folder.

2. Low-level Processor interface is under the "o.a.k.clients.processor" package; high-level KStream interface is under the "o.a.k.stream" package.

3. Important user-facing classes include:

Code Block
KafkaProcessor: implements Processor, Receiver, Punctuator; used for computation logic.
 
ProcessorContext: passed in KafkaProcessor.init(); provides schedule / send / commit / etc functions, and topic / partition / offset / etc source record metadata.

StateStore: can be created inside KafkaProcessor.init() for storing local state.
PTopology: requires users to implement the build() function, in which addProcessor / addSource can be used to construct the DAG.
 
KStreamTopology: extends PTopology, and in its build() function high-level operators like map / filter / branch / etc can be used.

KStreamProcess: used in main function to take provided Topology class and configs to start the instance.

Some example classes can be found in o.a.k.stream.examples.

4. Important internal classes include:

Code Block
Ingestor: the wrapped consumer instance for fetching data / managing offsets.
 
KStreamThread: multi-threaded KStreamProcess will create #.KStreamThread specified in configs, each maintaining its own Ingestor.
 
StreamGroup: the unit of processing tasks that are assigned to KStreamThread within the KStreamProcess instance.
 
KStreamFilter/Map/Branch/...: implementations of high-level KStream topology builder operators.

 

 

Shutdown

Upon user calling KafkaProcess.shutdown(), the following steps are executed:

...