...
Before going into the details, let us first clarify the terms and their meanings for the purpose of this discussion.
Term | Description |
---|---|
Event Stream Time | KafkaStreams defines an interface called TimestampExtractor, which can be implemented to extract the Event Time from the event contents, or to return the ingestion time, the processing (wall-clock) time, or any other logical time. For us, Event Stream Time is the value returned by the TimestampExtractor implementation in use. |
Punctuate Time | This is the reference time used to trigger the Punctuate calls. It could be Event Stream Time [current semantics], System Time, or Event Stream Time with an expiry timer based on System Time (hybrid or complex punctuate). |
Punctuation Timestamp | The reference timestamp passed to the punctuate method being called. It could be the Event Stream Time of the event that triggered the punctuate [current semantics], System Time in the case of a System Time based punctuate, or a PunctuationTime object specifying the details in the case of a hybrid/complex punctuate. |
Output Record Time | This is the output record timestamp for the events generated in the punctuate method. Currently, it is the internal time of the Stream Task which is the Event Stream Time. |
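
To make the three Punctuate Time options from the table above concrete, here is a minimal, dependency-free sketch. It is not the Kafka Streams implementation: `PunctuateSketch`, `Scheduler`, `Mode`, and `advance` are hypothetical names, and the rescheduling rules (fire once, then restart the interval from the current time) are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the three Punctuate Time options; not Kafka Streams API. */
final class PunctuateSketch {

    enum Mode { STREAM_TIME, SYSTEM_TIME, HYBRID }

    /** Fires punctuations at a fixed interval against the chosen reference time. */
    static final class Scheduler {
        private final Mode mode;
        private final long intervalMs;
        private long nextStreamMs;
        private long nextSystemMs;
        final List<String> fired = new ArrayList<>();

        Scheduler(Mode mode, long intervalMs, long startStreamMs, long startSystemMs) {
            this.mode = mode;
            this.intervalMs = intervalMs;
            this.nextStreamMs = startStreamMs + intervalMs;
            this.nextSystemMs = startSystemMs + intervalMs;
        }

        /**
         * Advances the scheduler. eventStreamTimeMs is the extracted timestamp of an
         * arriving event, or null when the call is an idle tick with no event.
         */
        void advance(Long eventStreamTimeMs, long systemTimeMs) {
            boolean streamDue = eventStreamTimeMs != null && eventStreamTimeMs >= nextStreamMs;
            boolean systemDue = systemTimeMs >= nextSystemMs;

            if (mode != Mode.SYSTEM_TIME && streamDue) {
                fired.add("stream@" + eventStreamTimeMs);
                nextStreamMs = eventStreamTimeMs + intervalMs;
                nextSystemMs = systemTimeMs + intervalMs; // assumption: restart the expiry timer
            } else if (mode != Mode.STREAM_TIME && systemDue) {
                fired.add("system@" + systemTimeMs);
                nextSystemMs = systemTimeMs + intervalMs;
                nextStreamMs += intervalMs; // assumption: skip the window just flushed
            }
        }
    }

    /** Replays one scenario: two events, then a quiet period with an idle tick only. */
    static List<String> run(Mode mode) {
        Scheduler s = new Scheduler(mode, 10, 0, 1000);
        s.advance(5L, 1005);   // event inside the first interval
        s.advance(12L, 1012);  // event crossing the stream-time deadline
        s.advance(null, 1025); // no events: only system time moves
        return s.fired;
    }

    public static void main(String[] args) {
        for (Mode m : Mode.values()) {
            System.out.println(m + " -> " + run(m));
        }
    }
}
```

Under these assumptions, the scenario yields `[stream@12]` for `STREAM_TIME`, `[system@1012, system@1025]` for `SYSTEM_TIME`, and `[stream@12, system@1025]` for `HYBRID`. Only the modes that consult System Time produce output during the quiet period, and only the stream-time trigger depends solely on event contents.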
Punctuate Use Cases
Use Case 1
...
Behaviour | Requirement |
---|---|
Deterministic output (reprocessing or delayed processing possible) | |
Output in the absence of events | |
Evaluation Matrix
In the currently listed use cases, the Event Stream Time is the Event Time.
Behaviour | Event Stream Time based Punctuate | System Time based Punctuate | Event Stream Time based Punctuate with System Time based secondary expiry |
---|---|---|---|
Deterministic output (reprocessing or delayed processing possible) | |||
Output in the absence of events | |||
Join of same time events across different topics | |||
Correct punctuation in the presence of long queueing | |||
Alternate Representation
Design Semantics | Pros / Cons | Use Case 1 | Use Case 2 |
---|---|---|---|
Event Stream Time based Punctuate | Pros | Deterministic output (reprocessing or delayed processing doesn't change the output). | Deterministic output (reprocessing or delayed processing doesn't change the output). |
Event Stream Time based Punctuate | Cons | Geographic areas with few events might not generate a list every 10 minutes. | The event count won't be output until an event whose event time crosses the punctuate interval arrives. |
System Time based Punctuate | Pros | Rank list output every 10 minutes, irrespective of event flow. | Counts are flushed even in the absence of events. |
System Time based Punctuate | Cons | Output during reprocessing won't be the same. | A late event will cause another aggregate event during steady-state operation. |
Event Stream Time based Punctuate with System Time based secondary expiry | Pros | Deterministic output (reprocessing or delayed processing doesn't change the output). Geographic areas with few events will generate a list as per the expiry timer. | Deterministic output (reprocessing or delayed processing doesn't change the output). |
Event Stream Time based Punctuate with System Time based secondary expiry | Cons | ? | In scenarios where longer processing time leads to queueing and late processing, the system timer may expire before the last event within the window arrives. |