Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Before going into the details let us first clarify the terms and their meanings for the purpose of this discussion.

TermDescription
Event Stream Time

KafkaStreams has defined an interface called TimestampExtractor which can be implemented to extract the event timestamp Event Time from the event contents, the ingestion time, or return the wallclock (system) the processing/wall-clock time, or any other logical time.

Event Stream Time for us would be the value returned by the TimestampExtractor implementation in use.

Punctuate TimeThis is the reference time used to trigger the Punctuate calls. It could be Event Stream Time [current semantics], System Time, or Event Stream Time with an expiry timer based on System Time (hybrid or complex punctuate).
Punctuation TimestampThe reference timestamp passed to punctuate method being called. It could be Event Stream Time of the event that triggered the punctuate [current semantics], System Time in case of System Time based punctuate, or a PunctuationTime object specifying the details in case of a hybrid/complex punctuate.
Output Record TimeThis is the output record timestamp for the events generated in the punctuate method. Currently, it is the internal time of the Stream Task which is the Event Stream Time.

Punctuate Use Cases

Use Case 1

...

 

BehaviourRequirement
Deterministic output
(reprocessing or delayed processing possible)
(tick)
Output in the absence of events(tick)

Evaluation Matrix

In the current currently listed use cases the Event Stream time is the Event timestampTime.

 

Behaviour

Event Stream Time based Punctuate

System Time based Punctuate Event Stream Time based Punctuate with
System Time based secondary expiry
Deterministic output
(reprocessing or delayed processing possible)
(tick)(error)(tick)
Output in the absence of events(error)(tick)(tick)
Join of same time events across different topics(tick)(error)(warning)
Correct punctuation in the presence of long queueing(tick)(error)(warning)
    

Alternate Representation

Design Semantics Use Case 1Use Case 2

Event Stream Time based Punctuate

Pros

Deterministic output (reprocessing or delayed processing doesn't change the output)

Deterministic output (reprocessing or delayed processing doesn't change the output)
Cons

Low event geographic areas might not generate a list every 10 minutes. 

Event count output wont happen until an event with event time crossing the punctuate interval arrives.

System Time based Punctuate 

ProsRank list output every 10 minutes, irrespective of event flow.Count flushes in the absence of events 
ConsOutput during a reprocessing won't be same.

Late event will cause another aggregate event during steady state operation.
Burst output of aggregate events at the punctuate times during reprocessing.

Event Stream Time based Punctuate with System Time based secondary expiryProsDeterministic output (reprocessing or delayed processing doesn't change the output)
Low event geographic areas will generate a list as per the expiry timer.

Deterministic output (reprocessing or delayed processing doesn't change the output)
Count flushes in the absence of events 

Cons?

In case of similar scenarios where processing time is more leading to queueing and late processing, the system timer may expire before the last even within the window arrives.
In case of similar scenarios where events have to be joined across different topics, we might actually need to wait for the event.