Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. Ordering: As noted in the JIRA chat, Samza has judged it to be impossible hard for the records to be returned in its original sequence by their implementation. 
  2. Exactly-Once: In exactly-once semantics, each offsets are returned once. This will not be possible if multiple threads are active (i.e. more than one thread is calling the same Kafka Streams instance).
  3. Latency and Resilience: Whenever we attempt to retry processing, as mentioned in the JIRA ticket, it could end up as a performance bottleneck because we are effectively "stopping the world" by pausing on a single record. An option to avoid this is to allow a second thread to handle these failed records while we continue to process incoming metadata. However, exactly once and ordering would not be supported under these conditions.

...

  1. Positive sides: Failure handling is better now in that multiple threads are on the job. While a secondary thread takes care of the failed metadata, the primary thread could move on processing new ones. Since the failed metadata topic's workload is not constantly increasing, we will have time to process them. Once the secondary thread has finished with the failed records, it could be terminated, thus freeing up CPU resources and space. Latency would be reduced.
  2. Negative sides: Ordering is now impossible harder to guarantee, as is and exactly-once is impossible because we have no way of knowing which records has been returned since asynchronous processes have no way of communicating between one another. . 

In the first approach I outlined, we are essentially giving the user some more flexibility in deciding how to resolve the latency and failure handling problem. The second approach takes some load off the client's back in that we figure out how to process the records using multiple threads, and clients doesn't have to worry about anything complex. Note that with the second plan, no CompletableFutures would be involved as secondary threads would be processing it directly using blocking methods (like KafkaConsumer#commitSync). 

...