...

  • It would be possible to apply retries at a per-method level (i.e., for each client method that is called, an individual retry counter is maintained). This proposal is rejected because it seems too fine-grained and hard for users to reason about.
  • It would be possible to apply retries at the thread level, i.e., whenever the thread does not make any progress in one task-processing loop (i.e., all tasks throw a timeout exception within the loop), the per-thread retry counter would be increased. This proposal is rejected as too coarse-grained. In particular, a single task could get stuck while other tasks make progress, and this case would not be detected.
  • To distinguish between retries within Kafka Streams and client retries (in particular the producer's send retries config), we could add a new config (e.g., `task.retries`). However, keeping the number of configs small is desirable and the gain of the new config seems limited.
  • To avoid requiring people to consider setting producer.retries and admin.retries explicitly, we could change the behavior of Kafka Streams and use retries explicitly for Streams-level retries. In this case, setting retries would not affect the producer or admin client, and the retries of both clients could only be changed with their corresponding client-prefix config. This would be a backward-incompatible change, and in fact it might be better moving forward to deprecate the producer and admin client retries config in favor of their newer timeout configs (this was already pointed out in KIP-533).
  • Instead of using a retry counter, it would be possible to use a new `task.progress.timeout.ms` config (this might align with the recent API changes of the underlying clients). It was rejected as we already have a retry config that we can simply reuse.
  • Instead of applying the retry.backoff.ms config, a task would be retried directly in the next processing loop. In contrast to a "busy wait" retry as done in the clients and on the global thread, looping over all other tasks natively implies some retry delay. However, applying the retry.backoff.ms config seems to align better with existing behavior/semantics (note that this backoff time might actually be exceeded naturally, as looping through all the other tasks might take longer).
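
To illustrate why the thread-level alternative is considered too coarse-grained, the following is a minimal, self-contained sketch and not Kafka Streams code. The class and method names (`RetryGranularitySketch`, `PerTaskRetries`, `PerThreadRetries`, `firstDetection`) are hypothetical, and the sketch assumes that a counter is reset as soon as progress is made. It simulates one task that always times out next to one task that always makes progress, and reports in which loop iteration each counting scheme would notice the problem.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RetryGranularitySketch {

    /** Hypothetical per-task bookkeeping: one retry counter per task id. */
    static final class PerTaskRetries {
        private final int maxRetries;
        private final Map<String, Integer> failedAttempts = new LinkedHashMap<>();

        PerTaskRetries(int maxRetries) {
            this.maxRetries = maxRetries;
        }

        /** Records one processing loop; returns true if any task exceeded maxRetries. */
        boolean recordLoop(Map<String, Boolean> timedOutByTask) {
            boolean exhausted = false;
            for (Map.Entry<String, Boolean> entry : timedOutByTask.entrySet()) {
                if (entry.getValue()) {
                    int attempts = failedAttempts.merge(entry.getKey(), 1, Integer::sum);
                    exhausted |= attempts > maxRetries;
                } else {
                    // Assumption: progress resets the counter of that task.
                    failedAttempts.remove(entry.getKey());
                }
            }
            return exhausted;
        }
    }

    /** Hypothetical per-thread bookkeeping: counts loops in which no task progressed. */
    static final class PerThreadRetries {
        private final int maxRetries;
        private int loopsWithoutProgress = 0;

        PerThreadRetries(int maxRetries) {
            this.maxRetries = maxRetries;
        }

        /** Records one processing loop; returns true if the thread exceeded maxRetries. */
        boolean recordLoop(Map<String, Boolean> timedOutByTask) {
            boolean anyProgress = timedOutByTask.containsValue(false);
            loopsWithoutProgress = anyProgress ? 0 : loopsWithoutProgress + 1;
            return loopsWithoutProgress > maxRetries;
        }
    }

    /**
     * Simulates `loops` iterations in which task "0_0" always times out and task
     * "0_1" always makes progress. Returns {perTaskLoop, perThreadLoop}: the
     * 1-based loop in which each scheme first reports exhaustion, or -1 if never.
     */
    static int[] firstDetection(int maxRetries, int loops) {
        PerTaskRetries perTask = new PerTaskRetries(maxRetries);
        PerThreadRetries perThread = new PerThreadRetries(maxRetries);

        Map<String, Boolean> loop = new LinkedHashMap<>();
        loop.put("0_0", true);   // stuck task: times out in every loop
        loop.put("0_1", false);  // healthy task: makes progress in every loop

        int taskLoop = -1;
        int threadLoop = -1;
        for (int i = 1; i <= loops; i++) {
            if (perTask.recordLoop(loop) && taskLoop < 0) {
                taskLoop = i;
            }
            if (perThread.recordLoop(loop) && threadLoop < 0) {
                threadLoop = i;
            }
        }
        return new int[] {taskLoop, threadLoop};
    }

    public static void main(String[] args) {
        int[] result = firstDetection(2, 5);
        System.out.println("per-task detection at loop: " + result[0]);
        System.out.println("per-thread detection at loop: " + result[1]);
    }
}
```

With `maxRetries = 2` and five loops, the per-task counter flags the stuck task in the third loop, while the per-thread counter never triggers because the healthy task keeps resetting it. This matches the concern stated for the thread-level alternative.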

...