...

Furthermore, we propose to catch all client TimeoutException in Kafka Streams instead of treating them as fatal, and thus to not rely on the consumer/producer/admin client to handle all such errors. If a TimeoutException occurs, we skip the current task and move to the next task for processing (we will also log a WARNING for this case to give people insight into which client call produced the timeout exception). The failed task would automatically be retried in the next processing loop. Because other tasks are processed until a task is retried, we don't have to worry about a busy wait situation. Even if a thread would have only a single task, the client's internal exponential retries would avoid busy waiting.
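The skip-and-retry behavior described above can be illustrated with a minimal sketch. It is not the Kafka Streams implementation: the `Task` interface, the `ClientTimeoutException` stand-in (used in place of the clients' TimeoutException so the example has no Kafka dependency), and `runOnce` are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

final class SkipOnTimeoutLoop {
    /** Hypothetical stand-in for the clients' TimeoutException. */
    static final class ClientTimeoutException extends RuntimeException {
        ClientTimeoutException(String message) {
            super(message);
        }
    }

    /** Hypothetical minimal task abstraction. */
    interface Task {
        String id();

        void process();
    }

    /**
     * Runs one pass of the processing loop. A task that hits a timeout is
     * skipped (with a warning) and the loop moves on to the next task.
     * Returns the ids of the skipped tasks; they are simply picked up
     * again when the caller runs the next pass.
     */
    static List<String> runOnce(List<Task> tasks) {
        List<String> skipped = new ArrayList<>();
        for (Task task : tasks) {
            try {
                task.process();
            } catch (ClientTimeoutException e) {
                // Not fatal: log which call timed out and continue.
                System.err.println("WARN task " + task.id()
                        + " hit a client timeout (" + e.getMessage()
                        + "); it will be retried in the next loop");
                skipped.add(task.id());
            }
        }
        return skipped;
    }
}
```

Because the remaining tasks are still processed in the same pass, a single timed-out task does not stall the thread.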

To make sure that timeout issues can be reported eventually, we use a new task.timeout.ms config to allow users to stop processing at some point if a single task cannot make any progress. The "timer" for task.timeout.ms starts when the first client TimeoutException is detected and is reset/disabled if a task processes records successfully in a retry. If task.timeout.ms has passed, a final attempt will be made to make progress (this strategy ensures that a task will be retried at least once, unless task.timeout.ms is set to 0, which implies zero retries); if another client TimeoutException occurs, processing is stopped by re-throwing it and the streams-thread dies. Note that the task.timeout.ms config only applies if a previous TimeoutException occurred. During normal, potentially slow processing, task.timeout.ms would not be applied.
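A minimal sketch of the timer semantics described above, under stated assumptions: the class `TaskTimeoutTracker` and its methods `onTimeout` and `onSuccess` are hypothetical names, time is passed in explicitly as milliseconds, and a plain RuntimeException stands in for the client TimeoutException. It is meant to make the rules concrete, not to mirror the actual Kafka Streams code.

```java
final class TaskTimeoutTracker {
    private static final long NOT_STARTED = -1L;

    private final long taskTimeoutMs;
    // Time of the first timeout since the last successful processing.
    private long firstTimeoutMs = NOT_STARTED;

    TaskTimeoutTracker(long taskTimeoutMs) {
        if (taskTimeoutMs < 0L) {
            throw new IllegalArgumentException("taskTimeoutMs must be >= 0");
        }
        this.taskTimeoutMs = taskTimeoutMs;
    }

    /**
     * Called when a client call for this task timed out. Returns normally
     * if the task should be retried in a later loop; re-throws the given
     * exception if retrying should stop.
     */
    void onTimeout(long nowMs, RuntimeException timeout) {
        if (taskTimeoutMs == 0L) {
            // A value of 0 implies zero retries.
            throw timeout;
        }
        if (firstTimeoutMs == NOT_STARTED) {
            // First timeout: start the timer, always allow a retry.
            firstTimeoutMs = nowMs;
            return;
        }
        if (nowMs - firstTimeoutMs > taskTimeoutMs) {
            // The attempt made after the timer expired failed as well.
            throw timeout;
        }
    }

    /** Called when the task made progress; resets/disables the timer. */
    void onSuccess() {
        firstTimeoutMs = NOT_STARTED;
    }
}
```

Since the timer only starts on the first timeout, slow but successful processing never triggers it, and any non-zero setting yields at least one retry.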

To replace retries in the global thread's initialization phase, we also retry TimeoutException until task.timeout.ms expires. We apply the existing retry.backoff.ms config and rely on the client to do exponential backoff and retry for this case.
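For illustration, the two configs mentioned here could be set as follows. This is a sketch only: plain string keys are used so the snippet has no Kafka dependency, the class and method names are hypothetical, and the values are arbitrary examples rather than recommended or default settings.

```java
import java.util.Properties;

final class TimeoutConfigExample {
    /** Builds example properties; the values are illustrative assumptions. */
    static Properties exampleProperties() {
        Properties props = new Properties();
        // Upper bound on how long a task may keep hitting timeouts
        // before the exception is re-thrown (0 would mean no retries).
        props.setProperty("task.timeout.ms", "300000");
        // Existing backoff config applied between retries.
        props.setProperty("retry.backoff.ms", "100");
        return props;
    }
}
```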

...