...
- We should never try to handle any fatal exceptions but clean up and shutdown
- We should catch all those exceptions for cleanup clean up only and rethrow unmodified (they will eventually bubble out of the thread and trigger uncaught exception hander if one is registered)
- We should only log those exception (with
ERROR
level) once at the thread level before they bubble out of the thread to avoid duplicate logging
- We need to do fine grained exception handling, ie, catch exceptions individually instead of coarse grained and react accordingly
- All methods should have complete JavaDocs about exception they might throw
- All exception classes must have strictly defined semantics that are documented in their JavaDocs
- For retriable exception, we should throw a special
StreamsException
if reties are exhausted - In runtime code, we should never throw any regular Java excepiton but define our own exceptions if required (this allows us to destinguish between bugs and our own exceptions)
- We should catch, wrap, and rethrow exceptions each time we can add important information to it that helps users and us to figure out the root cause of what when wrong
To be discussed:
- How to handle
Throwable
?- Should we try to catch-and-rethrow in order to clean up?
Throwable
is fatal, so clean up might fail anyway?- Furthermore, should we assume that the whole JVM is dying anyway?
- Should we be harsh and call
System.exit
(note, we are a library – but maybe we are "special" enough to justify this?- Note, if a thread dies without clean up, but other threads are still running fine, we might end up in a deadlock as locks are not released
- Could also be configurable
- Could also be a hybrid: try to clean up on
Throwable
but callSystem.exit
if clean up fails (as we would end up in a deadlock anyway – maybe only if running with more than one thread?)
- Should we force users to provide uncaught exception handler via
KafkaStreams
constructor to make sure they get notified about dying streams?
- Should we try to catch-and-rethrow in order to clean up?
- Restructure exception class hierarchy:
- Remove all sub-classed of
StreamsException
from public API (we only hand out this one to the user) - A
StreamsException
inidicates a fatal error (we could sub-classStreamsException
with more detailed fatal errors if required – but don't think this is necessary) - We sub-class
StreamsException
with (an abstract?)RecoverableStreamsException
in internal package for any internal exception that should be handled by Streams and never bubble out- As an alternative (that I would prefer) we could introduce this as an independet and checked exception instead of inheriting from
StreamsException
(this forces us to declare and handle those exceptions in our code and makes it hart do miss – otherwise, one might bubble out due to a bug
- As an alternative (that I would prefer) we could introduce this as an independet and checked exception instead of inheriting from
- We sub-class inidividual recoverable exceptions in a fine grained manner from
RecoverableStreamsException
for individual errors - We can further group all retriable exceptions by sub-classing them from
abstract RetriableStreamsException extends RecoverableStreamsException
– the more details/categories the better?
- Remove all sub-classed of
KafkaConsumer | KafkaProducer | StreamsKafakClient | AdminClient | Streams API | |
---|---|---|---|---|---|
fatal (should never occur) | local: - IllegalArgumentExcetpion - IllegalStateException - WakeupException - InterruptExcetpion remote: - UnknownServerException | local: - IllegalArgumentExcetpion - IllegalStateException - WakeupException - InterruptExcetpion remote: - UnknownServerException - OffsetMetadataTooLarge - SerializationException (we use
| local: - IllegalArgumentExcetpion - IllegalStateException - WakeupException - InterruptExcetpion remote: - UnknownServerException - InvalidTopicException | local: - IllegalArgumentExcetpion - IllegalStateException - WakeupException - InterruptExcetpion remote: - UnknownServerException - InvalidTopicExcetpion | local: |
fatal | local: - ConfigException remote: - AuthorizationException (including all subclasses) - AuthenticationException (inlcuding all subclasses) - SecurityDisabledException - InvalidTopicException
| local: - ConfigException remote: - AuthorizationException (including all subclasses) - AuthenticationException (inlcuding all subclasses) - SecurityDisabledExcetpion - InvalidTopicException - UnkownTopicOrPartitionsException (retyable? refresh metadata?) - RecordBatchTooLargeException - RecordTooLargeException
| local: - ConfigException remote: - AuthorizationException (including all subclasses) - AuthenticationException (inlcuding all subclasses) - SecurityDisabledExcetpion
| local: - ConfigException remote: - AuthorizationException (including all subclasses) - AuthenticationException (inlcuding all subclasses) - SecurityDisabledExcetpion | local: - ConfigException - SerializationException |
retriable | local:
remote: - InvalidOffsetException (OffsetOutOfRangeException, NoOffsetForPartitionsException) - CommitFailedException - TimeoutException - QuotaViolationException? | local: remote: - CorruptedRecordException - NotEnoughReplicasAfterAppendException - OffsetOutOfRangeException (when can producer get this?) - TimeoutException - QuotaViolationException? - BufferExhausedException (verify) | local:
remote: | local:
remote: | local: |
recoverable | local:
remote:
| local:
remote: - ProducerFencedException | local:
remote: | local:
remote: | local: - LockException |
...