Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

First, we can distinguish between retryable and non-retriable (ie, fatal) recoverable and fatal exceptions. Retryable Recoverable exception should be handled internally and only never bubble out if a (configuare) amount of retries was unsuccessful. For non-retryable/to the user. For fatal exceptions, Kafka Streams is doomed to fail and cannot start/continue to process data

Related to this are retriable exception. While retriable exception are recoverable in general, it might happen that the (configurable) retry counter is exceeded; for this case, we end up with an fatal exception.

The second category are "external" vs "internal" exception. By "external" we refer to any exception that could be returned by the brokers. "Internal" exceptions are those that are raised locally.

...

  • We should never try to handle any fatal exceptions but clean up and shutdown.
    • We should catch all those exceptions (all Java exception, generic KafakException, a for cleanup only and rethrow unmodified (they will eventually bubble out of the thread and trigger uncaught exception hander if one is registered)

...

    • We should only log those exception (with ERROR level) once at the thread level before they bubble out of the thread to avoid duplicate logging
  • We need to do fine grained exception handling, ie, catch exceptions individually instead of coarse grained and react accordingly
  • All methods should have complete JavaDocs about exception they might throw
  • All exception classes must have strictly defined semantics that are documented in their JavaDocs
  • For retriable exception, we should throw a special StreamsException if reties are exhausted

 

 KafkaConsumerKafkaProducerStreamsKafakClientAdminClientStreams API
fatal (should never occur)

local:

- IllegalArgumentExcetpion

- IllegalStateException

- WakeupException

- InterruptExcetpion

remote:

- UnknownServerException

local:

- IllegalArgumentExcetpion

- IllegalStateException

- WakeupException

- InterruptExcetpion

remote:

- UnknownServerException

- OffsetMetadataTooLarge

- SerializationException (we use <byte[],byte[]> as types)

 

local:

- IllegalArgumentExcetpion

- IllegalStateException

- WakeupException

- InterruptExcetpion

remote:

- UnknownServerException

- InvalidTopicException

local:

- IllegalArgumentExcetpion

- IllegalStateException

- WakeupException

- InterruptExcetpion

remote:

UnknownServerException

- InvalidTopicExcetpion

local:

fatal

local:

- ConfigException

remote:

- AuthorizationException (including all subclasses)

- AuthenticationException (inlcuding all subclasses)

- SecurityDisabledException

- InvalidTopicException

 

local:

- ConfigException

remote:

- AuthorizationException (including all subclasses)

- AuthenticationException (inlcuding all subclasses)

- SecurityDisabledExcetpion

- InvalidTopicException

- UnkownTopicOrPartitionsException (retyable? refresh metadata?)

- RecordBatchTooLargeException

- RecordTooLargeException

 

local:

- ConfigException

remote:

- AuthorizationException (including all subclasses)

- AuthenticationException (inlcuding all subclasses)

- SecurityDisabledExcetpion

 

local:

- ConfigException

remote:

- AuthorizationException (including all subclasses)

- AuthenticationException (inlcuding all subclasses)

- SecurityDisabledExcetpion

local:

- ConfigException

- SerializationException

retriable

local:

 

remote:

- InvalidOffsetException (OffsetOutOfRangeException, NoOffsetForPartitionsException)

- CommitFailedException

- TimeoutException

- QuotaViolationException?

local:


remote:

- CorruptedRecordException

- NotEnoughReplicasAfterAppendException

- OffsetOutOfRangeException (when can producer get this?)

- TimeoutException TimeoutException

- QuotaViolationException?

- BufferExhausedException (verify)

local:

 

remote:

local:

 

remote:

local:

recoverable

local:

 

remote:

 

local:

 

remote:

- ProducerFencedException

local:

 

remote:

local:

 

remote:

local:

- LockException

 

Having a look at all KafkaException there are some exception we need to double check if they could bubble out any client (or maybe we should not care, an treat all of them as fatal/remote exceptions).

-> DataException, SchemaBuilderExcetpion, SchemaProjectorException, RequestTargetException, NotLeaderException, NotAssignedException, IllegalWorkerStateException, ConnectRestException, BadRequestException, AlreadyExistsException (might be possible to occur, or only TopicExistsException), NotFoundException, ApiException, InvalidTimestampException, InvalidGroupException, InvalidReplicationFactorException (might be possible, but inidcate bug), o.a.k.common.erros.InvalidOffsetExcetpion and o.a.k.common.errors.OffsetOutOfRangeException (side note: do those need cleanup – seems to be duplicates?), ReplicaNotAvailalbeException, UnknowServerException, OperationNotAttempedException, TransactionalCoordinatorFencedException, PolicyViolationException, ControllerMovedException, UnkownMemberIdException, InvalidConfigurationException, InvalidFetchSizeException, InvalidReplicaAssignmentException, OutOfOrderSequenceException, InconsistendGroupProtocolException, ReblanceInProgressException, LogDirNotFoundException, BrokerNotAvailableException, InvalidOffsetCommitSizeException, InvalidTxnTimeoutException, InvalidPartitionsException, TopicExistsException (cf. AlreadyExistException), InvalidTxnStateException, ProducerFencedException, UnsupportedForMessageFormatException, UnsupportedVersionException, InvalidSessionTimeoutException, InvalidRequestException, IllegalGenerationException, InvalidRequiredAckException,  

-> RetryableException, CoordinatorLoadInProgressException, GroupLoadInProgressException, CoordinatorNotAvailalbeException, NotControllerException, RetryableCommitException, DuplicateSequenceNumberException, NotEnoughReplicasException, NotEnoughReplicasAfterAppendException, InvalidRecordExcetpionInvalidRecordException, NotCoordinatorForGroupException, DisconnectException, InvalidMetaDataException (NotLeaderForPartitionException, NoAvailableBrokersException, NetworkException, UnkonwTopicOrPartitionException, KafkaStoreException, StaleMetadataException, LeaderNotAvailalbeException), NotCoordinatorException, GroupCoordinatorNotAvailableException

Should never happen:

 - UnsupportedVersionException (we do a startup check and should not start processing for this case in the first place)

Handled by client (consumer, producer, admin) internally and should never bubble out of a client: (verify)

 - ConnectionException, RebalanceNeededException, InvalidPidMappingException, ConcurrentTransactionException, NotLeaderException, TransactionalCoordinatorFencedException, ControllerMovedException, UnkownMemberIdException, OutOfOrderSequenceException, CoordinatorLoadInProgressException, 
Might bubble out:

 - 

 

 

What about

  • TopicExistException (consumer group leader "split brain" – might be self-healing)

 

 GroupLoadInProgressException, NotControllerException, NotCoordinatorException, NotCoordinatorForGroupException, StaleMetadataException, NetworkException, 

 

Gliffy Diagram
nameExceptionHandling

 

  Gliffy DiagramnameExceptionHandling