Exception | Thrown Scenarios | Thrown places | Error Type | Suggestion
IllegalStateException | Thrown on various call paths that check for impossible situations; indicates a bug. | 1), Wrapped as KafkaException(e) | Fatal | We should always throw IllegalStateException directly without wrapping.
AuthenticationException | Only for txnal requests whose txn.id is not authenticated | 1), Wrapped as KE(e) | Fatal | None, all good
InvalidPidMappingException | Only for txnal requests whose encoded txn.id is not recognized or whose corresponding PID is incorrect | 1), Wrapped as KE(e)

After KIP-360 (2.5+), Abortable as we bump epoch;

otherwise Fatal

None
UnknownProducerIdException

Similar to InvalidPidMappingException but only for produce request (i.e. the PID is not recognized on the partition leader).

NOTE this is removed as part of KIP-360, and hence would only be returned by old brokers. We keep this error code for now since we may re-use it in the future.

1), Wrapped as KE(e)

After KIP-360 (2.5+), Abortable as we bump epoch;

otherwise Fatal

None
TransactionAbortedException | Only for produce request batches, when the txn is already aborting we would simply abort all unsent batches with this error | 2) | N/A since it is not an error case | None

ClusterAuthorizationException

TransactionalIdAuthorizationException

UnsupportedVersionException

UnsupportedForMessageFormatException


Most of these errors are returned from produce responses (txnal responses could also return UnsupportedVersionException).

When these errors are returned, we immediately move the txnManager to the error state as well. As a result, these exceptions can be thrown both from txnManager#maybeFailWithError and from the send callback/future.

1), Wrapped as KE(e); and

2)

Fatal | See below

InvalidRecordException

InvalidRequiredAcksException

NotEnoughReplicasAfterAppendException

NotEnoughReplicasException

RecordBatchTooLargeException

InvalidTopicException

CorruptRecordException

UnknownTopicOrPartitionException

NotLeaderOrFollowerException

TimeoutException

These are all non-fatal errors returned from produce responses (TimeoutException is raised for expired batches). | 1), Wrapped as KE(e); and 2) | Abortable | See below

TopicAuthorizationException

GroupAuthorizationException

TopicAuthorizationException could be thrown via addPartition.

GroupAuthorizationException could be thrown via sendOffsetToTxn / findCoordinator.

Today they are all categorized as abortable, but I think they should be fatal.


Abortable | Should be fatal.

FencedInstanceIdException

CommitFailedException

Thrown from TxnOffsetCommit (CommitFailedException is translated from UNKNOWN_MEMBER_ID and ILLEGAL_GENERATION).

Today it's treated as abortable, BUT I think it should really be fatal since it basically indicates a fenced situation.

1), Wrapped as KE(e) | Abortable | Should be fatal.

InvalidProducerEpochException


This error used to be returned from both txnal responses and produce responses. As of KIP-588 (2.7+), the txn coordinator no longer returns InvalidProducerEpochException; only partition leaders return it, on produce responses, where we treat it as abortable. Because only older-versioned coordinators can still return InvalidProducerEpochException on a txnal response, the client translates it to the fatal ProducerFencedException in that case.

HOWEVER, for TxnOffsetCommit (sent to the group coordinator) we did not do this conversion, which is a bug: we should always convert to ProducerFenced.

1), BUT not wrapped

Fatal if from txnal response (translated to ProducerFenced);

Abortable if from produce response.

It's unclear why we wrap all other exceptions but leave these two un-wrapped; we should have a consistent wrapping mechanism.

Plus, we should fix the bug for TxnOffsetCommit error handling.

ProducerFencedException

This error used to be returned from both txnal response and produce response, but as of KIP-588 it should only be from txnal responses. It is a typical fatal error indicating that another producer with the same PID and newer epoch is in use.

With KIP-447, producers from Kafka Streams should no longer be fenced by txn.id, since we fence them based on the GroupCoordinator instead; in practice this exception is usually thrown when a txn has timed out (pending completion of KIP-588).

1), BUT not wrapped | Fatal | It's unclear why we wrap all other exceptions but leave these two un-wrapped; we should have a consistent wrapping mechanism.
OutOfOrderSequenceException | From produce response only, when the sequence does not match the expected value | 1), Wrapped as KE(e) | Abortable (for idempotent producer we would handle it internally by bumping epoch) | See below

InvalidTxnStateException

From txnal response only, indicating that the producer is issuing a request it should not be issuing.

NOTE that we are handling this exception inconsistently: in endTxn it's wrapped as KE(e), in addPartitions it's wrapped as KE(KE(e)).

1), Wrapped as either KE(e) or KE(KE(e))... | Fatal | Should fix the inconsistent wrapping.
KafkaException

We definitely overloaded this one for various unrelated cases (which I think should be fixed):

1. when we failed to resolve those sequence-unresolved batches

1), Wrapped as KE(KE)


After KIP-360 (2.5+), Abortable as we bump epoch;

otherwise Fatal

Nested wrapping such as KafkaException(KafkaException(KafkaException(...))) should be avoided.

For this case I suggest we wrap as KE(OutOfOrderSequenceException).


2. When we are closing the producer and hence need to garbage collect all pending txnal requests, we simply transit to the error state.

1), Wrapped as KE(KE)

Fatal | I don't think we should transit to the error state at all for this case, nor should we throw this exception.

3. When a txnal response does not contain the "response()" field. | 1), Wrapped as KE(KE) | Fatal | Again, this should be an IllegalStateException since this should never happen.

4. All unexpected errors from txnal response | 1), Wrapped as KE(KE) | Fatal | Again, we should not wrap it twice as KafkaException(KafkaException()).

5. When addPartition response returns with partition-level errors | 1), Wrapped as KE(KE(e)) | Abortable | Again, we should not wrap it twice as KafkaException(KafkaException()).

RuntimeException

For any txnal requests, when the request / response correlation ids do not match | 1), Wrapped as KE(e) | Fatal | I think we should throw CorrelationIdMismatchException instead, which inherits from IllegalStateException and hence should not be wrapped either.
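
Several suggestions in the table reduce to one rule: wrap an error at most once, and never wrap an IllegalStateException. The following is a minimal sketch of such a rule, not the producer's actual code. The `KafkaException` class here is a local stand-in for `org.apache.kafka.common.KafkaException` so that the example compiles without the client jar, `CorrelationIdMismatchException` is defined locally as the table proposes, and the helper name `wrapOnce` is hypothetical.

```java
// Sketch only: the nested classes are local stand-ins, not the real Kafka classes.
final class WrappingSketch {
    // Stand-in for org.apache.kafka.common.KafkaException (assumption for this example).
    static class KafkaException extends RuntimeException {
        KafkaException(Throwable cause) { super(cause); }
    }

    // Proposed in the table: a correlation id mismatch is a bug, so it is an IllegalStateException.
    static class CorrelationIdMismatchException extends IllegalStateException {
        CorrelationIdMismatchException(String message) { super(message); }
    }

    /** Hypothetical helper: yields e or KE(e), never KE(KE(e)) or KE(IllegalStateException). */
    static RuntimeException wrapOnce(Throwable error) {
        if (error instanceof IllegalStateException) {
            // Bugs are surfaced directly, without wrapping.
            return (IllegalStateException) error;
        }
        if (error.getClass() == KafkaException.class) {
            // Already a plain wrapper: do not nest another one around it.
            return (KafkaException) error;
        }
        // Everything else gets exactly one layer of wrapping.
        return new KafkaException(error);
    }
}
```

The exact-class check is deliberate: in the real client most error classes extend KafkaException, so an `instanceof` test would leave them un-wrapped, which is the inconsistency noted for ProducerFencedException and InvalidProducerEpochException.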

...
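
Until the wrapping is made consistent, a caller that wants to act on the Error Type column has to look through KE(e) and KE(KE(e)) layers to find the underlying error. The sketch below shows one way to do that. It is illustrative only: the exception classes are local stand-ins for the real ones under `org.apache.kafka.common`, only a few rows of the table are modeled, the names `unwrap` and `classify` are hypothetical, and treating every unlisted error as abortable is an assumption of this example.

```java
// Sketch only: the nested exception classes are local stand-ins, not the real Kafka classes.
final class TxnErrorTypeSketch {
    static class KafkaException extends RuntimeException {
        KafkaException(String message) { super(message); }
        KafkaException(Throwable cause) { super(cause); }
    }
    static class ProducerFencedException extends KafkaException {
        ProducerFencedException(String message) { super(message); }
    }
    static class InvalidTxnStateException extends KafkaException {
        InvalidTxnStateException(String message) { super(message); }
    }
    static class OutOfOrderSequenceException extends KafkaException {
        OutOfOrderSequenceException(String message) { super(message); }
    }

    enum ErrorType { FATAL, ABORTABLE }

    /** Peels plain KafkaException wrappers (KE(e), KE(KE(e)), ...) and returns the innermost error. */
    static Throwable unwrap(Throwable error) {
        Throwable current = error;
        // Only the exact wrapper class is peeled; subclasses carry meaning and are kept.
        while (current.getClass() == KafkaException.class && current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    /** Maps a possibly wrapped error to the Error Type listed in the table for the modeled rows. */
    static ErrorType classify(Throwable error) {
        Throwable root = unwrap(error);
        if (root instanceof IllegalStateException
                || root instanceof ProducerFencedException
                || root instanceof InvalidTxnStateException) {
            return ErrorType.FATAL;
        }
        // Assumption of this sketch: anything not listed as fatal is treated as abortable.
        return ErrorType.ABORTABLE;
    }
}
```

A caller would typically close the producer on FATAL and call abortTransaction() on ABORTABLE before retrying.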