...

Today, we don't have a downgrade story for transaction and group coordinators, and it becomes extremely difficult to downgrade once we add new fields to any of the existing record types used in their respective topics (__transaction_state and __consumer_offsets). Adding new record types or new fields is not backward compatible because older coordinators do not expect any change in the schema.

  • Adding new record types: Future changes to both __consumer_offsets and __transaction_state topics may introduce new record types that are unknown to the existing coordinators. Today, the transaction coordinator fails when it tries to deserialize unknown record keys into TransactionLogKey, and the group coordinator throws an IllegalStateException when the record key is unknown.
  • Adding new fields to existing record types: We may introduce new fields to existing Value records in both topics (TransactionLogValue, GroupMetadataValue, and OffsetCommitValue) and bump their versions. The existing implementation of both coordinators throws an exception when the record type is known but the version is not supported.
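To make the first failure mode concrete, here is a minimal sketch of a load loop that skips record keys of an unknown type instead of failing, which is the "ignore new records" behavior this proposal relies on. It assumes a simplified key layout in which a leading short selects the record type; the class name, constants, and layout are hypothetical and do not mirror the actual coordinator code.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class UnknownKeySkippingLoader {
    // Hypothetical key types; in this sketch the first short of a key selects its type.
    static final short OFFSET_COMMIT_KEY = 1;
    static final short GROUP_METADATA_KEY = 2;

    /** Returns the key types that were loaded; unknown types are skipped, not fatal. */
    static List<Short> load(List<ByteBuffer> keys) {
        List<Short> loaded = new ArrayList<>();
        for (ByteBuffer key : keys) {
            short type = key.duplicate().getShort();
            switch (type) {
                case OFFSET_COMMIT_KEY:
                case GROUP_METADATA_KEY:
                    loaded.add(type); // a real coordinator would replay the record here
                    break;
                default:
                    // Current behavior: fail to deserialize / throw IllegalStateException.
                    // Sketched behavior: ignore record types written by a newer coordinator.
                    System.out.println("Skipping unknown record key type " + type);
            }
        }
        return loaded;
    }

    // Builds a hypothetical key whose only content is its type.
    static ByteBuffer key(short type) {
        ByteBuffer buf = ByteBuffer.allocate(2);
        buf.putShort(type);
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        List<ByteBuffer> keys = List.of(
            key(OFFSET_COMMIT_KEY), key((short) 9), key(GROUP_METADATA_KEY));
        System.out.println(load(keys));
    }
}
```

Skipping keeps an older coordinator loading, but it is only safe when ignoring the new record type leaves the remaining group or transaction state consistent.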

...

We propose to require that new fields added to existing Value records always be tagged fields. We can modify SchemaGenerator to enforce this. Ideally, we should prevent version bumps for these records altogether. There shouldn't be a need for them, since we can expect new fields to only be tagged fields, but this may be difficult to enforce. As a safeguard, we propose to deserialize unknown versions as the highest supported version and to add a comment advising against bumping the version.
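The following minimal sketch, assuming a simplified Value layout (a short version, one fixed int32 field, then a tagged-field section), shows why the two rules work together: an older reader clamps an unknown version to its highest supported version, reads the fields it knows, and skips tags it does not recognize. The class, field, and constant names are hypothetical; the tagged-field section loosely follows Kafka's flexible-version encoding (an unsigned varint count, then per field an unsigned varint tag, an unsigned varint size, and the raw bytes), but this is not the generated Kafka code.

```java
import java.nio.ByteBuffer;

public class TaggedValueReader {
    // Hypothetical: the highest Value version this (older) coordinator understands.
    static final short HIGHEST_SUPPORTED_VERSION = 1;

    // Simplified Value: one fixed int field plus a count of skipped tagged fields.
    record Value(short effectiveVersion, int knownField, int skippedTags) {}

    static int readUnsignedVarint(ByteBuffer buf) {
        int value = 0;
        int shift = 0;
        byte b;
        do {
            b = buf.get();
            value |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    static void writeUnsignedVarint(ByteBuffer buf, int value) {
        while ((value & ~0x7F) != 0) {
            buf.put((byte) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        buf.put((byte) value);
    }

    static Value read(ByteBuffer buf) {
        short version = buf.getShort();
        // Safeguard: deserialize an unknown (newer) version as the highest supported one.
        short effective = (short) Math.min(version, HIGHEST_SUPPORTED_VERSION);
        int knownField = buf.getInt();
        int numTags = readUnsignedVarint(buf);
        for (int i = 0; i < numTags; i++) {
            readUnsignedVarint(buf);            // tag: unknown to this reader, ignored
            int size = readUnsignedVarint(buf); // size lets us skip the payload blindly
            buf.position(buf.position() + size);
        }
        return new Value(effective, knownField, numTags);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.putShort((short) 3);     // version written by a newer coordinator
        buf.putInt(42);              // field this reader knows about
        writeUnsignedVarint(buf, 1); // one tagged field follows
        writeUnsignedVarint(buf, 0); // its tag
        writeUnsignedVarint(buf, 4); // its size in bytes
        buf.putInt(7);               // its payload
        buf.flip();
        System.out.println(read(buf));
    }
}
```

Because each tagged field carries its own tag and size, an older reader can skip it without knowing its schema. A new non-tagged field would instead shift the layout of everything after it, which is why clamping the version is only a safeguard and not a substitute for the tagged-field requirement.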

...

  • If a transactional offset commit is in progress, we need to abort it before reformatting, but we don't have a mechanism in place to trigger a server-side abort. Furthermore, we will need to add logic so that the coordinator is notified when a transaction is aborted and can then proceed with the rewrite.
  • Producer's perspective: we would either have to make the rewrite completely invisible to the producer or have the producer retry after the transaction is aborted from the server side. Both paths are complex and require additional investigation.
  • Definition of a rewrite: should we consider translating the transaction start time / deadline when rewriting?

We would also need separate logic to downgrade the __transaction_state Value record, TransactionLogValue, but it should be simpler.

The benefit of this approach is that future record types are deleted. The proposed approach of ignoring new records only works because the coordinator deletes new record types when a group is converted from new to old; however, we may introduce new record types that are not deleted during this conversion. Another benefit is that there are no strict requirements for Value records: we don't have to add only tagged fields (which this KIP requires), since these records will be rewritten anyway. Having the upgraded coordinator explicitly rewrite new record types and downgrade them is future-proof, and there are no version downgrade barriers like those in the proposed design.