Status
Current state: Under Discussion
Discussion thread: https://lists.apache.org/thread/cz9x90883on98k082qd0tskj6yjhox1t
JIRA:
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
KIP-98 - Exactly Once Delivery and Transactional Messaging introduced exactly-once semantics (EOS) to Kafka through transactions and idempotent producers. Idempotent producers offer guarantees like deduplication and some exactly-once semantics on a single partition within a single producer session. They require only a producer ID, which is assigned when the producer first starts up. Transactions offer guarantees across topic partitions and producer sessions, and require both a producer ID given by the system and a transactional ID provided by the user.
Since transactional IDs are provided by the user and are expected to persist across client restarts, we expect the number of transactional IDs to be proportional to the number of clients. Producer IDs that are not associated with a transactional ID are created anew every time the producer restarts. (If a producer ID is associated with a transactional ID, it is reused across restarts.)
We currently store mappings from transactional ID to producer ID (along with other transactional state), and from producer ID to producer state. Both of these maps retain an entry until it has gone without updates for transactional.id.expiration.ms. Note that both maps use this configuration for expiration. The producer ID state is stored in each partition, so we could expect to see up to (number of clients * number of partitions * number of producer restarts) producer IDs for idempotent producers. This is potentially a much larger number than the storage for transactional IDs, which is roughly proportional to the number of clients.
With KIP-679: Producer will enable the strongest delivery guarantee by default, idempotent producers became the default in Kafka. This means that unless otherwise specified, all new producers will be given producer IDs.
Some (inefficient) applications may now create many non-transactional idempotent producers. Each of these producers will be assigned a producer ID and these IDs are stored in the aforementioned producer ID to state map in ProducerStateManager. Since there may be a new influx of producer ID usage that will surpass transactional ID usage, it may be useful to have a separate timeout for producer IDs to avoid excess memory usage in this map.
Public Interfaces
We propose adding the following configuration:
Name | Description | Default | Valid Values |
---|---|---|---|
producer.id.expiration.ms | The time in ms that the ProducerStateManager will wait before expiring producer IDs. Producer IDs will not expire while a transaction associated with them is still ongoing. Note that producer IDs may expire sooner if the last write from the producer ID is deleted due to the topic's retention settings. | -1 | [-1, …] |
The following configuration will also be modified:
Name | Old Description | New Description |
---|---|---|
transactional.id.expiration.ms | The time in ms that the transaction coordinator will wait without receiving any transaction status updates for the current transaction before expiring its transactional id. This setting also influences producer id expiration - producer ids are expired once this time has elapsed after the last write with the given producer id. Note that producer ids may expire sooner if the last write from the producer id is deleted due to the topic's retention settings. | The time in ms that the transaction coordinator will wait without receiving any transaction status updates for the current transaction before expiring its transactional ID. Transactional IDs will not expire while the transaction is still ongoing. |
Proposed Changes
In ProducerStateManager, replace transactional.id.expiration.ms with producer.id.expiration.ms when checking whether producer IDs should be expired. Also update the documentation for transactional.id.expiration.ms.
Keep in mind that the expiration also checks if a transaction is in progress for a given producer ID. Thus, this change mainly impacts idempotent producers. A transactional producer won’t have its ID expire during a transaction. When the transaction completes, the producer ID may be removed from the ProducerStateManager map, but will be put in the map again on the start of a new transaction.
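The expiration rule described above can be sketched as follows. This is a minimal illustration, not the actual ProducerStateManager code: the class `ProducerIdExpirationSketch`, the `ProducerEntry` type, and the method names are hypothetical, and the sketch assumes an entry expires once at least `producer.id.expiration.ms` has elapsed since its last write and no transaction is open.

```java
import java.util.HashMap;
import java.util.Map;

public class ProducerIdExpirationSketch {

    /** Hypothetical, simplified stand-in for the per-producer state kept per partition. */
    static final class ProducerEntry {
        final long lastTimestampMs;          // timestamp of the last write from this producer ID
        final boolean hasOngoingTransaction; // true while a transaction is still open

        ProducerEntry(long lastTimestampMs, boolean hasOngoingTransaction) {
            this.lastTimestampMs = lastTimestampMs;
            this.hasOngoingTransaction = hasOngoingTransaction;
        }
    }

    /** An entry is expired only if it is idle long enough AND has no open transaction. */
    static boolean isExpired(ProducerEntry entry, long nowMs, long producerIdExpirationMs) {
        return !entry.hasOngoingTransaction
                && nowMs - entry.lastTimestampMs >= producerIdExpirationMs;
    }

    /** Removes every expired producer ID from the map in place. */
    static void removeExpired(Map<Long, ProducerEntry> producers, long nowMs, long producerIdExpirationMs) {
        producers.values().removeIf(e -> isExpired(e, nowMs, producerIdExpirationMs));
    }

    public static void main(String[] args) {
        Map<Long, ProducerEntry> producers = new HashMap<>();
        producers.put(1L, new ProducerEntry(0L, false));     // idle idempotent producer
        producers.put(2L, new ProducerEntry(0L, true));      // idle, but transaction still open
        producers.put(3L, new ProducerEntry(9_000L, false)); // recently active

        removeExpired(producers, 10_000L, 5_000L);
        System.out.println("Remaining producer IDs: " + producers.keySet());
    }
}
```

In this sketch only producer ID 1 is removed: ID 2 is protected by its open transaction and ID 3 wrote within the expiration window.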
Currently, when an idempotent producer’s ID expires, it silently loses its idempotency guarantees.
NOTE: Setting producer.id.expiration.ms to be lower than the delivery timeout will increase the likelihood of duplicates. Any expiration of producer IDs means that the producer can no longer take advantage of idempotency guarantees.
Compatibility, Deprecation, and Migration Plan
We can set the default to be -1. This will be used as a flag to signal using the same value as transactional.id.expiration.ms. This will offer backwards compatibility and continuity of behavior for the user. It will also cover the case where transactional.id.expiration.ms was also using its default.
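The fallback behavior of the -1 flag can be sketched as below. The class and method names are hypothetical and are not part of the broker code; the 7-day figure is used only as an example value for the transactional expiration.

```java
public class ProducerIdExpirationDefault {

    /** Flag value meaning "reuse the transactional ID expiration". */
    static final int USE_TRANSACTIONAL_ID_EXPIRATION = -1;

    /**
     * Hypothetical helper: returns the expiration actually applied to producer IDs.
     * With the -1 default, behavior is unchanged from before the new config existed.
     */
    static int effectiveProducerIdExpirationMs(int producerIdExpirationMs,
                                               int transactionalIdExpirationMs) {
        return producerIdExpirationMs == USE_TRANSACTIONAL_ID_EXPIRATION
                ? transactionalIdExpirationMs
                : producerIdExpirationMs;
    }

    public static void main(String[] args) {
        int sevenDaysMs = 7 * 24 * 60 * 60 * 1000; // example transactional.id.expiration.ms
        int oneDayMs = 24 * 60 * 60 * 1000;        // example explicit producer.id.expiration.ms

        // Flag left at -1: producer IDs follow the transactional expiration.
        System.out.println(effectiveProducerIdExpirationMs(-1, sevenDaysMs));
        // Explicit value: producer IDs use their own, shorter expiration.
        System.out.println(effectiveProducerIdExpirationMs(oneDayMs, sevenDaysMs));
    }
}
```

Resolving the flag in one place keeps existing deployments on their current expiration, whether or not they customized transactional.id.expiration.ms.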
Rejected Alternatives
Have a default value of 7 days
This would be a safe option and would match a configuration that did not customize transactional.id.expiration.ms. In the case of a custom configuration for transactional expiration, it would more easily reveal that the new configuration existed. However, since this is still not totally apparent, it is better to default to not changing current behavior.
Have a default value less than 7 days
The motivation for this change is to have a different (and shorter) period for expiration so that the broker is not easily overwhelmed with all the new idempotent producers. Since this configuration will silently drop idempotency guarantees, it seems safer to make the default more conservative.
Have the flag value as 0 instead of -1
0 would allow us to not have to worry about a valid configuration of 0 ms (immediately expired). However, 0 seems less intuitive as a flag, and 0 ms is not all that different in practice from 1 ms, which is a valid value for transactional.id.expiration.ms.