Status
Current state: Under Discussion
Discussion thread:
JIRA:
Motivation
People use Kafka, among many other reasons, because they need to be sure their messages are correctly processed by their applications. A classic configuration is to have 3 replicas and to commit the offset of a message only once it has been correctly processed. Developers use this configuration because it is important not to lose any messages.
Nevertheless, there are some situations where messages are lost silently:
- Message expires before being consumed due to topic retention time.
- Message expires before being consumed due to topic size limit.
I propose to build a mechanism that logs a warning when a message is about to be removed, or has been removed, due to the topic's time or size retention settings, for a set of consumer groups specified in the topic configuration.
The Kafka brokers already have the information needed to achieve this goal:
- the offset of the message that will be removed.
- the last offset consumed by a consumer group.
Public Interfaces
The kafka-topics.sh tool must accept a new property in its --config option:
- notify.groups.on.expiration : comma-separated list of consumer groups for which a warning is logged on offset expiration.
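As an illustration of how the property value could be interpreted, here is a minimal sketch that splits the comma-separated list into group ids. The class and method names (NotifyGroupsConfig, parseGroups) are hypothetical and not part of Kafka; trimming whitespace and ignoring empty or duplicate entries are assumptions, since the proposal does not specify them.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not an existing Kafka class.
final class NotifyGroupsConfig {
    // Property name proposed in this document.
    static final String NOTIFY_GROUPS_ON_EXPIRATION = "notify.groups.on.expiration";

    // Splits a comma-separated value into group ids.
    // Assumption: whitespace is trimmed, empty entries and duplicates are dropped.
    static Set<String> parseGroups(String value) {
        if (value == null || value.trim().isEmpty()) {
            return new LinkedHashSet<>();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(group -> !group.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }
}
```

With this sketch, a value such as "billing, audit" would yield the two groups billing and audit, and an unset property would yield an empty set, meaning no group is checked.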
Proposed Changes
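The following list describes the log retention flow. The steps introduced by this proposal are reading the last consumed offsets of the configured groups and logging the warning line: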
- The scheduler is triggered
- The scheduler will search for the logs to be deleted.
- Read the last offset consumed by all groups specified on notify.groups.on.expiration.
- The scheduler will remove the log.
- For each group, if the offset that has been removed is higher than the last offset consumed by that group, log a line:
- "message with offset %d partition %d topic %s has been removed without being consumed by group %s"
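The check in the last step can be sketched as follows. This is only an illustration, not broker code: the class and method names (ExpirationNotifier, warningsForRemovedRange) are hypothetical, the removed log segment is assumed to cover a contiguous offset range, and a message is assumed to be unconsumed by a group when its offset is greater than the last offset that group consumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not an existing Kafka class.
final class ExpirationNotifier {
    // Returns one warning line per removed offset that a group had not consumed yet.
    // Assumption: lastConsumedOffsetByGroup holds the last offset each group consumed.
    static List<String> warningsForRemovedRange(
            String topic, int partition,
            long firstRemovedOffset, long lastRemovedOffset,
            Map<String, Long> lastConsumedOffsetByGroup) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastConsumedOffsetByGroup.entrySet()) {
            // Offsets up to and including the last consumed one are safe to remove.
            long firstUnconsumed = Math.max(firstRemovedOffset, entry.getValue() + 1);
            for (long offset = firstUnconsumed; offset <= lastRemovedOffset; offset++) {
                warnings.add(String.format(
                        "message with offset %d partition %d topic %s has been removed without being consumed by group %s",
                        offset, partition, topic, entry.getKey()));
            }
        }
        return warnings;
    }
}
```

For example, if offsets 40 to 43 are removed and a group last consumed offset 41, the sketch produces warnings for offsets 42 and 43 only. An actual implementation might prefer one line per group and offset range to limit log volume, which this proposal does not address.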
Compatibility, Deprecation, and Migration Plan
There is no impact on existing features.