Comment: Fixed the incorrect naming for existing config

...

In Kafka, the introduction of the timestamp field in the message format through KIP-32 brought along two additional configurations: log.message.timestamp.type and log.message.timestamp.difference.max.ms.

By default, the configuration values are log.message.timestamp.type=CreateTime and log.message.timestamp.difference.max.ms=9223372036854775807 (java.lang.Long.MAX_VALUE). This allows producers to send messages with timestamps as far back as the minimum representable timestamp and hundreds of years into the future. (Side note: we could potentially change the default value of log.message.timestamp.difference.max.ms to something more sensible, but that is not the motivation for this KIP.)

While there can be valid use cases for messages with past timestamps for the purpose of replaying old messages, messages with future timestamps are inherently inaccurate and can lead to unexpected log rotation behavior. Kafka users have encountered problems due to misconfigured producers, such as using nanoseconds instead of milliseconds for the message timestamp. This anecdotal evidence from a Medium article highlights the challenges associated with this issue.

The motivation behind this proposal is to improve the validation logic for message timestamps by rejecting messages with future timestamps and providing a descriptive exception. This will help improve data integrity and prevent potential pitfalls caused by inaccurate timestamp handling.

...

In org.apache.kafka.storage.internals.log.LogValidator, we will introduce a new validation that compares the message timestamp to the broker's timestamp and rejects the message if it is ahead of the broker's timestamp. This validation applies when log.message.timestamp.type is set to CreateTime. There will be no other change to the existing validation logic.

To account for clock drift that the broker may experience relative to the time synchronization service of the underlying operating system, we propose introducing a constant, TIME_DRIFT_TOLERANCE, with a value of one hour. Time synchronization services such as NTP and PTP are capable of correcting drift on the order of milliseconds. Therefore, a one-hour threshold should be sufficient to accommodate all clock drift cases.
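
The future-timestamp check with a drift tolerance can be sketched roughly as follows. This is a minimal illustration, not the actual LogValidator code: the class name, the method name isInFuture, and the millisecond constant TIME_DRIFT_TOLERANCE_MS are hypothetical, and the sketch assumes a record counts as "in the future" only when it is ahead of the broker clock by more than the tolerance.

```java
import java.time.Duration;

// Hypothetical helper; names are illustrative and not part of Kafka.
final class FutureTimestampCheck {
    // Assumed representation of the proposed TIME_DRIFT_TOLERANCE (one hour), in milliseconds.
    static final long TIME_DRIFT_TOLERANCE_MS = Duration.ofHours(1).toMillis();

    // Returns true when the record timestamp is ahead of the broker clock
    // by more than the drift tolerance. Both arguments are epoch milliseconds.
    static boolean isInFuture(long recordTimestampMs, long brokerNowMs) {
        // brokerNowMs is a wall-clock reading, so adding one hour cannot overflow in practice.
        return recordTimestampMs > brokerNowMs + TIME_DRIFT_TOLERANCE_MS;
    }
}
```

Under these assumptions, a producer that mistakenly sends nanoseconds instead of milliseconds produces a timestamp far beyond the tolerance and would be rejected, while a producer whose clock runs a few seconds ahead of the broker would not.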

...

The validation occurs before appending a message to the active segment of the local log. We iterate through each batch included in the write request, and the validation logic follows this sequence for each record in the batch:

  1. If log.message.timestamp.type=LogAppendTime, the server overwrites the timestamp with the broker's timestamp and proceeds without further validation related to the record timestamp. Note that variations exist when the message is compressed or not compressed, but the existing logic remains unchanged. Regardless of compression, the timestamp is always overwritten with the broker's current time.
  2. We check if the timestamp included with the record is in the future (a new validation).
  3. If the timestamp is in the future, we create a new record-level error, ApiRecordError, with error code 32 (INVALID_TIMESTAMP) and an error message indicating that the record timestamp is in the future compared to the broker's time.
  4. If the record timestamp is not in the future, we proceed to validate whether the record timestamp falls within the allowable time difference configured in log.message.timestamp.difference.max.ms. The timestamp difference is calculated as the absolute value of the record timestamp minus the broker's timestamp.
  5. If the timestamp difference exceeds the value configured in log.message.timestamp.difference.max.ms, we create a record-level error, ApiRecordError, with error code 32 (INVALID_TIMESTAMP) and an error message indicating that the record timestamp is outside the acceptable timestamp difference range.
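
The sequence above can be sketched as a single per-record function. This is a simplified illustration under stated assumptions, not the LogValidator implementation: TimestampValidationSketch, RecordError, and validate are hypothetical names, the one-hour tolerance is assumed to be applied to the future check, and batch handling, compression, and the actual timestamp overwrite for LogAppendTime are omitted.

```java
import java.util.Optional;

// Hypothetical, simplified model of the per-record timestamp validation.
final class TimestampValidationSketch {
    enum TimestampType { CREATE_TIME, LOG_APPEND_TIME }

    // Error code 32 corresponds to INVALID_TIMESTAMP.
    static final int INVALID_TIMESTAMP_CODE = 32;
    // Assumed one-hour drift tolerance, in milliseconds.
    static final long TIME_DRIFT_TOLERANCE_MS = 60L * 60L * 1000L;

    // Stand-in for a record-level error such as ApiRecordError.
    static final class RecordError {
        final int errorCode;
        final String message;

        RecordError(int errorCode, String message) {
            this.errorCode = errorCode;
            this.message = message;
        }
    }

    // Returns an error if the record must be rejected, or empty if it passes.
    // Timestamps are epoch milliseconds and assumed to be non-negative.
    static Optional<RecordError> validate(TimestampType type, long recordTimestampMs,
                                          long brokerNowMs, long timestampDiffMaxMs) {
        // Step 1: with LogAppendTime the broker overwrites the timestamp, so nothing to validate.
        if (type == TimestampType.LOG_APPEND_TIME) {
            return Optional.empty();
        }
        // Steps 2-3: reject timestamps ahead of the broker clock (beyond the tolerance).
        if (recordTimestampMs > brokerNowMs + TIME_DRIFT_TOLERANCE_MS) {
            return Optional.of(new RecordError(INVALID_TIMESTAMP_CODE,
                "Record timestamp " + recordTimestampMs + " is in the future compared to broker time " + brokerNowMs));
        }
        // Steps 4-5: reject timestamps whose absolute difference exceeds the configured maximum.
        long difference = Math.abs(recordTimestampMs - brokerNowMs);
        if (difference > timestampDiffMaxMs) {
            return Optional.of(new RecordError(INVALID_TIMESTAMP_CODE,
                "Record timestamp " + recordTimestampMs + " is outside the acceptable difference of " + timestampDiffMaxMs + " ms"));
        }
        return Optional.empty();
    }
}
```

With the default log.message.timestamp.difference.max.ms of Long.MAX_VALUE, only the future check can reject a record in this sketch; arbitrarily old timestamps still pass, which preserves the replay use case.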

...