Status

Current state: "Under Discussion"

Discussion thread: TBD

JIRA: KAFKA-6048

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Kafka does not support negative record timestamps, which prevents storing historical data (anything dated before the Unix epoch) in Kafka. Unix time itself supports negative values:

From https://en.wikipedia.org/wiki/Unix_time

The Unix time number is zero at the Unix epoch, and increases by exactly 86,400 per day since the epoch. Thus 2004-09-16T00:00:00Z, 12,677 days after the epoch, is represented by the Unix time number 12,677 × 86,400 = 1095292800. This can be extended backwards from the epoch too, using negative numbers; thus 1957-10-04T00:00:00Z, 4,472 days before the epoch, is represented by the Unix time number −4,472 × 86,400 = −386380800.
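The arithmetic in the quote can be checked with the standard java.time API. This is only an illustrative sketch: the class and method names are made up for the example, and it works in seconds as the quote does, whereas Kafka record timestamps are in milliseconds.

```java
import java.time.Instant;
import java.time.LocalDate;

// Hypothetical helper class, used only to verify the quoted numbers.
final class UnixTimeExample {
    static final long SECONDS_PER_DAY = 86_400L;

    // Unix time (in seconds) at midnight UTC of the given calendar date.
    // toEpochDay() is negative for dates before 1970-01-01.
    static long epochSecondsAtMidnightUtc(int year, int month, int day) {
        return LocalDate.of(year, month, day).toEpochDay() * SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long after = epochSecondsAtMidnightUtc(2004, 9, 16);
        long before = epochSecondsAtMidnightUtc(1957, 10, 4);
        // prints 1095292800 -> 2004-09-16T00:00:00Z
        System.out.println(after + " -> " + Instant.ofEpochSecond(after));
        // prints -386380800 -> 1957-10-04T00:00:00Z
        System.out.println(before + " -> " + Instant.ofEpochSecond(before));
    }
}
```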

Public Interfaces

  • org.apache.kafka.common.record
  • org.apache.kafka.clients.producer
  • org.apache.kafka.streams.processor

Proposed Changes

First, we need to remove all checks for negative timestamps across the code:

  • the client should be able to publish a record with a negative timestamp,
  • the broker should accept and serve that record,
  • Streams should not drop a record with a negative timestamp.
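To make the kind of check in question concrete, here is a minimal sketch of a "before" and "after" validation. It is not Kafka's actual code: the class and method names are hypothetical, and it assumes a null timestamp means "assign the current time", with milliseconds as the unit.

```java
// Hypothetical illustration of the validation change; not taken from the Kafka code base.
final class TimestampValidation {
    // "Before": any negative timestamp is rejected, so pre-1970 data cannot be written.
    static void requireNonNegative(Long timestampMs) {
        if (timestampMs != null && timestampMs < 0) {
            throw new IllegalArgumentException("Invalid timestamp: " + timestampMs);
        }
    }

    // "After": every long value is accepted as-is; only a missing (null)
    // timestamp is replaced, here with the caller-supplied current time.
    static long resolve(Long timestampMs, long nowMs) {
        return timestampMs != null ? timestampMs : nowMs;
    }
}
```

With the relaxed version, `resolve(-386380800000L, nowMs)` returns the 1957 timestamp unchanged, while the strict version would throw for the same input.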

 

NO_TIMESTAMP (−1) problem

The broker uses −1 as the default value for a missing timestamp. However, −1 can also be a legitimate value set by the user: one millisecond before the epoch, that is, Wednesday, December 31, 1969 11:59:59.999 PM UTC.

Options we have are:

  1. Ignore that problem and:
    1. interpret this value as a real timestamp
    2. or still interpret −1 as "no timestamp" and all other values as real timestamps (can we borrow 1 millisecond for our needs?).
  2. Add a topic property that says whether the topic may contain records with "no timestamp". In that case:
    1. users would need to create a new topic and migrate/stream all the records from the old topic to the new one
    2. users decide what to do with a record without a timestamp: set the timestamp to the current time or to a specific one based on the message content.
  3. Solve this with a new message flag:
    1. add a special boolean flag "hasTimestamp" to the message record,
    2. write a migration tool that adds this flag to legacy messages with a negative timestamp,
    3. make sure clients know about that field and check it.
    This makes lookup by timestamp awkward: the .timeindex will not know about these records.
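The difference between options 1.1, 1.2 and 3 comes down to how a stored value is interpreted on read. The following is a rough sketch under stated assumptions: timestamps are milliseconds since the epoch, and the class name, method names and the `hasTimestamp` parameter are hypothetical stand-ins, not an existing Kafka API.

```java
import java.time.Instant;
import java.util.Optional;

// Hypothetical sketch of the interpretation rules; names are illustrative only.
final class NoTimestampOptions {
    static final long NO_TIMESTAMP = -1L; // sentinel value described in the text

    // Option 1.1: every value, including -1, is a real timestamp.
    static Optional<Instant> allValuesAreReal(long timestampMs) {
        return Optional.of(Instant.ofEpochMilli(timestampMs));
    }

    // Option 1.2: -1 keeps meaning "no timestamp"; every other value is real.
    // The single instant 1969-12-31T23:59:59.999Z becomes unrepresentable.
    static Optional<Instant> sentinelReserved(long timestampMs) {
        return timestampMs == NO_TIMESTAMP
                ? Optional.empty()
                : Optional.of(Instant.ofEpochMilli(timestampMs));
    }

    // Option 3: an explicit flag decides, so -1 is an ordinary value.
    static Optional<Instant> explicitFlag(boolean hasTimestamp, long timestampMs) {
        return hasTimestamp
                ? Optional.of(Instant.ofEpochMilli(timestampMs))
                : Optional.empty();
    }
}
```

For the input −1, option 1.1 yields 1969-12-31T23:59:59.999Z, option 1.2 yields an empty result, and option 3 yields whichever of the two the flag selects.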

Compatibility, Deprecation, and Migration Plan

What impact (if any) will there be on existing users?

TBD

If we are changing behavior how will we phase out the older behavior? 

TBD

If we need special migration tools, describe them here.

TBD

When will we remove the existing behavior?

TBD

Rejected Alternatives

If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.

TBD
