

Status

Current state: Draft

Discussion thread: TBD

JIRA: TBD

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

Pulsar has officially supported a key share feature since 2.4, which allows multiple consumers to share the data of the same partition.

The blockers for us to implement a similar feature are threefold:

  1. Our consumer model is pull based, which would incur random reads if we let consumers ask for specific keyed records. Sequential read is key to Kafka's performance, as otherwise we could not bypass the memory copy of the data. (Intermediate service)
  2. Our broker doesn't know anything about data distribution, as all the metadata is encoded and looks opaque to it; at the very least, it would have to understand the message key. In reality, we could not afford to let consumers fetch with raw keys.
  3. The consumer coordinator is at a centralized location, whereas we need to distribute keys across different partitions. For Pulsar, the offset data is co-located with the actual partitions. The burden of broadcasting the state change in Kafka is high, and doing so would be hard and very error-prone.
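
To make the first blocker concrete, here is a minimal sketch of how key sharing scatters one consumer's records across a partition log. It assumes a hypothetical hash-based rule for routing keys to consumers; the class and method names (KeyShareSketch, consumerFor, offsetsByConsumer) are illustrative only and are not part of any Kafka or Pulsar API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyShareSketch {

    // Hypothetical routing rule (an assumption, not Kafka's or Pulsar's):
    // a key is owned by the consumer at index hash(key) mod numConsumers.
    static int consumerFor(String key, int numConsumers) {
        return Math.floorMod(key.hashCode(), numConsumers);
    }

    // Treats the list index as the record offset within a single partition
    // and groups the offsets by the consumer that owns each record's key.
    static Map<Integer, List<Long>> offsetsByConsumer(List<String> keysInLog, int numConsumers) {
        Map<Integer, List<Long>> result = new TreeMap<>();
        for (int offset = 0; offset < keysInLog.size(); offset++) {
            int consumer = consumerFor(keysInLog.get(offset), numConsumers);
            result.computeIfAbsent(consumer, c -> new ArrayList<>()).add((long) offset);
        }
        return result;
    }

    public static void main(String[] args) {
        // Keys of six consecutive records in one partition.
        List<String> log = Arrays.asList("a", "b", "a", "c", "b", "a");
        // Each consumer ends up owning a non-contiguous set of offsets.
        System.out.println(offsetsByConsumer(log, 2));
    }
}
```

Under this assumed rule, each consumer owns a non-contiguous set of offsets, so serving only "its" records would require either skipping through the log or filtering on the broker, both of which work against a purely sequential read path.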

Compatibility, Deprecation, and Migration Plan

...

  1. KIP-253 proposed physical partition expansion, whose implementation is fairly complex and whose correctness could be hard to reason about.
  2. There has been some discussion around moving Kafka Streams to a multi-threaded model where consumers are completely decoupled from the processing threads. This means we would have to tackle the concurrent processing challenge, and there could be more inherent work to redesign state store semantics too.

...