...

  • Update the Javadocs

  • API signatures will remain the same

  • The exceptions thrown will be different.

Summary

This document describes our effort to refactor the threading model of the KafkaConsumer. The refactor addresses issues and shortcomings we have encountered over the past years, such as increasing code complexity, locking and race conditions, and the coupling between the client poll and the rebalance progress.

...

  • Deprecate HeartbeatThread

    • Remove the heartbeat thread.
    • Move the heartbeat logic to the background thread.
  • Refactor the Coordinator classes

    • Revisit timers and loops
    • Remove some blocking methods, such as commitOffsetSync. The blocking commit will be handled by the polling thread, by waiting for the background thread to complete the future. The coordinator should not block.
    • The coordinator will be polled by the background thread loop.
    • Rebalance state modification: We will add a few states to ensure the rebalance callbacks are executed in the correct sequence and time.
  • Refactor the KafkaConsumer API

    • It will send Events to the background thread if network communication is required.
    • Remove dependency on the fetcher, coordinator, and networkClient.
  • Events and communication channels.

    • We will use two channels to facilitate the two-way communication between the polling thread and the background thread.
  • Address issues in these Jira tickets

...
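
The event-based communication and the blocking commit described in the goals above could look roughly like the following. This is a minimal sketch, not the actual design: the class, the event type, and the queue names (EventChannelSketch, CommitEvent, applicationEvents, backgroundEvents) are hypothetical, and the network call is simulated. It shows one queue per direction, and a polling thread that blocks only on a future completed by the background thread.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical names throughout; none of these types are part of the Kafka client.
public class EventChannelSketch {
    // A request sent from the polling thread to the background thread.
    static final class CommitEvent {
        final long offset;
        final CompletableFuture<Void> future = new CompletableFuture<>();
        CommitEvent(long offset) { this.offset = offset; }
    }

    // Channel 1: polling thread -> background thread.
    final BlockingQueue<CommitEvent> applicationEvents = new LinkedBlockingQueue<>();
    // Channel 2: background thread -> polling thread (results, errors, callbacks to run).
    final BlockingQueue<String> backgroundEvents = new LinkedBlockingQueue<>();

    volatile boolean running = true;

    // Background loop: drains requests, does the (simulated) network work,
    // completes the future, and reports back on the second channel.
    void runBackgroundLoop() {
        try {
            while (running) {
                CommitEvent event = applicationEvents.poll(10, TimeUnit.MILLISECONDS);
                if (event == null) {
                    continue;
                }
                // A real implementation would send an offset commit request here.
                event.future.complete(null);
                backgroundEvents.put("committed offset " + event.offset);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Blocking commit as seen by the polling thread: enqueue the event,
    // then wait for the background thread to complete the future.
    void commitSync(long offset, long timeoutMs) throws Exception {
        CommitEvent event = new CommitEvent(offset);
        applicationEvents.put(event);
        event.future.get(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        EventChannelSketch sketch = new EventChannelSketch();
        Thread background = new Thread(sketch::runBackgroundLoop, "background-thread");
        background.start();
        sketch.commitSync(42L, 1000);
        System.out.println(sketch.backgroundEvents.take()); // prints "committed offset 42"
        sketch.running = false;
        background.join();
    }
}
```

In this sketch the background loop never blocks on the caller; only the polling thread waits, and it waits with a timeout.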

A note on the subscriptionState: Its reference will be shared by the polling and background threads, so we will either refactor it to use a thread-safe data structure or provide explicit locking and synchronization barriers.

Important Components

Background thread and its lifecycle

...

We will first write new unit tests and integration tests, and reuse the existing ones.

We need to ensure that critical events, such as coordinator discovery and the joinGroup operations, happen in the correct sequence. For example, we need to first discover the coordinator, then commit the offsets while pausing fetching for the affected partitions, then revoke the partitions, and then continue with the rest of the rebalance process.

We will also ensure that the heartbeat interval and the poll interval are respected.
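
One way to check the event ordering described above in a test is a small state machine that rejects out-of-order steps. The sketch below is only an illustration under assumed names: RebalanceSequenceSketch and the Step values are hypothetical and are not the states this refactor will introduce.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the step names mirror the example sequence in the text.
public class RebalanceSequenceSketch {
    enum Step {
        START,
        COORDINATOR_DISCOVERED,
        OFFSETS_COMMITTED,
        PARTITIONS_REVOKED,
        REBALANCE_CONTINUED
    }

    // For each step, the set of steps that may legally follow it.
    private static final Map<Step, Set<Step>> ALLOWED = new EnumMap<>(Step.class);
    static {
        ALLOWED.put(Step.START, EnumSet.of(Step.COORDINATOR_DISCOVERED));
        ALLOWED.put(Step.COORDINATOR_DISCOVERED, EnumSet.of(Step.OFFSETS_COMMITTED));
        ALLOWED.put(Step.OFFSETS_COMMITTED, EnumSet.of(Step.PARTITIONS_REVOKED));
        ALLOWED.put(Step.PARTITIONS_REVOKED, EnumSet.of(Step.REBALANCE_CONTINUED));
        ALLOWED.put(Step.REBALANCE_CONTINUED, EnumSet.noneOf(Step.class));
    }

    private Step current = Step.START;

    // Moves to the next step, or throws if the step is out of sequence.
    void advance(Step next) {
        if (!ALLOWED.get(current).contains(next)) {
            throw new IllegalStateException("Cannot go from " + current + " to " + next);
        }
        current = next;
    }

    Step current() {
        return current;
    }

    public static void main(String[] args) {
        RebalanceSequenceSketch sequence = new RebalanceSequenceSketch();
        sequence.advance(Step.COORDINATOR_DISCOVERED);
        sequence.advance(Step.OFFSETS_COMMITTED);
        sequence.advance(Step.PARTITIONS_REVOKED);
        sequence.advance(Step.REBALANCE_CONTINUED);
        System.out.println(sequence.current()); // prints "REBALANCE_CONTINUED"
    }
}
```

A test could record each event as it is observed and call advance for it, so that any reordering fails immediately with an IllegalStateException.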

...

The refactor should introduce (almost) no regressions or breaking changes upon merge, so users should be able to continue with the new client without changing their code.

Release Plan

  • Support code will live alongside the current code
    • Background thread
    • A new coordinator implementation, AsyncConsumerCoordinator for example.
    • Events and event executors
  • We will create a new KafkaConsumer class first, then have it replace the existing one once it reaches stability