...

  • Update Java Docs

  • API signatures will remain the same

Summary

This document describes our effort to refactor the threading model of the KafkaConsumer. The refactor aims to address issues and shortcomings we have encountered in the past years, such as increasing code complexity, lock and race conditions, and the coupling between the client poll and the rebalance progress.

  • Code complexity: Patches and hotfixes in the past years have heavily impacted the readability of the code. The complex code path and intertwined logic make the code difficult to modify and comprehend. For example, coordinator communication can happen in both the heartbeat thread and the polling thread, which makes it challenging to diagnose the source of an issue when there is a race condition. Also, many parts of the coordinator code have been refactored to execute asynchronously; therefore, we can take the opportunity to refactor the existing loops and timers.

  • Complex Coordinator Communication: Coordinator communication happens on both threads in the current design, and it has caused some race conditions such as KAFKA-13563. We want to take the opportunity to move all coordinator communication to the background thread and adopt a more linear design.

  • Asynchronous Background Thread: One of our goals here is to make the rebalance process happen asynchronously to the poll. Firstly, it simplifies the polling thread design, as the polling thread essentially only needs to handle fetch requests. Secondly, because the rebalance occurs in the background thread, the polling thread neither blocks the rebalance process nor is blocked by it.
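The hand-off between the two threads described above can be pictured with a pair of queues. The following is only a minimal sketch under stated assumptions: the class name `EventChannelSketch`, the use of plain strings as events, and the "handled:" reply are all hypothetical and are not taken from the KafkaConsumer code. It shows the polling thread submitting an event and a single background thread being the only place where coordinator work would happen.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: names and event payloads are illustrative only.
final class EventChannelSketch {
    // Events sent from the polling thread to the background thread.
    private final BlockingQueue<String> requests = new LinkedBlockingQueue<>();
    // Results sent from the background thread back to the polling thread.
    private final BlockingQueue<String> responses = new LinkedBlockingQueue<>();
    private volatile boolean running = true;
    private final Thread background = new Thread(this::runLoop, "background-sketch");

    void start() {
        background.start();
    }

    // Called on the polling thread; it never talks to the coordinator itself.
    void submit(String event) {
        requests.add(event);
    }

    // Called on the polling thread; returns null if nothing arrives in time.
    String awaitResponse(long timeoutMs) {
        try {
            return responses.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    void close() {
        running = false;
        background.interrupt();
        try {
            background.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void runLoop() {
        while (running) {
            try {
                String event = requests.poll(50, TimeUnit.MILLISECONDS);
                if (event != null) {
                    // Placeholder for the coordinator communication (assumption).
                    responses.add("handled:" + event);
                }
            } catch (InterruptedException e) {
                // Interrupted by close(); the loop condition decides whether to exit.
            }
        }
    }
}
```

With this shape, all coordinator-related work has a single owner, which is the property the bullets above are after; the real design may use different queue types and event classes.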

...

  1. The background thread has not been constructed, so it is in the down state.

  2. The polling thread starts the background thread; the background thread finishes initialization and then moves to the initialized state. Usually, this happens after new KafkaConsumer().

  3. The background thread loop performs the following tasks:

    1. Check if there is an in-flight event. If not, poll for the new events from the channel.

    2. Run the state machine. The possible scenarios are:

      • The event does not require a coordinator.  Execute the event and move on.
      • The event requires a coordinator, but it is currently disconnected.  Move the state to coordinator_discovery.
      • The background thread is currently in the coordinator_discovery state. Continue to loop and wait for the FindCoordinator response.
      • The background thread is in the stable state.
        • Poll the Coordinator.
        • Execute the event.
  4. Poll the networkClient.

  5. Back off for retryBackoffMs milliseconds.
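The state handling in the steps above can be sketched as a small single-threaded state machine. This is an illustration under assumptions, not the actual implementation: the class `BackgroundLoopSketch`, its nested `Event` type, and the method names are hypothetical, the FindCoordinator response is simulated by a flag, and the transition from coordinator_discovery to stable on receiving that response is an assumption. The network client poll and the backoff are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of one loop iteration; not the KafkaConsumer internals.
final class BackgroundLoopSketch {
    enum State { DOWN, INITIALIZED, COORDINATOR_DISCOVERY, STABLE }

    // Hypothetical event: a name plus whether it needs the coordinator.
    static final class Event {
        final String name;
        final boolean requiresCoordinator;

        Event(String name, boolean requiresCoordinator) {
            this.name = name;
            this.requiresCoordinator = requiresCoordinator;
        }
    }

    private final Queue<Event> channel = new ArrayDeque<>();
    private final List<String> executed = new ArrayList<>();
    private State state = State.DOWN;
    private Event inflight;            // event currently being processed, if any
    private boolean coordinatorFound;  // stand-in for a FindCoordinator response

    void initialize() { state = State.INITIALIZED; }
    void enqueue(Event event) { channel.add(event); }
    void onFindCoordinatorResponse() { coordinatorFound = true; }
    State state() { return state; }
    List<String> executed() { return executed; }

    // One iteration covering the in-flight check and the state machine.
    void runOnce() {
        if (state == State.DOWN) {
            return; // the background thread has not been started yet
        }
        // If there is no in-flight event, poll the channel for a new one.
        if (inflight == null) {
            inflight = channel.poll();
        }
        if (state == State.COORDINATOR_DISCOVERY) {
            if (!coordinatorFound) {
                return; // keep looping until the response arrives
            }
            state = State.STABLE; // assumed transition once the coordinator is known
        }
        if (inflight == null) {
            return;
        }
        if (!inflight.requiresCoordinator || state == State.STABLE) {
            // In the stable state the coordinator would be polled before executing.
            executed.add(inflight.name);
            inflight = null;
            return;
        }
        // The event needs a coordinator but none is connected.
        state = State.COORDINATOR_DISCOVERY;
    }
}
```

Keeping the event in `inflight` while discovery is pending means it is retried on a later iteration rather than dropped, which matches the "check if there is an in-flight event" step.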

...

Is there a better way to configure session interval and heartbeat interval?

Consumer API Internal Changes

Assign


Event Data Models

ConsumerEvent
Name | Type | Description
type | ConsumerEventType | Type of event
requiredCoordinator | bool | Indicate if the event uses coordinator
ServerEvent
type | ServerEventType | Type of event
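As a rough illustration, the two data models above could map to plain Java classes like the following. Only the field names `type` and `requiredCoordinator` and the type names `ConsumerEventType` and `ServerEventType` come from the tables; the enum constants (`ASSIGN`, `COMMIT`, `ERROR`) are hypothetical placeholders, since the tables do not list the actual values.

```java
// Hypothetical enum constants; the source does not enumerate the event types.
enum ConsumerEventType { ASSIGN, COMMIT }

enum ServerEventType { ERROR }

// Event sent from the polling thread to the background thread.
final class ConsumerEvent {
    final ConsumerEventType type;
    // True if the event can only be executed once a coordinator is available.
    final boolean requiredCoordinator;

    ConsumerEvent(ConsumerEventType type, boolean requiredCoordinator) {
        this.type = type;
        this.requiredCoordinator = requiredCoordinator;
    }
}

// Event sent from the background thread back to the polling thread.
final class ServerEvent {
    final ServerEventType type;

    ServerEvent(ServerEventType type) {
        this.type = type;
    }
}
```

For example, `new ConsumerEvent(ConsumerEventType.COMMIT, true)` would mark an event that has to wait for a coordinator, while an event created with `false` could be executed immediately.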

...