...

Currently in Kafka, when a consumer falls out of its consumer group, it restarts processing from the last checkpointed offset. This design can produce a lag that some users cannot afford. For example, suppose a consumer crashes at offset 100 while the last checkpointed offset is 70. When it recovers, and the log has meanwhile advanced to offset 120, it will be behind by 50 offsets (120 - 70), because it restarts from offset 70 and must reprocess data it has already handled.

Public Interfaces

This new mode, controlled by a new config named enable.parallel.rebalance, will not be used by KafkaConsumer by default (i.e. the config will default to false). Users who want this behavior can activate it by setting the config's value to true.
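As a rough illustration, a minimal sketch of how a user might opt in, assuming the config keeps the proposed name enable.parallel.rebalance; the bootstrap servers, group id, and deserializers below are placeholder values:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ParallelRebalanceOptIn {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            // Proposed config: defaults to false, so existing consumers are unaffected.
            props.put("enable.parallel.rebalance", "true");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            // ... subscribe and poll as usual ...
            consumer.close();
        }
    }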

...

In this scenario, if auto commit is enabled, offsets from buffer2 are committed directly into the original commit log. Once committed, those offsets are discarded. Meanwhile, the offsets in buffer1 are checkpointed in a separate Kafka topic. When KafkaConsumer#committed is called, if consumer2 has not terminated, only the offsets checkpointed by consumer2 are returned. Otherwise, the offsets checkpointed by consumer1 are returned instead.
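To make the selection concrete, here is a hypothetical sketch of the committed() behavior described above; the class, field, and map names are invented for this example and are not part of any existing Kafka API:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class CommittedOffsetSelector {
        // Offsets consumer2 has committed to the original commit log.
        private final Map<TopicPartition, OffsetAndMetadata> committedByConsumer2 = new HashMap<>();
        // Offsets consumer1 has checkpointed in the separate Kafka topic.
        private final Map<TopicPartition, OffsetAndMetadata> checkpointedByConsumer1 = new HashMap<>();
        private volatile boolean consumer2Terminated = false;

        public OffsetAndMetadata committed(TopicPartition partition) {
            if (!consumer2Terminated) {
                // consumer2 is still running: only its committed offsets are reported.
                return committedByConsumer2.get(partition);
            }
            // consumer2 has terminated: fall back to consumer1's checkpointed offsets.
            return checkpointedByConsumer1.get(partition);
        }
    }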

...

When the user calls poll() for new offsets, if consumer2 has not finished, only offsets from buffer2 will be sent to the user. After each poll(), buffer2 is emptied once its records have been returned, to free up memory. (Note that any offsets in buffer2 that have not yet been committed are still kept.) However, if consumer2 has terminated, all offsets that have accumulated in buffer1 will also be sent, and both buffer2 and buffer1 are then discarded, since neither is needed any longer. In this manner, we can also guarantee ordering for the user.
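The following is an illustrative sketch of that drain order on poll(); buffer1, buffer2, the consumer2Terminated flag, and lastCommittedOffset are hypothetical internal structures, and returning buffer1's records ahead of buffer2's is one possible reading of the ordering guarantee:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.TreeMap;
    import org.apache.kafka.clients.consumer.ConsumerRecord;

    public class PollBufferDrainer<K, V> {
        // Keyed by offset so records come back in offset order.
        private final TreeMap<Long, ConsumerRecord<K, V>> buffer1 = new TreeMap<>();
        private final TreeMap<Long, ConsumerRecord<K, V>> buffer2 = new TreeMap<>();
        private volatile boolean consumer2Terminated = false;
        private volatile long lastCommittedOffset = -1L;

        public List<ConsumerRecord<K, V>> drain() {
            List<ConsumerRecord<K, V>> result = new ArrayList<>();
            if (!consumer2Terminated) {
                // consumer2 still running: only buffer2's records go to the user; they are
                // then dropped to free memory, except offsets not yet committed.
                result.addAll(buffer2.values());
                buffer2.headMap(lastCommittedOffset, true).clear();
            } else {
                // consumer2 finished: buffer1's accumulated records are returned as well,
                // ahead of buffer2's so offsets stay ordered, and both buffers are discarded.
                result.addAll(buffer1.values());
                result.addAll(buffer2.values());
                buffer1.clear();
                buffer2.clear();
            }
            return result;
        }
    }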

...