Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: changed description of impact of running multiple processes for same Consumer Group

...

First thing to know is that the High Level Consumer stores the last offset read from a specific partition in ZooKeeper. This offset is stored based on the name provided to Kafka when the process starts. This name is referred to as the Consumer Group. This

The Consumer Group name is global across a Kafka cluster, so you should be very careful to only run one process with that name.Since the offsets are being stored by Consumer Group, running a second process with the same name will start reading from the value stored in ZooKeeper and overwrite the values. So lots of odd things happen when you do this, including possible duplicate reads of messagesthat any 'old' logic Consumers be shutdown before starting new code. When a new process is started with the same Consumer Group name, Kafka will add that processes' threads to the set of threads available to consume the Topic and trigger a 're-balance'. During this re-balance Kafka will assign available partitions to available threads, possibly moving a partition to another process. If you have a mixture of old and new business logic, it is possible that some messages go to the old logic.

Designing a High Level Consumer

...

  • if you provide more threads than there are partitions on the topic, some threads will never see a message
  • if you have more partitions than you have threads, some threads will receive data from multiple partitions
  • if you have multiple partitions per thread there is NO guarantee about the order you receive messages, other than that within the partition the offsets will be sequential. For example, you may receive 5 messages from partition 10 and 6 from partition 11, then 5 more from partition 10 followed by 5 more from partition 10 even if partition 11 has data available.
  • adding more processes/threads will cause Kafka to re-balance, possibly changing the assignment of a Partition to a Thread.

Next, your logic should expect to get an iterator from Kafka that may block if there are no new messages available.

...