Comment: Migrated to Confluence 4.0

...

The first thing to know about using a High Level Consumer is that it can (and should!) be a multi-threaded application. The threading model revolves around the number of partitions in your topic and there are some very specific rules:

  • If you provide more threads than there are partitions on the topic, some threads will never see a message.
  • If you have more partitions than threads, some threads will receive data from multiple partitions.
  • If a thread reads from multiple partitions, there is NO guarantee about the order in which you receive messages, other than that offsets within a single partition will be sequential. For example, you may receive 5 messages from partition 10 and 6 from partition 11, then 5 more from partition 10 followed by 5 more from partition 10, even if partition 11 has data available.
  • Adding more processes or threads will cause Kafka to rebalance, possibly changing the assignment of a partition to a thread.
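
To make the first two rules concrete, here is a minimal sketch that splits partitions across threads in contiguous ranges. It is only an illustration: the class and method names are hypothetical, and the split is a simplified stand-in rather than Kafka's actual rebalancing code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a simplified range-style split of partitions across threads.
// This is NOT Kafka's rebalancing code; all names here are hypothetical.
class PartitionAssignmentSketch {

    // Returns, for each thread index, the list of partition ids it would own.
    static List<List<Integer>> assign(int partitions, int threads) {
        List<List<Integer>> owned = new ArrayList<>();
        int base = partitions / threads;   // partitions every thread gets
        int extra = partitions % threads;  // the first 'extra' threads get one more
        int next = 0;                      // next unassigned partition id
        for (int t = 0; t < threads; t++) {
            int count = base + (t < extra ? 1 : 0);
            List<Integer> mine = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                mine.add(next++);
            }
            owned.add(mine);
        }
        return owned;
    }

    public static void main(String[] args) {
        // More threads than partitions: the last two threads own nothing.
        System.out.println(assign(2, 4)); // prints [[0], [1], [], []]
        // More partitions than threads: each thread reads several partitions.
        System.out.println(assign(5, 2)); // prints [[0, 1, 2], [3, 4]]
    }
}
```

With 2 partitions and 4 threads, two threads sit idle. With 5 partitions and 2 threads, each thread interleaves messages from several partitions, which is where the ordering caveat applies.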

Next, your logic should expect to get an iterator from Kafka that may block if there are no new messages available.
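
The blocking behavior can be pictured with a small sketch that uses a java.util.concurrent.BlockingQueue as a stand-in for the stream Kafka hands to each thread. This is an assumption made for illustration: the queue, the END marker, and the class name are hypothetical and are not part of the Kafka API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only: a BlockingQueue stands in for a Kafka message stream.
class BlockingStreamSketch {
    static final String END = "__END__"; // hypothetical shutdown marker

    // Reads messages until END arrives; take() blocks while the queue is empty.
    static List<String> consume(BlockingQueue<String> stream) throws InterruptedException {
        List<String> seen = new ArrayList<>();
        while (true) {
            String msg = stream.take();
            if (END.equals(msg)) {
                return seen;
            }
            seen.add(msg);
        }
    }

    // Starts a slow producer, then consumes on the calling thread.
    static List<String> demo() {
        BlockingQueue<String> stream = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(100); // the consumer is blocked during this pause
                stream.put("m1");
                stream.put("m2");
                stream.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            List<String> seen = consume(stream);
            producer.join();
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [m1, m2]
    }
}
```

The point of the sketch is that the consuming loop needs no polling or sleep logic of its own: it simply waits inside the blocking call until a message arrives, so a separate signal is needed to make it stop.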

...

The example code expects the following command line parameters:

  • ZooKeeper connection string with port number
  • Consumer Group name to use for this process
  • Topic to consume messages from
  • Number of threads to launch to consume the messages

For example:

server01.myco.com:2181 group3 myTopic 4
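
One way the four parameters might be read and validated is sketched below. The class name, field names, and the host zk.example.com are hypothetical and chosen for illustration; the original example code may handle its arguments differently.

```java
// Illustrative only: parses the four command line parameters listed above.
// Class and field names are hypothetical, not taken from the example code.
class ConsumerArgsSketch {
    final String zooKeeper; // ZooKeeper connection string, host:port
    final String groupId;   // consumer group name
    final String topic;     // topic to consume from
    final int threads;      // number of consumer threads to launch

    ConsumerArgsSketch(String zooKeeper, String groupId, String topic, int threads) {
        this.zooKeeper = zooKeeper;
        this.groupId = groupId;
        this.topic = topic;
        this.threads = threads;
    }

    static ConsumerArgsSketch parse(String[] args) {
        if (args.length != 4) {
            throw new IllegalArgumentException(
                "usage: <zookeeper host:port> <group> <topic> <threads>");
        }
        int threads = Integer.parseInt(args[3].trim());
        if (threads < 1) {
            throw new IllegalArgumentException("thread count must be at least 1");
        }
        return new ConsumerArgsSketch(args[0], args[1], args[2], threads);
    }

    public static void main(String[] args) {
        // Hypothetical values in the same shape as the example command line.
        ConsumerArgsSketch parsed =
            parse(new String[] {"zk.example.com:2181", "group3", "myTopic", "4"});
        System.out.println(parsed.topic + " with " + parsed.threads + " threads");
        // prints: myTopic with 4 threads
    }
}
```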

...