...

However, we have recently encountered several stuck-rebalance issues whose root cause is out-of-date "ownedPartitions". The assignors blindly trust the "ownedPartitions" they receive and base the assignment on them, so stale data leads to unexpected results (e.g. KAFKA-12984, KAFKA-13406). We currently work around this in the cooperative sticky assignor by adding a "generation" field to the subscription "userData" field and deserializing it during assignment to determine whether the "ownedPartitions" are out-of-date. However, this workaround only covers the cooperative sticky assignor: users with their own custom cooperative assignor must implement the same workaround manually, or they will hit the same issues.


Sending "ownedPartitions" without "generation" info in the Subscription message is like sending TCP packets without sequence numbers: the assignor (like a TCP receiver) cannot tell stale data from current data and may make the wrong decision.

Therefore, we should add a "generation" field to the "Subscription" data of the consumer protocol (i.e. ConsumerProtocolSubscription).
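
To illustrate how an assignor could use such a field, here is a minimal sketch, not the actual Kafka implementation. It assumes a simple rule: only members reporting the highest generation seen in the group have their "ownedPartitions" trusted, and everyone else is treated as owning nothing. The class StaleOwnershipFilter, the type MemberSubscription, and the method trustedOwnedPartitions are hypothetical names introduced for this example only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaleOwnershipFilter {

    /** Hypothetical stand-in for a member's subscription; not the real Kafka class. */
    public static final class MemberSubscription {
        final List<String> ownedPartitions; // e.g. "topicA-0"
        final int generation;               // generation the member reports for its ownership

        public MemberSubscription(List<String> ownedPartitions, int generation) {
            this.ownedPartitions = ownedPartitions;
            this.generation = generation;
        }
    }

    /**
     * Returns, per member id, the owned partitions the assignor may rely on.
     * Assumed rule: claims from a generation older than the newest one
     * reported in the group are considered stale and are dropped.
     */
    public static Map<String, List<String>> trustedOwnedPartitions(
            Map<String, MemberSubscription> subscriptions) {
        // Find the newest generation reported by any member.
        int maxGeneration = Integer.MIN_VALUE;
        for (MemberSubscription s : subscriptions.values()) {
            maxGeneration = Math.max(maxGeneration, s.generation);
        }

        Map<String, List<String>> trusted = new HashMap<>();
        for (Map.Entry<String, MemberSubscription> e : subscriptions.entrySet()) {
            MemberSubscription s = e.getValue();
            if (s.generation == maxGeneration) {
                // Up-to-date member: keep its claims.
                trusted.put(e.getKey(), new ArrayList<>(s.ownedPartitions));
            } else {
                // Stale member: ignore its claims so they cannot conflict.
                trusted.put(e.getKey(), Collections.<String>emptyList());
            }
        }
        return trusted;
    }
}
```

With the generation carried in the Subscription itself, a custom cooperative assignor could apply a check like this without encoding anything into "userData".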

...

The consumer protocol for Subscription is updated: a new "generation" field is appended at the end, and the version is bumped to V2.


KafkaConsumer:
 
Subscription => TopicList UserData OwnedPartitions Generation
   TopicList               => List<String>
   UserData                => Bytes  
   OwnedPartitions         => List<String, List<Int32>>
   Generation              => Int32   <--- new field


...