...

Our primary method for testing the implementation will be Discrete Event Simulation (DES). DES allows us to test a large number of deterministically generated random scenarios that include various kinds of faults (such as network partitions). It also allows us to define system invariants programmatically, which are then checked after each step of the simulation. The protocol will be formally verified with a TLA+ model as well. In addition, we will use the typical suite of unit, integration, and system tests. System tests will be parameterised to run with both protocols.
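
To illustrate the DES approach, here is a minimal sketch in Python. It is not the actual test harness: the toy coordinator model, the event kinds (heartbeat, partition, heal), the fault probability, and every name in the code are assumptions made for illustration only. It shows the three ingredients mentioned above: all randomness is derived from a seed so a scenario can be replayed, faults are injected as ordinary events, and invariants are checked after every step.

```python
import heapq
import random


def check_invariants(owner, fenced, num_members, num_partitions):
    """Hypothetical invariants, checked after every simulation step."""
    for partition, member in owner.items():
        assert 0 <= partition < num_partitions, "unknown partition assigned"
        assert 0 <= member < num_members, "unknown member owns a partition"
        assert member not in fenced, "fenced member still owns a partition"


def run_simulation(seed, steps=200, num_members=3, num_partitions=6):
    """Run one deterministic scenario and return its event trace."""
    rng = random.Random(seed)  # every random choice flows from the seed
    owner = {}                 # partition -> member (toy coordinator state)
    fenced = set()             # members cut off by a simulated network partition
    queue = []                 # min-heap of (time, seq, kind, member)
    seq = 0                    # tie-breaker so heap ordering is deterministic

    def schedule(time, kind, member):
        nonlocal seq
        heapq.heappush(queue, (time, seq, kind, member))
        seq += 1

    for member in range(num_members):
        schedule(rng.random(), "heartbeat", member)

    trace = []
    while queue and len(trace) < steps:
        time, _, kind, member = heapq.heappop(queue)
        if kind == "heartbeat":
            if member not in fenced:
                free = [p for p in range(num_partitions) if p not in owner]
                if free:
                    owner[rng.choice(free)] = member
            schedule(time + rng.uniform(0.5, 1.5), "heartbeat", member)
            if rng.random() < 0.1:  # inject a fault with 10% probability
                schedule(time + rng.random(), "partition", member)
        elif kind == "partition":
            fenced.add(member)
            for p in [p for p, m in owner.items() if m == member]:
                del owner[p]        # coordinator revokes the member's partitions
            schedule(time + rng.uniform(1.0, 3.0), "heal", member)
        elif kind == "heal":
            fenced.discard(member)
        trace.append((time, kind, member))
        check_invariants(owner, fenced, num_members, num_partitions)
    return trace


if __name__ == "__main__":
    for seed in range(100):
        run_simulation(seed)  # raises AssertionError if an invariant breaks
    print("all scenarios passed")
```

Because a failing scenario is fully identified by its seed, it can be replayed exactly when debugging, which is the main appeal of this style of testing.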

Rejected Alternatives

An epoch per partition

Our initial idea was to use an epoch per partition instead of using an epoch per member. This does not work due to the custom bytes alongside the assigned partitions. The partitions and the bytes must be treated as a unit.

Use two group coordinators

We considered using a new dedicated group coordinator to host and manage the new consumer group. We rejected this because it would make the migration from the old protocol to the new protocol much more difficult.

No more client-side assignors, even for Kafka Streams

We considered removing the client-side assignor feature entirely. From a consumer perspective, it is rarely used nowadays; Kafka Streams is its primary user. For Kafka Streams, we considered using a server-side assignor as well. In the end, we felt that introducing a dependency between Kafka Streams and the broker was not a good idea for the evolution of Kafka Streams, because it would prevent users from using new features of Kafka Streams without upgrading their brokers first.

Future Work

Eventually, we aim to deprecate the current membership/rebalance API. To get to that point, we would first need to move all the use cases away from it.

...