Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

This is the timeout when we count down to trigger exactly one rebalance (i.e, the time estimate to spin up # of hosts) since the first joined member's request. It is advised to be set roughly the same with session timeout to make sure the workload become balanced when you 2X or 3X your stream job. Example with expansion timeout 5 min: 

 

EventTimecount timeAction
New member A join00:00 00:00 N/A

New member B join

00:03 00:00N/A
Expansion timeout00:05N/ARebalance
New member C join00:0600:06N/A
Expansion timeout00:11N/ARebalance

...

To make sure we could recover from broker failure/leader transition, an in-memory member name map is not enough. We will init another topic called `static_member_map` in the cluster and each time we have a stable state we should write the complete mapping information as a single message into it. This way when another broker takes over the leadership, we could transfer the mapping together.

Switch between static and dynamic membership

...

In this pull request, we did an experimental approach to materialize member id on the instance local disk. This approach could reduce the rebalances as expected, which is the experimental foundation of KIP-345. However, KIP-345 has a few advantages over it:

...