...

A learner task shares the same semantics as a standby task and is handled solely by the restore consumer. The only difference is that once the learner task finishes restoring, the stream instance initiates a new JoinGroupRequest to trigger a rebalance for the new task assignment. The goal of the learner task is to delay task migration while the destination host has not finished, or has not even started, replaying the active task. This applies to both scale-up and scale-down scenarios.

...

As mentioned in the motivation section, we also want to mitigate the stop-the-world effect of the current global rebalance protocol. A quick recap of the current rebalance semantics on KStream: when a rebalance starts, all members will

  1. Join the group with all currently assigned tasks revoked.

  2. Wait until the group's assignment has stabilized before resuming work.

The reason is that we need to guarantee that each topic partition is assigned to exactly one consumer at any time, so a topic partition cannot be reassigned before it is revoked.
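The two steps above can be illustrated with a minimal sketch. It is not the actual Kafka Streams code: the class name, the round-robin strategy, and the use of plain integers for partitions are all assumptions made for illustration. It only shows that every member gives up everything first, and that the new assignment gives each partition exactly one owner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the current "revoke everything, then reassign" semantics.
final class EagerRebalanceSketch {

    // Assigns partitions 0..numPartitions-1 round-robin, so each partition has exactly one owner.
    static Map<String, List<Integer>> assign(List<String> members, int numPartitions) {
        Map<String, List<Integer>> assignment = new TreeMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(members.get(p % members.size())).add(p);
        }
        return assignment;
    }

    static Map<String, List<Integer>> rebalance(Map<String, List<Integer>> current,
                                                List<String> members,
                                                int numPartitions) {
        // Step 1: every member joins with all of its current partitions revoked.
        for (List<Integer> owned : current.values()) {
            owned.clear();
        }
        // Step 2: members wait for the stabilized assignment before resuming work.
        return assign(members, numPartitions);
    }
}
```

Because no member holds anything between the two steps, all processing stops for the duration of the rebalance, which is the stop-the-world effect described above.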

...

Scale Up Running Application

Newly joined workers will be assigned learner tasks by the group leader and will first replay the corresponding changelogs locally. By the end of the first round of rebalance, there is no “real task transfer”. When a new member finally finishes the replay, it will rejoin the group to indicate that it is “ready” to take on real active tasks. During the second rebalance, the leader will eventually transfer the task ownership.
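The two rounds described above could be modelled roughly as follows. This is a simplified sketch under stated assumptions, not the proposed implementation: the class, its fields, and the method names are hypothetical, tasks are plain integers, and the leader's choice of which tasks to move is passed in by the caller.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the leader's bookkeeping during scale up.
final class LearnerScaleUpSketch {
    // task id -> member currently running the task as active
    final Map<Integer, String> activeOwner = new TreeMap<>();
    // task id -> member that is replaying the changelog as a learner
    final Map<Integer, String> learner = new TreeMap<>();

    // First rebalance: the new member only receives learner tasks; active ownership is unchanged.
    void firstRebalance(String newMember, List<Integer> tasksToMove) {
        for (int task : tasksToMove) {
            learner.put(task, newMember);
        }
    }

    // Second rebalance: triggered when the member rejoins after finishing its replay.
    // Ownership of every task it has learned is transferred to it.
    void secondRebalance(String readyMember) {
        Iterator<Map.Entry<Integer, String>> it = learner.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, String> entry = it.next();
            if (entry.getValue().equals(readyMember)) {
                activeOwner.put(entry.getKey(), readyMember);
                it.remove();
            }
        }
    }
}
```

The point of the sketch is that the original owner keeps processing the task between the two rounds, so the task is never idle while the new member restores state.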

...

scale.down.timeout.ms

Default: infinity

Timeout in milliseconds after which a stream worker is forcibly terminated once it has been informed that it will be scaled down.
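As a rough illustration of how such a setting might be read and applied, consider the sketch below. Only the key name scale.down.timeout.ms comes from this proposal; the helper class, the method names, and the choice to represent the "infinity" default as Long.MAX_VALUE are assumptions made for this example.

```java
import java.util.Properties;

// Hypothetical helper; not part of any existing Kafka Streams API.
final class ScaleDownTimeoutSketch {
    static final String SCALE_DOWN_TIMEOUT_MS = "scale.down.timeout.ms";
    // Assumption: the "infinity" default is modelled as Long.MAX_VALUE.
    static final long INFINITE = Long.MAX_VALUE;

    // Reads the timeout, falling back to the infinite default when the key is absent.
    static long timeoutMs(Properties props) {
        String raw = props.getProperty(SCALE_DOWN_TIMEOUT_MS);
        return raw == null ? INFINITE : Long.parseLong(raw.trim());
    }

    // A worker told to scale down is force-terminated only once a finite timeout has elapsed.
    static boolean shouldForceTerminate(long elapsedMs, long timeoutMs) {
        return timeoutMs != INFINITE && elapsedMs >= timeoutMs;
    }
}
```

With the default, a worker is never force-terminated and can hand over its tasks at its own pace; a finite value bounds how long the scale-down may take.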

A public interface is any change to the following:

...

Binary log format

...

to help users define their customized strategy.


Compatibility, Deprecation, and Migration Plan

  • Metadata size increase
  • No downtime upgrade due to change of protocolType

FAQ

Why do we call stream workers?

Rejected Alternatives

If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.