Reduce unnecessary downtime due to task restoration
Improve rebalance performance for stream applications, i.e. alleviate the stop-the-world effect.

Proposed Changes

New

...

Terminology

We define a few terms to make the algorithm easier to walk through.

Worker (stream worker): the unit of stream processing at the thread level. We want to keep this name separate from "stream consumer".
Instance (stream instance):
Learner task: a special task assigned to one stream instance to restore the state of a currently active task from another instance.

Learner Task Intro

A learner task shares the same semantics as a standby task. The only difference is that when the restoration of a learner task completes, the stream instance initiates a new JoinGroupRequest to trigger a rebalance with the new task assignment. The goal of the learner task is to delay task migration while the destination host has not finished, or has not even started, replaying the active task. This applies to both scale-up and scale-down scenarios.

...

A stream instance S1 takes two learner tasks T1 and T2, where the restoring times satisfy time(T1) < time(T2). Under the eager rebalance approach, the instance triggers a rebalance immediately when T1 finishes replaying. Under the stable rebalance approach, the instance does not rejoin the group until it finishes replaying both T1 and T2.

Boolean config: learner.partial.ready
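
The difference between the two approaches can be sketched as a small decision function. This is only an illustration under stated assumptions, not the Streams implementation: the class LearnerRejoinPolicy and its method are hypothetical, and mapping learner.partial.ready = true to the eager approach (and false to the stable approach) is an assumption, since the text above does not spell out the mapping.

```java
import java.util.Map;

/** Hypothetical sketch: decides when an instance holding learner tasks should rejoin the group. */
final class LearnerRejoinPolicy {

    /**
     * @param partialReady assumed meaning of learner.partial.ready:
     *                     true  = eager (rejoin as soon as any learner task is restored),
     *                     false = stable (rejoin only once all learner tasks are restored)
     * @param restored     learner task id -> whether its restoration has finished
     */
    static boolean shouldRejoin(boolean partialReady, Map<String, Boolean> restored) {
        if (restored.isEmpty()) {
            return false; // no learner tasks, so there is nothing to announce
        }
        boolean anyRestored = restored.containsValue(true);
        boolean allRestored = !restored.containsValue(false);
        return partialReady ? anyRestored : allRestored;
    }

    public static void main(String[] args) {
        // T1 has finished replaying, T2 has not, matching time(T1) < time(T2).
        Map<String, Boolean> state = Map.of("T1", true, "T2", false);
        System.out.println("eager rejoin:  " + shouldRejoin(true, state));
        System.out.println("stable rejoin: " + shouldRejoin(false, state));
    }
}
```

With T1 restored and T2 still replaying, the eager variant asks for a rebalance right away, while the stable variant keeps waiting until T2 is done as well.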

Algorithm Trade-offs

We dedicate a section to the trade-offs of the new algorithm, because understanding the motivation for the change is important and helps make the proposal more robust.

More rebalances

As one would expect, the new algorithm triggers many more rebalances than the current protocol. As discussed in the overall incremental rebalancing design, multiple rebalances are not necessarily bad when done wisely, and after KIP-345 we have a future proposal to avoid scale-up rebalances for static members. The goal is to pre-register the members that are planned to be added. The broker coordinator will augment the member list and wait for all the new members to join the group before rebalancing, since by default a stream application’s rebalance timeout is infinite. The conclusion is that it is the server’s responsibility to avoid excessive rebalances, and the client’s responsibility to make each rebalance more efficient.
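
The pre-registration idea can be sketched as follows. Since this is a future proposal, nothing here reflects an existing broker API: the class PreRegisteredGroup and its methods are hypothetical names used only to show the coordinator deferring the rebalance until every announced member has joined.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a coordinator that waits for all pre-registered members before rebalancing. */
final class PreRegisteredGroup {
    private final Set<String> pending = new HashSet<>(); // announced but not yet joined
    private final Set<String> joined = new HashSet<>();  // members that have joined

    /** Augments the member list with members that are planned to be added. */
    void preRegister(Set<String> memberIds) {
        for (String id : memberIds) {
            if (!joined.contains(id)) {
                pending.add(id);
            }
        }
    }

    /** Records a join and reports whether the rebalance may proceed now. */
    boolean onJoin(String memberId) {
        pending.remove(memberId);
        joined.add(memberId);
        return pending.isEmpty(); // rebalance only once nobody is still awaited
    }

    public static void main(String[] args) {
        PreRegisteredGroup group = new PreRegisteredGroup();
        group.preRegister(Set.of("m1", "m2", "m3"));
        System.out.println("after m1: " + group.onJoin("m1"));
        System.out.println("after m2: " + group.onJoin("m2"));
        System.out.println("after m3: " + group.onJoin("m3"));
    }
}
```

In this sketch, scaling up by three members leads to a single rebalance after the last join rather than one rebalance per member.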

Metadata size increase

Since we carry more information during a rebalance, we should watch for the increase in metadata size. So far the hard limit is 1MB per metadata response, which means that if we carry too much information, the new protocol could hit a hard failure. Finding a better encoding scheme for metadata is a common pain point for incremental rebalancing KIPs such as KIP-415 and KIP-429. Guozhang has started collecting some thoughts in this JIRA, and we plan to have a separate KIP that discusses different encoding technologies and evaluates which one could work.
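
A simple guard against the size limit might look like the sketch below. It rests on assumptions: the class MetadataSizeCheck is hypothetical, "1MB" is taken to mean 2^20 bytes, and the limit is assumed to apply to the sum of the members' encoded metadata. The actual limit and how it is enforced depend on the broker configuration.

```java
import java.util.List;

/** Hypothetical sketch: checks encoded rebalance metadata against an assumed 1MB limit. */
final class MetadataSizeCheck {
    // Assumption: "1MB" is interpreted as 2^20 bytes.
    static final long LIMIT_BYTES = 1024L * 1024L;

    /** Sums the encoded size of every member's metadata. */
    static long totalSize(List<byte[]> encodedMemberMetadata) {
        long total = 0;
        for (byte[] metadata : encodedMemberMetadata) {
            total += metadata.length;
        }
        return total;
    }

    /** Returns true if the combined metadata stays within the assumed limit. */
    static boolean fitsInResponse(List<byte[]> encodedMemberMetadata) {
        return totalSize(encodedMemberMetadata) <= LIMIT_BYTES;
    }

    public static void main(String[] args) {
        List<byte[]> small = List.of(new byte[200_000], new byte[300_000]);
        List<byte[]> large = List.of(new byte[700_000], new byte[700_000]);
        System.out.println("small fits: " + fitsInResponse(small));
        System.out.println("large fits: " + fitsInResponse(large));
    }
}
```

A check of this kind only detects the problem; a more compact encoding scheme is still needed to avoid it.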

Public Interfaces

We will add the following new configs:

...

