...
```
Algorithm incremental-rebalancing

Input: Set of Tasks, Set of Instances, Set of Workers,
  where each worker contains:
    Set of active Tasks
    Set of standby Tasks
    owned by which instance

Main Function

  Assign active tasks (if any):
    To instances with learner tasks that indicate "ready"
    To previous owners
    To owners of unready learner tasks
    To instances with standby tasks
    To resource-available instances

  Keep existing learner task assignment

  Pick new learner tasks out of the most heavily loaded instances

  Assign learner tasks (if any):
    To previous owners (no halfway bounce, at least in the first version)
    To newly joined instances with abundant resources
    To instances with corresponding standby tasks

  Assign standby tasks (if any):
    To instances without the matching active task
    To previous active task owners, after the learner transfer in this round
    To resource-available instances
    Based on the num.standby.task config, this could take multiple rounds

Output: Finalized Task Assignment
```
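The "assign active tasks" priority order above can be sketched as a simple cascade of checks. This is an illustrative sketch only; the `Worker` fields and the `assign_active` helper are hypothetical names, not the actual Kafka Streams assignor API.

```python
from dataclasses import dataclass, field

@dataclass
class Worker:
    """Hypothetical per-worker bookkeeping used by the sketch below."""
    name: str
    ready_learners: set = field(default_factory=set)
    prev_active: set = field(default_factory=set)
    learners: set = field(default_factory=set)
    standby: set = field(default_factory=set)
    assigned: list = field(default_factory=list)

def assign_active(task, workers):
    """Pick a home for one active task, following the priority list above."""
    for w in workers:                      # 1. instances with a "ready" learner task
        if task in w.ready_learners:
            return w
    for w in workers:                      # 2. previous owner of the task
        if task in w.prev_active:
            return w
    for w in workers:                      # 3. owners of unready learner tasks
        if task in w.learners:
            return w
    for w in workers:                      # 4. instances holding a standby replica
        if task in w.standby:
            return w
    # 5. fall back to the least-loaded (resource-available) instance
    return min(workers, key=lambda w: len(w.assigned))
```

For example, an instance whose learner task is "ready" wins the active task over the previous owner, which is exactly what lets the migration finish without an extra restoration pause.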
To deliver all the features discussed so far smoothly, an iteration plan for reaching this final algorithm is defined below:
Version 1.0
Delivery goal: Scale up support, conservative rebalance
The goal of the first version is to build the foundation of the learner algorithm for the scale-up scenario. The leader worker will use the previous round's assignment to figure out which instances are new, and learner tasks shall only be assigned to them, once. The reason we only want to cover new instances is that a potential edge case could break the existing naive learner assignment: when the number of tasks is much smaller than the total cluster capacity, we could fall into endless resource shuffling. We also care more about a smooth transition than about resource balance in stage one. We have had some historical discussion on assigning weights to different types of tasks; if we aim for task balance too early, we risk over-optimization. In conclusion, we want to delay the finalized design for eventual balance until the last version.
We also don't want to adopt the eager rebalance optimization in version 1.0, due to the concerns explained above.
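The version 1.0 rule above can be sketched as a diff between rounds: the leader treats any instance absent from the previous round's assignment as "new", and only those instances receive learner tasks. The function and variable names here are illustrative, not the real assignor code.

```python
def pick_learner_targets(prev_assignment, current_instances):
    """Instances that did not appear in the previous round are 'new' ones."""
    previous = set(prev_assignment)  # instances seen in the last round
    return [i for i in current_instances if i not in previous]

# Hypothetical example: instance-3 joined since the last rebalance,
# so only it is eligible to receive learner tasks this round.
prev_assignment = {"instance-1": ["t1", "t2"], "instance-2": ["t3"]}
current = ["instance-1", "instance-2", "instance-3"]
new_instances = pick_learner_targets(prev_assignment, current)
```

Restricting learner tasks to new instances sidesteps the shuffling edge case: an existing, already-loaded instance can never become a learner target, so tasks cannot bounce back and forth between established members.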
Version 2.0
Delivery goal: Scale down support
...
Version 4.0 (Stretch)
Delivery goal: task labeling, eventual workload balance
...
We also want to balance the load separately for stateful tasks and stateless tasks.
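One way to picture this separation is to balance each group independently, so that an instance heavy on stateless tasks is not mistakenly counted as balanced for stateful ones. This is a minimal sketch under assumed naming conventions (task names prefixed with their type); the helper is hypothetical.

```python
def balance_separately(tasks, instances):
    """Round-robin stateful and stateless task groups across instances
    independently, so each group ends up balanced on its own."""
    assignment = {i: [] for i in instances}
    stateful = sorted(t for t in tasks if t.startswith("stateful"))
    stateless = sorted(t for t in tasks if t.startswith("stateless"))
    for group in (stateful, stateless):
        for idx, task in enumerate(group):
            assignment[instances[idx % len(instances)]].append(task)
    return assignment
```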
As we can see, there should be exactly one learner task after each round of rebalance, and exactly one corresponding active task at the same time.
As discussed above, version 4.0 still has many unknowns and is slightly beyond the incremental rebalancing scope. A separate KIP may be initiated to discuss this approach in detail.
Algorithm Trade-offs
We open a dedicated section to discuss the trade-offs of the new algorithm, because it is important to understand the motivation for the change and to make the proposal more robust.
...
- Set `stream.rebalancing.mode` to `upgrading`, which will force the stream application to keep the protocol type as "consumer".
- Rolling restart the stream application; the change is applied automatically. This is safe because we are not changing the protocol type anyway.
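Concretely, the first step could look like the following config change before the rolling restart. Only the `stream.rebalancing.mode` key comes from this proposal; the file name is illustrative.

```properties
# streams.properties (illustrative file name)
# keep the protocol type as "consumer" during the rolling upgrade
stream.rebalancing.mode=upgrading
```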
In the long term, we are proposing a smoother and more elegant upgrade approach than the current one. However, it requires a broker upgrade, which may not be a trivial effort for the end user.
FAQ
Why do we call them stream workers? So far this workaround could make the effort smaller.
Rejected Alternatives
N/A for the algorithm part. For the implementation plan trade-offs, please review the Google doc.