...
Leader crash could cause a missing of historical assignment information. For the learner assignment, however, each worker maintains its own assignment status, so when the learner task's id has no active task running, the transfer will happen immediately. Leader switch in this case is not a big concern. The essence is that we don't rely on leader information to do
the assignment.
Code Block | ||||
---|---|---|---|---|
| ||||
Cluster has 3 stream workers S1(leader), S2, S3, and they own tasks 1 ~ 5 Group stable state: S1[T1, T2], S2[T3, T4], S3[T5] #First Rebalance New member S4 joins the group S1 performs task assignments: S1(assigned: [T1, T2], revoked: [], learning: []) S2(assigned: [T3, T4], revoked: [], learning: []) S3(assigned: [T5], revoked: [], learning: []) S4(assigned: [], revoked: [], learning: [T1]) #Second Rebalance S1 crashes/gets killed before S4 is ready, S2 takes over the leader. Member S2~S4 join with following status: S2(assigned: [T3, T4], revoked: [], learning: []) S3(assigned: [T5], revoked: [], learning: []) S4(assigned: [], revoked: [], learning: [T1]) Note that T2 is unassigned, and S4 is learning T1 which has no current active task. We could rebalance T1, T2 immediately. S2 performs task assignments: S2(assigned: [T3, T4], revoked: [], learning: []) S3(assigned: [T5, T2], revoked: [], learning: []) S4(assigned: [T1], revoked: [], learning: []) Now the group reaches balance. |
...
The above examples are focusing more on demonstrating expected behaviors with KStream incremental rebalancing. We also want to define the new learner algorithm for a holistic view.
We have set of workers, where each worker contains:
Workers = []
Set<ActiveTask>
Set<StandbyTask>
As we could see, there should be only exactly one learner task after each round of rebalance, and there should be exactly one corresponding active task at the same time.
...