...
Next, we look at several typical scaling scenarios and edge scenarios to better understand the design of this algorithm.
...
Normal Scenarios
...
Scale Up Running Application
...
- Increase the capacity of the current stream job to 2
- Mark existing stream instances as leaving
- Learner tasks finish on the new hosts, and the old hosts shut down.
...
    Group stable state: S1[T1, T2], S2[T3, T4]
    Swapping application instances, adding S3, S4 with new instance type.

    #First Rebalance
    Member S3, S4 join the group
    S1 performs task assignments:
        S1(assigned: [T1, T2], revoked: [], learning: [])
        S2(assigned: [T3, T4], revoked: [], learning: [])
        S3(assigned: [], revoked: [], learning: [T2])
        S4(assigned: [], revoked: [], learning: [T4])

    Use scaling tool to indicate S1 & S2 are leaving

    #Second Rebalance
    Member S1, S2 initiates rebalance to indicate state change (to leaving)
    Member S1~S4 join with following status:
        S1(assigned: [T1], revoked: [T2], learning: [])
        S2(assigned: [T3], revoked: [T4], learning: [])
        S3(assigned: [], revoked: [], learning: [T2])
        S4(assigned: [], revoked: [], learning: [T4])
    S1 performs task assignments:
        S1(assigned: [T1, T2], revoked: [], learning: [])
        S2(assigned: [T3, T4], revoked: [], learning: [])
        S3(assigned: [], revoked: [], learning: [T1, T2])
        S4(assigned: [], revoked: [], learning: [T3, T4])

    #Third Rebalance
    S3 and S4 finishes replay T1~T4 trigger rebalance.
    Member S1~S4 join with following status:
        S1(assigned: [], revoked: [T1, T2], learning: [])
        S2(assigned: [], revoked: [T3, T4], learning: [])
        S3(assigned: [], revoked: [], learning: [T1, T2])
        S4(assigned: [], revoked: [], learning: [T3, T4])
    S1 performs task assignments:
        S1(assigned: [], revoked: [], learning: [])
        S2(assigned: [], revoked: [], learning: [])
        S3(assigned: [T1, T2], revoked: [], learning: [])
        S4(assigned: [T3, T4], revoked: [], learning: [])

    S1~S2 will shutdown themselves upon new assignment since there is no assigned task left.
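The final handover in the host swap above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the proposal's implementation: the `Member` class, the `hand_over` function, and the `caught_up` set of (member, task) pairs are hypothetical names, and the intermediate `revoked` step is folded into a single move.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """Hypothetical per-member status, mirroring the listing above."""
    assigned: list = field(default_factory=list)
    learning: list = field(default_factory=list)
    leaving: bool = False


def hand_over(members, caught_up):
    """Move tasks from leaving members to learners that finished replay.

    members:   dict of member id -> Member (mutated in place)
    caught_up: set of (member id, task) pairs whose learner task is ready
    Returns the sorted ids of leaving members left with no assigned task,
    i.e. the ones that can shut themselves down.
    """
    for member in members.values():
        if not member.leaving:
            continue
        for task in list(member.assigned):  # copy, since we remove while iterating
            successor = next(
                (n for n in sorted(members)
                 if not members[n].leaving
                 and task in members[n].learning
                 and (n, task) in caught_up),
                None,
            )
            if successor is None:
                continue  # no ready learner yet: the old host keeps serving the task
            member.assigned.remove(task)
            members[successor].learning.remove(task)
            members[successor].assigned.append(task)
    return sorted(n for n, m in members.items() if m.leaving and not m.assigned)
```

Feeding in the state before the third rebalance (S1 and S2 marked as leaving, S3 and S4 caught up on all four tasks) leaves S3 and S4 owning the tasks and returns S1 and S2 as ready to shut down. If only some learners are ready, the old hosts keep the remaining tasks and stay up.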
...
Edge Scenarios
...
Leader Transfer During Scaling
...
    Cluster has 3 stream workers S1(leader), S2, S3, and they own tasks 1 ~ 5
    Group stable state: S1[T1, T2], S2[T3, T4], S3[T5]

    #First Rebalance
    New member S4 joins the group
    S1 performs task assignments:
        S1(assigned: [T1, T2], revoked: [], learning: [])
        S2(assigned: [T3, T4], revoked: [], learning: [])
        S3(assigned: [T5], revoked: [], learning: [])
        S4(assigned: [], revoked: [], learning: [T1])

    #Second Rebalance
    S1 crashes/gets killed before S4 is ready, S2 takes over the leader.
    Member S2~S4 join with following status:
        S2(assigned: [T3, T4], revoked: [], learning: [])
        S3(assigned: [T5], revoked: [], learning: [])
        S4(assigned: [], revoked: [], learning: [T1])
    Note that T2 is unassigned, and S4 is learning T1 which has no current active task. We could rebalance T1, T2 immediately.
    S2 performs task assignments:
        S2(assigned: [T3, T4], revoked: [], learning: [])
        S3(assigned: [T5, T2], revoked: [], learning: [])
        S4(assigned: [T1], revoked: [], learning: [])

    Now the group reaches balance.
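One way the new leader could arrive at the assignment above is sketched below in Python. It rests on assumptions the listing does not spell out: an orphaned task goes first to a member already learning it, and any other orphaned task goes to the member with the fewest assigned tasks, with ties broken by member id. `MemberState` and `reassign_orphaned_tasks` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MemberState:
    """Hypothetical status a member reports when it rejoins the group."""
    assigned: list = field(default_factory=list)
    learning: list = field(default_factory=list)


def reassign_orphaned_tasks(members, all_tasks):
    """Give every task without an active owner to a surviving member.

    members:   dict of member id -> MemberState (mutated in place)
    all_tasks: every task of the application, in a fixed order
    """
    owned = {task for m in members.values() for task in m.assigned}
    orphaned = [task for task in all_tasks if task not in owned]

    # Pass 1: promote an existing learner, since it already holds part of the state.
    remaining = []
    for task in orphaned:
        learner = next(
            (name for name in sorted(members) if task in members[name].learning),
            None,
        )
        if learner is None:
            remaining.append(task)
            continue
        members[learner].learning.remove(task)
        members[learner].assigned.append(task)

    # Pass 2: hand the rest to the least-loaded member (assumed tie-break: member id).
    for task in remaining:
        target = min(members, key=lambda name: (len(members[name].assigned), name))
        members[target].assigned.append(task)
    return members
```

With the surviving members from the second rebalance, S4 is promoted to own T1, and T2 goes to S3 because S3 and S4 both hold one task and S3 sorts first. A real assignor might prefer a different tie-break, for example favoring the member with the most local state.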
Static Member Unavailability
conflict with existing rebalance timeout behavior
...
Optimizations
Stateful vs Stateless Tasks
...