Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
languagetext
titleScale-down stream applications
Cluster has 3 stream workers S1(leader), S2, S3, and they own tasks 1 ~ 5
Group stable state: S1[T1, T2], S2[T3, T4], S3[T5]

#First Rebalance 
New member S4 joins the group
S1 performs task assignments:
	S1(assigned: [T1, T2], revoked: [], learning: [])
	S2(assigned: [T3, T4], revoked: [], learning: [])
	S3(assigned: [T5], revoked: [], learning: [])
	S4(assigned: [], revoked: [], learning: [T1])


#Second Rebalance
S1 crashes/gets killed before S4 is ready, S2 takes over the leader.
Member S2~S4 join with following status: 
	S2(assigned: [T3, T4], revoked: [], learning: [])
	S3(assigned: [T5], revoked: [], learning: []) 
	S4(assigned: [], revoked: [], learning: [T1])
Note that T2 is unassigned, and S4 is learning T1 which has no current active task. We 
could rebalance T1, T2 immediately.	
S2 performs task assignments:
	S2(assigned: [T3, T4], revoked: [], learning: [])
	S3(assigned: [T5, T2], revoked: [], learning: [])
	S4(assigned: [T1], revoked: [], learning: [])
Now the group reaches balance.

...

  1. Assign all active tasks to learner tasks that indicates "ready"
  2. Assign the rest of active tasks to previous owners
  3. Assign the rest of active tasks towards unready learner tasks owners
  4. Assign the rest of active tasks to resource available hosts

Assign learner tasks:

  1. Keep the existing learner tasks unless 
  2. If the load is not balanced between hosts, assign abundant task learner tasks to hosts with least active tasks
  3. As long as the group members/ number of tasks are not changing, there should be a defined balanced stage instead of forever rebalancing.
  4. Instances with standby tasks have higher priority to be chosen as learner task assignor. The standby task will convert to learner task immediately.


For each instance, the total number of tasks it owns should be within the range of 0.5 ~ 2 times of the expected number of tasks, which buffers some capacity in order to avoid imbalance.

We could even provide a stream.balancing.factor for the user to configure. Default as we have seen is 2. The smaller this number sets to, the more strict the assignment will behave.

As we could see, there should be only exactly one learner task after each round of rebalance, and there should be exactly one corresponding active task at the same time. 

...