Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The client rack awareness additions in partition assignments also affects how Kafka Streams can assign its tasks to clients so that cross rack traffic can be avoided as much as possible. This design discusses how we can augment current assignment logic in Kafka Streams to utilize rack information to minimize cross rack traffic with other constraints. Note the design doesn't change the rebalance protocol itself and only changes the assignment logic (read path in group leader in clients).

3. Current Task Assignment Logic

...

Klein’s cycle canceling algorithm can solve the min-cost flow problem in O(E^2CUE^2 * M * U) time where C M is max cost and U is max capacity. In our particular case, U is 1 and C is at most 3 (A task can have at most 3 topics including changelog topic?)since a task can be assigned once to one client. M is related to the number of topic partitions a task can subscribe. So the algorithm runs in O(E^2ME^2) time for our case. Note E is |T| * N|C|  where T is are the number of tasks and N is C are the number of clients. This is because each task can be assigned to any client; so there is an edge because every task to every client. In another word, the time complexity of the algorithm is O(T^2*N^2M * |T|^2 * |N|^2) . Below is a rough sketch of the algorithm:

...

Negative cycles and residual graphs can be read here. Negative cycles can be found using Bellman-ford’s algorithm. How to find a feasible flow? It can be found by using step 1 - 2 of HAAssignor . What’s even nice about the cycle canceling algorithm is that since our graph is a bipartite graph, every time we find a negative cycle, the client nodes in the cycle satisfies that if there’s an in edge, there must be an out edge which means the flow of the clients won’t be changed. As a result, each round of the cycle canceling will reduce the cost and won’t change the total assignments of a client. So eventually, when there’s no more negative cycles, we find a min-cost balanced assignment. Below is an example of finding an min cost of 2 to assign task t1, t2  to clients c1, c2 .


B. Rack awareness assignment algorithm for active stateful tasks

I. Min cost with unbalanced sub-topology

...

One drawback of this solution is that the final solution is based on the number of tasks which can be assigned to each client computed from HAAssignor’s step 1 - 2. If every client has an equal number of threads but the number of tasks couldn’t be divided equally, then there’s one client with a different number of tasks. Ideally, this client could be anyone when we find our min cost assignment, but we fix it during our computation, which may result in a slight sub-optimal solution.

The runtime complexity of this algorithm is also higher since there are more edges in the graph. Suppose we have S  sub-topologies, then we have (|T| + |T|*|C| + S*|C| + |C|) edges, which is still bounded by O(|T|*|C|)  edges. So the time complexity is still O(M * T^2 * N^2) .

Since 34.CB.I and 34.CB.II have different tradeoffs and work better in different workloads etc, we could add an internal configuration a config to choose one of them at runtime.

...