Page History

...

For most sources like Kafka there is an upper bound on the parallelism based on the number of partitions. If backlog information is not available, users can set desired parallelism or target rate. We also aim at choosing a parallelism which yields an equal distribution of the splits among the subtasks.

Sidenote about scaling sources without backlog information: Backlog growth information is necessary to scale sources succesfully under backpressure. However there is nothing stopping us from scaling the sources on a busyTime/CPU time basis once backpressure has been eliminated by scaling up downstream operators. This was not part of our initial prototype implementation but it could be extended easily, as it does not affect the algorithm at large.
Most regular streaming sources should in any case provide backlog / input rate information even if they are not currently exposed under pendingRecords. It should be our goal to improve connector metrics, not only to provide the best autoscaling capability but good operational experience in general.

3. Set downstream parallelisms (utilization-based scaling)

...

Page tree

Versions Compared

Old Version 16

New Version 17

Key

3. Set downstream parallelisms (utilization-based scaling)