Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Update example

...

In this diagram, we show an example dataflow which reads from two different sources. Log 1 has a backlog growth rate of 100 records per time unit. Similarly, Log 2 has a backlog growth of 500. This means that without any processing, the backlog grows by the 100 or 500 records, respectively. Source 1 is able to read 10 records per time unit, Source 2 reads 50 records per time unit. The downstream operators process the records from the source and produce new records for their downstream operators. For simplicity, the number of inflowing and outflowing records for an operator is always the same here. The actual algorithm doesn't make that assumption.

Image Removed

Now let's evaluate one round by the algorithm:

Image Removed

When the scaling is strictly linear, this will lead the following optimal result:

Image Removed

Image Added

According to the reference paper, in the worst case it takes up to three attempts to reach this state. However, it can already converge after one scaling attempt. Note that the algorithm handles up- and downscaling. In the example, one of the vertices is scaled by parallelism 1 which won't change the parallelism, but it is also possible for vertices to be scaled down if the scaling factor is lower than 1.

...