Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The above metric seems to yield reasonable results. Examples:

Number of records received by sub-tasks within an operatorSkewness

1 1 1 5 10 (i.e. five subtasks, first three each receives 1 record and last two gets 5 and 10 records each)

88%

5 5 5 5 5

0%

1 1 5 5 5

54%

4 5 5 5 5

7%

0 0 0 0 0
Undefined

0 (idle operator)


The accumulation of "received number of records" over a long period of time can hide a recent data skew event. The same can also hide a recent fix to an existing data skew problem. Therefore the proposed metric will need to look at the change in the received number of records within a period, similar to the existing "Backpressure" or "Busy" metrics on the Flink Job Graph.

...