For a simple Map().withBroadcastSet() job, we measured the following results on the IBM cluster:

suite	name	average_time (ms)
broadcast.ibm-power-1	broadcast.01	6597.33
broadcast.ibm-power-1	broadcast.02	5997
broadcast.ibm-power-1	broadcast.04	6576.66
broadcast.ibm-power-1	broadcast.08	7024.33
broadcast.ibm-power-1	broadcast.16	6933.33

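The run time stays roughly constant across all five configurations (between about 6.0 s and 7.0 s from broadcast.01 to broadcast.16), which is exactly what we want. The next step is to extend this to iterations so that the approach works in general.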
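To put a number on "constant", the table values can be summarized with a few lines of Python. This is only arithmetic on the five averages listed above; the variable names are illustrative.

```python
# Average run times in ms, copied from the table above.
times_ms = {
    "broadcast.01": 6597.33,
    "broadcast.02": 5997.0,
    "broadcast.04": 6576.66,
    "broadcast.08": 7024.33,
    "broadcast.16": 6933.33,
}

mean = sum(times_ms.values()) / len(times_ms)
# Spread between the slowest and fastest run, relative to the mean.
spread = (max(times_ms.values()) - min(times_ms.values())) / mean

print(f"mean = {mean:.2f} ms, relative spread = {spread:.1%}")
```

The slowest and fastest runs differ by roughly 15% of the mean, and the run time shows no upward trend from broadcast.01 to broadcast.16.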

However, this approach has a flaw: it creates subpartitions that are never used. A subpartition is released only once it has been consumed, so the unused subpartitions are never released and the approach leaks memory.
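The leak can be illustrated with a small toy model in Python. This is not Flink code: the `Subpartition` class and the `run_job` helper are hypothetical and only mimic the rule that release happens on consumption.

```python
class Subpartition:
    """Toy stand-in for a result subpartition holding buffered data."""

    def __init__(self, index):
        self.index = index
        self.released = False

    def consume(self):
        # In this model, release happens only as a side effect of consumption.
        self.released = True


def run_job(num_subpartitions, consumed_indices):
    """Create subpartitions, consume some of them, return the unreleased ones."""
    subpartitions = [Subpartition(i) for i in range(num_subpartitions)]
    for i in consumed_indices:
        subpartitions[i].consume()
    return [sp.index for sp in subpartitions if not sp.released]


# Four subpartitions are created, but only the first one is ever read.
leaked = run_job(num_subpartitions=4, consumed_indices=[0])
print(leaked)  # [1, 2, 3]
```

In this model, every subpartition that is created but never read stays allocated, which is the situation described above.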

Sixth Approach - Use one subpartition only

With this approach, we create only one subpartition per task / slot. To make this possible, we have to redirect all execution edges accordingly.
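A minimal Python sketch of the redirection idea follows. It assumes that each execution edge can be modeled as a `(consumer, subpartition_index)` pair and that "redirecting" means pointing every edge at the same index. Both are simplifications for illustration, and the function names are hypothetical rather than part of Flink.

```python
def build_edges(num_consumers):
    """One edge per consumer, each pointing at its own subpartition index."""
    return [(consumer, consumer) for consumer in range(num_consumers)]


def redirect_to_single_subpartition(edges, target=0):
    """Point every consumer edge at the same subpartition index."""
    return [(consumer, target) for consumer, _ in edges]


edges = build_edges(4)
redirected = redirect_to_single_subpartition(edges)

# Every distinct subpartition index that is still referenced must be created.
needed_before = {sub for _, sub in edges}
needed_after = {sub for _, sub in redirected}
print(sorted(needed_before))  # [0, 1, 2, 3]
print(sorted(needed_after))   # [0]
```

Under these assumptions, only one subpartition has to exist after the redirection, and every consumer reads from it, so no subpartition is left unconsumed.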

Now we have to figure out when to release the blocking partition.

Compatibility, Deprecation, and Migration Plan

...
