...

For the Blink planner, (imperfect) heuristics are implemented to avoid slot request timeouts as long as the job has a chance to finish (see SlotPool#requestNewAllocatedBatchSlot()). The heuristic assumes that if at least one slot is present, the job will be able to finish. However, if the job requires more than one slot at the same time, the heuristic fails.

...

Note that the above algorithm works for streaming jobs because all tasks in a streaming job are connected with each other by pipelined data exchanges. If the streaming job employs a shuffle, all tasks land in the same pipelined region and the Pipelined Region Scheduler will trivially schedule all tasks at the same time. For streaming jobs that do not employ a shuffle, one may or may not have to apply special considerations (see Embarrassingly parallel Streaming Jobs).
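
The grouping described above can be illustrated with a minimal sketch, assuming a pipelined region is a connected component of tasks linked by pipelined data exchanges. The class PipelinedRegionSketch, the Edge type, and computeRegions() are hypothetical names used only for illustration and are not Flink APIs; the union-find approach is one simple way to compute such components, not necessarily what the scheduler does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PipelinedRegionSketch {

    /** Hypothetical edge: two task indices and whether the exchange is pipelined. */
    static final class Edge {
        final int source;
        final int target;
        final boolean pipelined;

        Edge(int source, int target, boolean pipelined) {
            this.source = source;
            this.target = target;
            this.pipelined = pipelined;
        }
    }

    /** Union-find root lookup with path halving. */
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    /** Groups tasks into regions: tasks joined by pipelined edges share a region. */
    static List<Set<Integer>> computeRegions(int numTasks, List<Edge> edges) {
        int[] parent = new int[numTasks];
        for (int i = 0; i < numTasks; i++) {
            parent[i] = i;
        }
        for (Edge e : edges) {
            if (e.pipelined) {
                // Blocking edges are skipped, so they separate regions.
                parent[find(parent, e.source)] = find(parent, e.target);
            }
        }
        Map<Integer, Set<Integer>> byRoot = new HashMap<>();
        for (int t = 0; t < numTasks; t++) {
            byRoot.computeIfAbsent(find(parent, t), k -> new TreeSet<>()).add(t);
        }
        return new ArrayList<>(byRoot.values());
    }

    public static void main(String[] args) {
        // Streaming job with a shuffle: every source (0, 1) feeds every sink (2, 3).
        List<Edge> shuffle = Arrays.asList(
                new Edge(0, 2, true), new Edge(0, 3, true),
                new Edge(1, 2, true), new Edge(1, 3, true));
        // Embarrassingly parallel streaming job: forward edges only.
        List<Edge> forwardOnly = Arrays.asList(
                new Edge(0, 2, true), new Edge(1, 3, true));

        System.out.println("shuffle regions: " + computeRegions(4, shuffle));
        System.out.println("forward-only regions: " + computeRegions(4, forwardOnly));
    }
}
```

In this sketch the shuffle job collapses into a single region containing all four tasks, while the forward-only job yields two independent regions, which is why the embarrassingly parallel case may need separate treatment.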

Pipelined Region Scheduling Strategy

...

  • ALL_EDGES_BLOCKING : The most conservative setting. Should only be used with special consideration.
  • FORWARD_EDGES_PIPELINED : With this mode, each pipelined region needs exactly one slot to run. It can be used in resource-limited scenarios or when the job must be guaranteed to run successfully with only 1 slot.
  • POINTWISE_EDGES_PIPELINED : Pointwise distribution pattern includes FORWARD and RESCALE. With this mode, RESCALE edges can be pipelined, at the cost of larger regions that may need more slots at the same time. However, in most cases, the number of required slots is much smaller than the parallelism. 
  • ALL_EDGES_PIPELINED : This requires no fewer slots than the parallelism. It saves time on scheduling tasks and can be used for interactive queries (see FLINK-16543).


StreamGraph will be extended with a new field to host the GlobalDataExchangeMode. In the JobGraph generation stage, this mode will be used to determine the data exchange type of each job edge.
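
As a rough illustration of how the mode could drive the per-edge decision, the sketch below maps each GlobalDataExchangeMode value listed above to a pipelined or blocking exchange. Only the four mode names come from this document. The EdgePattern and ExchangeType enums, the ALL_TO_ALL constant, and the exchangeTypeFor() method are assumptions made for illustration and do not reflect the actual JobGraph generation code.

```java
public class ExchangeModeSketch {

    /** The four modes listed in this document. */
    enum GlobalDataExchangeMode {
        ALL_EDGES_BLOCKING,
        FORWARD_EDGES_PIPELINED,
        POINTWISE_EDGES_PIPELINED,
        ALL_EDGES_PIPELINED
    }

    /** Hypothetical edge patterns; ALL_TO_ALL stands for any non-pointwise edge. */
    enum EdgePattern { FORWARD, RESCALE, ALL_TO_ALL }

    /** Hypothetical result type for the per-edge decision. */
    enum ExchangeType { PIPELINED, BLOCKING }

    /** Decides the exchange type of a single job edge under the given mode. */
    static ExchangeType exchangeTypeFor(GlobalDataExchangeMode mode, EdgePattern pattern) {
        switch (mode) {
            case ALL_EDGES_BLOCKING:
                return ExchangeType.BLOCKING;
            case FORWARD_EDGES_PIPELINED:
                // Only FORWARD edges stay pipelined.
                return pattern == EdgePattern.FORWARD
                        ? ExchangeType.PIPELINED
                        : ExchangeType.BLOCKING;
            case POINTWISE_EDGES_PIPELINED:
                // Pointwise covers FORWARD and RESCALE.
                return (pattern == EdgePattern.FORWARD || pattern == EdgePattern.RESCALE)
                        ? ExchangeType.PIPELINED
                        : ExchangeType.BLOCKING;
            case ALL_EDGES_PIPELINED:
                return ExchangeType.PIPELINED;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        // Print the decision for every mode/pattern combination.
        for (GlobalDataExchangeMode mode : GlobalDataExchangeMode.values()) {
            for (EdgePattern pattern : EdgePattern.values()) {
                System.out.println(mode + " / " + pattern + " -> "
                        + exchangeTypeFor(mode, pattern));
            }
        }
    }
}
```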

...

  • Option 1: SlotPool releases unused slots to RM and waits for the pending requests in RM to be fulfilled. Slot requests related to the released slots should also be re-sent to RM.
  • Option 2: Force FIFO slot allocation in SlotManager. We can do this after the SlotManager is pluggable (FLINK-14106).

Resource deadlocks when slots of different sizes are improperly assigned to slot requests

...