Manage input and output of ExecutionVertex

Manage InputSplit

Prerequisite for enabling speculative execution in a batch job: once most of the ExecutionVertices in an ExecutionJobVertex have finished, the InputSplits to be processed by each remaining ExecutionVertex are determined.

We must then ensure that the InputSplits processed by a speculative execution are the same as those processed by the original execution. Only then can a region that contains a source operator support speculative execution.

TODO: add a diagram here.

Based on FLINK-10205, we could ensure ....

Simply returning the InputSplits to the InputSplitAssigner also implies a transaction between the task and the JobManager (possibly several), so we need to make sure the InputSplits are returned to the InputSplitAssigner exactly once. What happens with speculative execution, where two tasks consume the same set of InputSplits but do not fail at the same time? Does every InputSplitAssigner need to keep a list in order to deduplicate? What happens if the TM dies or has a network issue and the InputSplits cannot be returned?
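One way to picture the deduplication question is a pool that remembers which splits are currently handed out and accepts a return only for those. The sketch below is hypothetical: `DedupingSplitPool`, `next` and `giveBack` are illustrative names, not Flink's `InputSplitAssigner` interface, and splits are reduced to integer ids. It does not address the case where a return is lost because the TM died or the network failed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: a split pool that accepts each returned split at most once. */
class DedupingSplitPool {
    // Split ids that can still be handed out.
    private final Deque<Integer> available = new ArrayDeque<>();
    // Split ids that were handed out and not yet returned.
    private final Set<Integer> outstanding = new HashSet<>();

    DedupingSplitPool(List<Integer> splitIds) {
        available.addAll(splitIds);
    }

    /** Hands out the next split id, or null when none is left. */
    synchronized Integer next() {
        Integer id = available.poll();
        if (id != null) {
            outstanding.add(id);
        }
        return id;
    }

    /** Puts a split back; returns false for a duplicate or unknown return. */
    synchronized boolean giveBack(Integer id) {
        if (!outstanding.remove(id)) {
            return false; // already returned by another execution, or never handed out
        }
        available.addFirst(id);
        return true;
    }
}
```

With this shape, if the original and the speculative execution both try to return the same split, only the first call has an effect, so the split is not processed twice after a failover.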

Furthermore, if two executions run at the same time (in the batch scenario), both executions should process the same splits.
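A minimal sketch of how two concurrent executions of one vertex could see the same splits: the vertex keeps an ordered log of every split it has been assigned, and each execution attempt reads that log through its own cursor. All names here (`VertexSplitLog`, `nextSplit`, the attempt number) are assumptions made for illustration, splits are reduced to strings, and a plain `Iterator` stands in for the InputSplitAssigner. This is not presented as the code in the PRs referenced in this section.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: per-vertex split log replayed by every execution attempt. */
class VertexSplitLog {
    // Stands in for the assigner that hands out fresh splits.
    private final Iterator<String> assigner;
    // Splits assigned to this vertex so far, in assignment order.
    private final List<String> assigned = new ArrayList<>();
    // Attempt number -> index of the next split that attempt will read.
    private final Map<Integer, Integer> cursors = new HashMap<>();

    VertexSplitLog(Iterator<String> assigner) {
        this.assigner = assigner;
    }

    /** Returns the next split for the given attempt, or null when none remain. */
    synchronized String nextSplit(int attempt) {
        int pos = cursors.getOrDefault(attempt, 0);
        if (pos == assigned.size()) {
            // This attempt has caught up with the log: fetch a fresh split.
            if (!assigner.hasNext()) {
                return null;
            }
            assigned.add(assigner.next());
        }
        cursors.put(attempt, pos + 1);
        return assigned.get(pos);
    }
}
```

An attempt that starts late first replays the splits already in the log and only then pulls new ones from the assigner, so the original and the speculative execution observe the same sequence regardless of which one is ahead.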

The discussions in the following two PRs should be covered here:

https://github.com/apache/flink/pull/6684

The logic of the code is that when a task fails over, the splits are not strictly consumed by the same ExecutionVertex.

So, I will change.. https://github.com/apache/flink/pull/8125


Manage intermediate ResultPartition

...