...

I think that speculative execution means that multiple executions of an ExecutionVertex run at the same time, whereas failover means that two executions run at two different times.

Multiple executions of an upstream ExecutionVertex will produce multiple ResultPartitions. When all upstream ExecutionVertexes have finished, the input channels of the downstream executions will be updated to consume from the fastest finished upstream execution. Therefore, add a member variable named fastestFinishedExecution to ExecutionVertex. It is used to create the PartitionInfo that updates the input channels of the downstream executions.

Code Block (java): DefaultExecutionGraph
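public class ExecutionVertex
        implements AccessExecutionVertex, Archiveable<ArchivedExecutionVertex> {
    // The execution of this vertex that finished first. It is used to create the
    // PartitionInfo that updates the input channels of downstream executions.
    private Execution fastestFinishedExecution = null;
}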
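
To make the intended "first finisher wins" behaviour concrete, here is a minimal sketch. It is not Flink code: AttemptSketch, VertexSketch and markFinished are hypothetical names invented for illustration, and the sketch assumes all calls come from a single thread, so no synchronization is shown.

```java
/** Hypothetical stand-in for one execution attempt of a vertex; not Flink's Execution. */
final class AttemptSketch {
    final int attemptNumber;

    AttemptSketch(int attemptNumber) {
        this.attemptNumber = attemptNumber;
    }
}

/** Hypothetical stand-in for ExecutionVertex that remembers the first attempt to finish. */
final class VertexSketch {
    // Mirrors the proposed member variable: null until some attempt finishes.
    private AttemptSketch fastestFinishedExecution = null;

    /** Records the attempt only if no other attempt finished earlier; returns true if it won. */
    boolean markFinished(AttemptSketch attempt) {
        if (fastestFinishedExecution != null) {
            return false; // a faster attempt already finished, ignore this one
        }
        fastestFinishedExecution = attempt;
        return true;
    }

    /** The attempt whose ResultPartition downstream input channels should consume, or null. */
    AttemptSketch getFastestFinishedExecution() {
        return fastestFinishedExecution;
    }
}
```

In this sketch, later finishers leave the recorded attempt unchanged, so the partition chosen for the downstream input channels stays stable once it has been set.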

When a downstream execution meets a DataConsumptionException, it will be restarted together with the upstream execution that has already finished.

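A figure is needed here. It should include the failover case, especially the case where a downstream task restarts together with its upstream task.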

If a task reads from a blocking result partition and its input is not available, we can ‘revoke’ the producer task, that is, mark it as failed and rerun it to regenerate the data.

In certain scenarios, producer data is transferred in blocking mode or saved in a persistent store. If the partition is missing, we need to revoke and rerun the producer task to regenerate the data.
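
The revoke-and-rerun idea can be sketched as follows. This is only an illustration under assumptions: RevokeProducerSketch, Task, State and onPartitionUnavailable are hypothetical names, not Flink APIs, and the sketch covers a single producer per consumer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of revoking a finished producer; none of these names are Flink APIs. */
final class RevokeProducerSketch {
    enum State { CREATED, RUNNING, FINISHED, FAILED }

    static final class Task {
        final String name;
        final Task producer; // upstream task whose blocking partition this task reads; null for sources
        State state;

        Task(String name, Task producer, State state) {
            this.name = name;
            this.producer = producer;
            this.state = state;
        }
    }

    /**
     * Called when the consumer cannot read its input partition.
     * Marks the consumer as failed, revokes a finished producer, and
     * returns the tasks to rerun with the producer first.
     */
    static List<Task> onPartitionUnavailable(Task consumer) {
        List<Task> toRerun = new ArrayList<>();
        consumer.state = State.FAILED;
        Task producer = consumer.producer;
        if (producer != null && producer.state == State.FINISHED) {
            // Revoke: the producer finished earlier, but its output is gone,
            // so it is treated as failed and must regenerate the data.
            producer.state = State.FAILED;
            toRerun.add(producer);
        }
        toRerun.add(consumer);
        return toRerun;
    }
}
```

Returning the producer before the consumer reflects the ordering constraint of a blocking partition: the data must be regenerated before the consumer can read it again.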



Manage sink files

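For batch sink files, a globally unique suffix is appended to the name of each file. Then, when the whole job has finished, the master node renames or deletes the files depending on the case.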
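
A minimal sketch of this suffix-then-finalize scheme, using only java.nio.file, is shown below. The class and method names (SinkFileSketch, attemptFile, finalizeOnMaster) are hypothetical, and the rule for choosing the winning suffix is an assumption; the text above only says that files are renamed or deleted depending on the case.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Hypothetical sketch of managing sink files written by several attempts. */
final class SinkFileSketch {

    /** File an attempt writes to: the final file name plus a globally unique suffix. */
    static Path attemptFile(Path finalFile, String uniqueSuffix) {
        return finalFile.resolveSibling(finalFile.getFileName() + "." + uniqueSuffix);
    }

    /**
     * Run once on the master when the whole job has finished: the winning
     * attempt's file is renamed to the final name, all others are deleted.
     */
    static void finalizeOnMaster(Path finalFile, String winnerSuffix, List<String> allSuffixes)
            throws IOException {
        for (String suffix : allSuffixes) {
            Path candidate = attemptFile(finalFile, suffix);
            if (suffix.equals(winnerSuffix)) {
                Files.move(candidate, finalFile, StandardCopyOption.REPLACE_EXISTING);
            } else {
                Files.deleteIfExists(candidate); // output of a slower or failed attempt
            }
        }
    }
}
```

Because every attempt writes to its own suffixed file, concurrent attempts never overwrite each other, and the final file name only appears once the master has made its decision.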

...