Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Note that in Flink, it is TaskManagers, not tasks, that exchange data over the network, i.e., data exchange between tasks that live in the same TM is multiplexed over one network connection.

Image Added

ExecutionGraph: The execution graph is a data structure that contains the “ground truth” about the job computation. It consists of vertices (ExecutionVertex) that represent computation tasks, and intermediate results (IntermediateResultIntermediateResultPartition), that represent data produced by tasks. IntermediateResults are comprised by IntermediateResultPartitions that represent data to be consumed by a specific receiving task.


These are logical data structures that live in the JobManager. They have their runtime equivalent structures that are responsible for the actual data processing that live at the TaskManagers. The runtime equivalent of the IntermediateResult IntermediateResultPartition is called ResultPartition, and the the runtime equivalent of the IntermediateResultPartition is called ResultSubpartition.

ResultPartition (RP) represents a chunk of data that a BufferWriter writes to, i.e., a chunk of data produced by a single task. A RP is a collection of Result Subpartitions (RSs). This is to distinguish between data that is destined to different receivers, e.g., in the case of a partitioning shuffle for a reduce or a join.

...