Intro...

Intermediate results

Partitions

Subpartitions

Result availability notifications

When producing an intermediate result, the producing task is responsible to notify the JobManager about available data. Each task instance produces one of the result partitions in parallel. Depending on the pipelined vs. blocking characteristic of an intermediate result, the notifications happen at different times of the task life-cycle.

  • Pipelined resultsnotify JobManager as soon as the first Buffer instance is added to the partition. The JobManager deploys all tasks, which consume the partition.
    • Related code: ResultPartition sends the notifications and Execution#scheduleOrUpdateConsumers acts on it.
  • Blocking results: notification is piggy-backed to the final state transition. The JobManager deploys consuming tasks only after all tasks producing the result have been finished.
    • Related code: there is no extra notifications at the ResultPartition and Execution#markFinished checks whether all partitions have been finished.

Receiver-initiated sending

The consuming tasks are deployed with all information necessary to request the partitions, which they need to consume. Tasks can then request the respective result partitions either

  • locally from the same task manager they are running on (via ResultPartitionManager), or
  • remotely from a different task manager (via PartitionRequestClient).

In certain situations, it is also possible that a consuming task does not yet know where a result partition is located. In this case, requests are delayed until this is known and notifications happen via update message at runtime (see TaskManager#updateTask).

Buffers & Events

 

Queues, Data production, Data Consumption

 

Buffer Pools

 

Netty

 

Caching Intermediate Results

  • No labels