Current state: "Under Discussion"

Discussion thread: https://lists.apache.org/thread/4p3xcf0gg4py61hsnydvwpns07d1nog7

JIRA: FLINK-26762

...

As noted in FLINK-14396, the RecordWriter is, in most cases, available for network output as long as the LocalBufferPool has at least one available buffer. That check therefore only covers the scenario where processing a single record needs just one buffer. Under severe back pressure, if processing a single record requires multiple output buffers, the Task may still block in requestMemory, which prevents the checkpoint from completing quickly. For example:

  • A large record that might span multiple buffers
  • Flatmap-like operators that might emit multiple records for each processed record
  • A broadcast watermark that might request multiple buffers at a time

This FLIP proposes adding an overdraft buffer to reduce the probability that a Task blocks in requestMemory when processing a single record requires multiple output buffers.

Overdraft buffer mechanism: when LocalBufferPool#requestMemory is called and the LocalBufferPool does not have enough buffers, the LocalBufferPool lets the Task overdraw some MemorySegments and becomes unavailable. The LocalBufferPool becomes available again only after downstream tasks have consumed all the overdraft buffers and the LocalBufferPool has recycled them.
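
The mechanism described above can be illustrated with a minimal sketch. It is a simplified, hypothetical model, not Flink's LocalBufferPool: the class name OverdraftPoolSketch, its methods, the counter-based bookkeeping, and the fixed maxOverdraft limit are all assumptions made for illustration. It only shows the intended behavior: requests beyond the ordinary pool size are served from an overdraft allowance, and the pool reports itself as unavailable until every overdraft buffer has been recycled.

```java
/**
 * Hypothetical, simplified model of a buffer pool with an overdraft allowance.
 * This is NOT Flink's LocalBufferPool; all names and the maxOverdraft limit are assumptions.
 */
final class OverdraftPoolSketch {
    private final int poolSize;      // number of ordinary buffers in the pool
    private final int maxOverdraft;  // assumed upper bound on overdraft buffers
    private int ordinaryInUse;       // ordinary buffers currently handed out
    private int overdraftInUse;      // overdraft buffers currently handed out

    OverdraftPoolSketch(int poolSize, int maxOverdraft) {
        if (poolSize < 0 || maxOverdraft < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        this.poolSize = poolSize;
        this.maxOverdraft = maxOverdraft;
    }

    /**
     * Tries to hand out one buffer. Returns true if a buffer was granted (ordinary or
     * overdraft) and false if the caller would have to block in a real implementation.
     */
    boolean requestBuffer() {
        if (ordinaryInUse < poolSize) {
            ordinaryInUse++;
            return true;
        }
        if (overdraftInUse < maxOverdraft) {
            overdraftInUse++; // pool is exhausted, so overdraw
            return true;
        }
        return false;
    }

    /** Returns one buffer to the pool; overdraft buffers are paid back first. */
    void recycle() {
        if (overdraftInUse > 0) {
            overdraftInUse--;
        } else if (ordinaryInUse > 0) {
            ordinaryInUse--;
        } else {
            throw new IllegalStateException("no buffer is in use");
        }
    }

    /** Available only when no overdraft is outstanding and an ordinary buffer is free. */
    boolean isAvailable() {
        return overdraftInUse == 0 && ordinaryInUse < poolSize;
    }

    int overdraftInUse() {
        return overdraftInUse;
    }
}
```

With a pool of two ordinary buffers and an overdraft limit of two, a fourth request in this sketch still succeeds without blocking, but isAvailable() stays false until both overdraft buffers and at least one ordinary buffer have been recycled.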

Public Interfaces


Proposed Changes

...