Status

Current state: "Under Discussion"

Discussion thread

JIRA:

Released: <Flink Version>

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

A Task that blocks in requestMemory usually holds the checkpointLock. Under severe back pressure, the Task cannot obtain enough MemorySegments in a short time, so the Checkpoint takes a long time. In earlier Unaligned Checkpoint JIRAs, the community added recordWriter.isAvailable() to alleviate the problem of Tasks blocking in requestMemory due to insufficient memory segments while processing data. This check effectively reduces the probability of a Task being blocked in requestMemory.

As mentioned in FLINK-14396, as long as there is at least one available buffer in the LocalBufferPool, the RecordWriter is available for network output in most cases. The check therefore only covers the scenario where processing a single record needs a single buffer. Under severe back pressure, if processing a single record requires multiple output buffers, the Task may still block in requestMemory, and the Checkpoint cannot complete quickly. For example:

  • Big records that might span multiple buffers
  • Flatmap-like operators that might emit multiple records for each processed input
  • Broadcast watermarks that might request multiple buffers at a time

In this FLIP, we propose adding an overdraft buffer to reduce the probability of a Task being blocked in requestMemory when processing a single record requires multiple output buffers.

Overdraft Buffer mechanism: When LocalBufferPool#requestMemory is called and the LocalBufferPool has no free MemorySegments left, the LocalBufferPool allows the Task to overdraw some MemorySegments, and the LocalBufferPool becomes unavailable. It becomes available again only after downstream tasks have consumed all the overdraft buffers and the LocalBufferPool has recycled them.
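The mechanism can be illustrated with a minimal, single-threaded sketch. This is not Flink's LocalBufferPool: the class OverdraftPoolSketch, its constructor parameters, and the fixed maxOverdraft limit are assumptions made for illustration only, and the sketch returns null at the point where a real pool would block the Task.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical, single-threaded model of a buffer pool with an overdraft
 * allowance. It is NOT Flink's LocalBufferPool; all names are illustrative.
 */
final class OverdraftPoolSketch {
    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int segmentSize;
    private final int maxOverdraft;   // assumed upper bound on overdrawn segments
    private int overdraftInUse = 0;

    OverdraftPoolSketch(int poolSize, int maxOverdraft, int segmentSize) {
        this.segmentSize = segmentSize;
        this.maxOverdraft = maxOverdraft;
        for (int i = 0; i < poolSize; i++) {
            free.push(new byte[segmentSize]);
        }
    }

    /** Available only when no overdraft is outstanding and a regular segment is free. */
    boolean isAvailable() {
        return overdraftInUse == 0 && !free.isEmpty();
    }

    /**
     * Returns a regular segment if one is free, otherwise an overdraft segment
     * while the allowance lasts. Returns null where a real pool would block.
     */
    byte[] requestMemory() {
        if (!free.isEmpty()) {
            return free.pop();
        }
        if (overdraftInUse < maxOverdraft) {
            overdraftInUse++;
            return new byte[segmentSize];
        }
        return null;
    }

    /** Recycled segments repay the overdraft first, then refill the pool. */
    void recycle(byte[] segment) {
        if (overdraftInUse > 0) {
            overdraftInUse--;   // the extra segment is released, not pooled
        } else {
            free.push(segment);
        }
    }
}
```

In this sketch, a Task that already started processing a record can keep calling requestMemory() after the regular segments run out, up to the assumed overdraft limit. Because isAvailable() stays false until the overdraft is fully repaid, the Task would not start processing the next record in the meantime.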

Public Interfaces


Proposed Changes


Compatibility, Deprecation, and Migration Plan


Test Plan

  • Test requesting an overdraft buffer when enough overdraft buffers are available
  • Test requesting an overdraft buffer when overdraft buffers are insufficient
  • Checkpoint duration benchmark with the overdraft buffer enabled

Rejected Alternatives

