...

  • The approach proposed by this FLIP applies only to DataStream API jobs and to SQL/Table API jobs planned by the Blink planner (both unbounded streaming and bounded batch jobs). It should not affect DataSet API jobs.
    • For DataSet jobs, there is already a fraction-based approach (in TaskConfig and ChainedDriver), and we do not make any change to it.
  • This FLIP assumes that for jobs with known operators' resource requirements, the requirements are already properly described by ResourceSpecs in PhysicalTransformations.
    • This FLIP does not discuss how to set operators' resource requirements for a job.
    • Current status (including plans for Flink 1.10) of how to set operators' resource requirements for jobs can be described as follows:
      • SQL/Table API - Blink optimizer can set operator resources for the users, according to their configurations (default: unknown)
      • DataStream API - There is no method or interface to set operator resources at the moment. One can be added in the future.
      • DataSet API - There are existing user interfaces to set operator resources.

...

  • PhysicalTransformations contain ResourceSpecs, either unknown (the default) or specified (e.g., by the Blink planner), that describe the resource requirements of the transformation.
  • While generating the job graph, StreamingJobGraphGenerator calculates fractions (of the slot managed memory) for operators and sets them in the StreamConfigs.
  • While scheduling, operators' ResourceSpecs are converted to tasks' ResourceProfiles (ResourceSpecs of chained operators + network memory). Tasks are deployed to slots / TMs according to the ResourceProfiles.
  • While starting tasks in TMs, each operator gets its fraction of the slot managed memory, which corresponds either to the originally requested absolute value or to a fair share if the requirement is unknown.
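
The fraction calculation described above can be illustrated with a minimal sketch. It is not the StreamingJobGraphGenerator code: the class `ManagedMemoryFractionSketch`, the `UNKNOWN` marker and both method names are hypothetical, and the sketch assumes that the operators sharing a slot either all have known requirements or all have unknown ones.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical, not Flink API. */
final class ManagedMemoryFractionSketch {

    /** Marker for an operator whose managed memory requirement is unknown. */
    static final long UNKNOWN = -1L;

    /**
     * Computes one fraction per operator for the operators sharing a slot.
     * All known: fraction = own requirement / sum of all requirements.
     * All unknown: every operator gets an equal (fair) share.
     * A mix of known and unknown is rejected (an assumption of this sketch).
     */
    static List<Double> fractions(List<Long> managedMemoryBytes) {
        long unknownCount = managedMemoryBytes.stream().filter(b -> b == UNKNOWN).count();
        if (unknownCount != 0 && unknownCount != managedMemoryBytes.size()) {
            throw new IllegalArgumentException("mixed known and unknown requirements");
        }
        List<Double> result = new ArrayList<>();
        if (unknownCount > 0) {
            for (int i = 0; i < managedMemoryBytes.size(); i++) {
                result.add(1.0 / managedMemoryBytes.size());
            }
            return result;
        }
        long total = 0;
        for (long bytes : managedMemoryBytes) {
            total += bytes;
        }
        for (long bytes : managedMemoryBytes) {
            result.add(total == 0 ? 0.0 : (double) bytes / total);
        }
        return result;
    }

    /** Turns a fraction back into bytes once the slot's managed memory is known. */
    static long resolve(double fraction, long slotManagedMemoryBytes) {
        return (long) (fraction * slotManagedMemoryBytes);
    }
}
```

In this sketch, when the slot's managed memory equals the sum of the known requirements, `resolve` gives each operator back the absolute value it originally requested; with unknown requirements, each operator receives an equal share of whatever the slot provides.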

...

The StreamingJobGraphGenerator puts tasks of different pipelined regions into different slot sharing groups. As a result, when the StreamingJobGraphGenerator sets the relative managed memory quota for operators, it calculates the fractions considering only operators that might run at the same time. This improves resource utilization for bounded batch jobs, where usually not all tasks run concurrently.
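
To make the effect concrete, the following sketch computes fair shares per pipelined region rather than per job. It is only an illustration under simplifying assumptions: every operator has an unknown requirement, the region of each operator is already known, and each region maps to exactly one slot sharing group. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; all names here are hypothetical, not Flink API. */
final class RegionSlotSharingSketch {

    /**
     * Maps each operator to a fair-share fraction of the slot managed memory,
     * counting only the operators in the same pipelined region, i.e. the same
     * slot sharing group in this sketch.
     */
    static Map<String, Double> fairShares(Map<String, Integer> regionOfOperator) {
        // Count how many operators each region (slot sharing group) contains.
        Map<Integer, Integer> groupSizes = new LinkedHashMap<>();
        for (int region : regionOfOperator.values()) {
            groupSizes.merge(region, 1, Integer::sum);
        }
        // Each operator shares the slot only with operators of its own region.
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : regionOfOperator.entrySet()) {
            shares.put(entry.getKey(), 1.0 / groupSizes.get(entry.getValue()));
        }
        return shares;
    }
}
```

For a batch job with two operators in each of two regions, every operator gets a fraction of 0.5 in this sketch, whereas a single slot sharing group containing all four operators would give each only 0.25.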

...

  • Introduce option allSourcesInSamePipelinedRegion in ExecutionConfig
  • Set it to true by default
  • Set it to false for SQL/Table API bounded batch jobs by the Blink planner

This step should not introduce any behavior changes. 

...