...

  1. For session clusters running short-lived jobs such as OLAP queries, the session cluster should be treated as a long-running service. Keeping workers running at all times can greatly improve service stability and reduce the jobs' cold-start latency;

  2. For application mode, a batch job may be scheduled stage by stage. If the next region requires more resources, acquiring them may take extra time. If users know in advance how many resources a single job needs, initializing all workers when the cluster starts can speed up resource allocation;

  3. Flink supports FLINK-12122 [Spread out tasks evenly across all available registered TaskManagers], but this requires enough TaskManagers to be registered. For session clusters, starting all TaskManagers at the beginning fully solves this problem; for application mode, the minimum required TaskManagers are allocated on a best-effort basis, which may improve the even distribution of tasks but does not guarantee it;
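Why even spreading depends on how many TaskManagers are registered can be illustrated with a small sketch. It greedily places each task on the registered TaskManager with the lowest slot utilization, an approach loosely modeled on the idea behind FLINK-12122. The classes TaskManagerSlots and SpreadOutSketch are hypothetical and are not Flink's slot-matching code; for simplicity the sketch fails when no free slot exists instead of requesting a new TaskManager.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a registered TaskManager and its slot usage (not a Flink class).
final class TaskManagerSlots {
    final String id;
    final int totalSlots;
    int usedSlots;

    TaskManagerSlots(String id, int totalSlots) {
        this.id = id;
        this.totalSlots = totalSlots;
    }

    double utilization() {
        return (double) usedSlots / totalSlots;
    }

    boolean hasFreeSlot() {
        return usedSlots < totalSlots;
    }
}

final class SpreadOutSketch {
    // Places each task on the least-utilized registered TaskManager that still has a
    // free slot (ties go to the earliest in the list) and returns the chosen ids in order.
    // Assumption: only already-registered TaskManagers are considered; no new ones are requested.
    static List<String> place(List<TaskManagerSlots> registered, int numTasks) {
        List<String> placements = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            TaskManagerSlots target = registered.stream()
                    .filter(TaskManagerSlots::hasFreeSlot)
                    .min(Comparator.comparingDouble(TaskManagerSlots::utilization))
                    .orElseThrow(() -> new IllegalStateException("no free slot"));
            target.usedSlots++;
            placements.add(target.id);
        }
        return placements;
    }
}
```

With a single 4-slot TaskManager registered, all four tasks land on it even though the strategy prefers spreading. With four 4-slot TaskManagers registered up front, each one receives one task. Starting the minimum number of TaskManagers early is meant to close exactly this gap.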


Public Interfaces

Option name                               Default value
slotmanager.number-of-slots.min           0
slotmanager.min-total-resource.cpu        No default value; it can be derived from slotmanager.number-of-slots.min
slotmanager.min-total-resource.memory     No default value; it can be derived from slotmanager.number-of-slots.min
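The table says the minimum CPU and memory "can be derived" from slotmanager.number-of-slots.min but does not give the rule. The sketch below assumes one plausible rule: every slot is an equal share of a default TaskManager (worker), so each minimum total equals the per-slot share times the minimum slot count. The WorkerSpec and MinResourceDerivation classes, the memory unit (MB), and the derivation itself are all assumptions for illustration, not Flink's implementation.

```java
// Hypothetical default TaskManager (worker) resource spec; not a Flink class.
final class WorkerSpec {
    final double cpuCores;
    final long memoryMb;
    final int slotsPerWorker;

    WorkerSpec(double cpuCores, long memoryMb, int slotsPerWorker) {
        this.cpuCores = cpuCores;
        this.memoryMb = memoryMb;
        this.slotsPerWorker = slotsPerWorker;
    }
}

final class MinResourceDerivation {
    // Assumed rule: min total CPU = (worker CPU / slots per worker) * min slots.
    static double minTotalCpu(WorkerSpec w, int minSlots) {
        return w.cpuCores * minSlots / w.slotsPerWorker;
    }

    // Same assumed rule for memory; integer division truncates any fractional MB.
    static long minTotalMemoryMb(WorkerSpec w, int minSlots) {
        return w.memoryMb * minSlots / w.slotsPerWorker;
    }

    // Whole workers needed to host minSlots slots (ceiling division).
    static int minWorkers(WorkerSpec w, int minSlots) {
        return (minSlots + w.slotsPerWorker - 1) / w.slotsPerWorker;
    }
}
```

For example, with a hypothetical worker of 2 CPU cores, 4096 MB and 4 slots, a minimum of 10 slots yields 5.0 cores and 10240 MB under this rule, hosted on at least 3 whole workers. The rule is linear in the slot count, so the default of 0 slots implies no minimum resources.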

...