...
For session clusters running short-lived jobs such as OLAP queries, the session cluster should be treated as a long-running service. Keeping workers running at all times can improve service stability and reduce job cold-start latency;
For application mode, a batch job might be scheduled stage by stage. If the next region requires more resources, it may take additional time to bring up the required resources. If users know in advance how many resources a single job needs, initializing all workers when the cluster starts can speed up the resource allocation process;
Flink supports FLINK-12122 [Spread out tasks evenly across all available registered TaskManagers], but this requires enough registered TaskManagers. For session clusters, starting all TaskManagers at the beginning solves this problem; for application mode, the minimum required TaskManagers are allocated on a best-effort basis, which may help distribute tasks more evenly but is not guaranteed to solve the problem.
Public Interfaces
Option name | Default Value |
---|---|
slotmanager.number-of-slots.min | 0 |
slotmanager.min-total-resource.cpu | No default value; it can be derived from slotmanager.number-of-slots.min |
slotmanager.min-total-resource.memory | No default value; it can be derived from slotmanager.number-of-slots.min |
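
The table says the CPU and memory minimums can be derived from slotmanager.number-of-slots.min but does not give the rule. The sketch below shows one plausible reading, assuming every slot has the same fixed resource profile and that the minimum is simply the per-slot value multiplied by the slot count. The function name, its parameters, and the per-slot figures are hypothetical and used only for illustration; they are not part of the proposal or of Flink's API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MinResources:
    """Hypothetical container for the derived minimums."""
    workers: int
    cpu_cores: float
    memory_mb: int


def derive_min_resources(min_slots, slots_per_worker, cpu_per_slot, memory_mb_per_slot):
    """Derive minimum totals from a minimum slot count.

    Assumption (not stated in the source): all slots share one resource
    profile, so the totals scale linearly with the number of slots.
    """
    if min_slots < 0:
        raise ValueError("min_slots must be non-negative")
    if slots_per_worker <= 0:
        raise ValueError("slots_per_worker must be positive")

    # Workers come in whole units, so round up to cover every required slot.
    workers = math.ceil(min_slots / slots_per_worker)
    return MinResources(
        workers=workers,
        cpu_cores=min_slots * cpu_per_slot,
        memory_mb=min_slots * memory_mb_per_slot,
    )


if __name__ == "__main__":
    # Example: 10 minimum slots, 4 slots per worker, 1 core and 1024 MB per slot.
    print(derive_min_resources(10, 4, 1.0, 1024))
```

With the default of 0 for slotmanager.number-of-slots.min, this reading yields zero workers and zero derived resources, which matches the idea that no minimum is enforced unless one is configured.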
...