...
- Support lazy initialization of parallelism in OperatorCoordinator and related components.
- Introduce the DynamicParallelismInference and DynamicFilteringInfo interfaces. In SourceCoordinator, prepare the parameters for and invoke the methods of the DynamicParallelismInference interface, and expose SourceCoordinator in ExecutionJobVertex.
- Improve the logic of AdaptiveBatchScheduler for dynamic source parallelism inference.
- Add dynamic parallelism inference support to the Hive/File sources, and change the default value of `table.exec.hive.infer-source-parallelism` to false in batch scenarios.
Compatibility, Deprecation, and Migration Plan
...
For batch jobs that rely on the adaptive batch scheduler to infer the parallelism of sources, the `execution.batch.adaptive.auto-parallelism.default-source-parallelism` option serves as an upper limit for the inferred parallelism rather than as the final parallelism. Additionally, if `execution.batch.adaptive.auto-parallelism.default-source-parallelism` is not set, the global default parallelism is used as the upper limit for the inferred parallelism.
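The upper-limit rule above can be illustrated with a minimal, standalone sketch. The class `SourceParallelismUpperBound` and its methods `resolveUpperBound` and `capInferredParallelism` are hypothetical names introduced only for this illustration; they are not part of Flink, and the scheduler's actual implementation may differ.

```java
import java.util.OptionalInt;

// Illustrative sketch only: none of these names exist in Flink.
public class SourceParallelismUpperBound {

    // Resolve the upper limit: use default-source-parallelism when it is set,
    // otherwise fall back to the global default parallelism.
    static int resolveUpperBound(OptionalInt defaultSourceParallelism,
                                 int globalDefaultParallelism) {
        return defaultSourceParallelism.orElse(globalDefaultParallelism);
    }

    // The inferred value is capped by the upper limit; it is not replaced by it.
    static int capInferredParallelism(int inferredParallelism, int upperBound) {
        return Math.min(inferredParallelism, upperBound);
    }

    public static void main(String[] args) {
        // Option set to 8, source suggests 20: the result is capped at 8.
        int boundWhenSet = resolveUpperBound(OptionalInt.of(8), 4);
        System.out.println(capInferredParallelism(20, boundWhenSet)); // prints 8

        // Option not set: the global default parallelism (4) is the limit.
        int boundWhenUnset = resolveUpperBound(OptionalInt.empty(), 4);
        System.out.println(capInferredParallelism(20, boundWhenUnset)); // prints 4

        // Source suggests less than the limit: the inferred value is kept.
        System.out.println(capInferredParallelism(3, boundWhenSet)); // prints 3
    }
}
```

The last case shows the practical difference from the previous behavior: the option no longer fixes the source parallelism, it only bounds it.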
For HiveSource, we may hold a dedicated discussion in the future on whether to change the default value of `table.exec.hive.infer-source-parallelism` to false. Until then, users can manually set `table.exec.hive.infer-source-parallelism` to false to enable dynamic parallelism inference, and can use `execution.batch.adaptive.auto-parallelism.default-source-parallelism` in place of `table.exec.hive.infer-source-parallelism.max` as the upper bound for parallelism inference.
Limitations
Dynamic source parallelism inference only works for batch jobs that use the AdaptiveBatchScheduler.
...