...

  1. Support lazy initialization of parallelism in OperatorCoordinator and related components.
  2. Introduce the DynamicParallelismInference and DynamicFilteringInfo interfaces. In SourceCoordinator, prepare the parameters for and invoke the methods of the DynamicParallelismInference interface, and expose SourceCoordinator in ExecutionJobVertex.
  3. Improve the logic of AdaptiveBatchScheduler for dynamic source parallelism inference.
  4. Add dynamic parallelism inference support to the Hive/File sources, and change the default value of `table.exec.hive.infer-source-parallelism` to false in batch scenarios.

Compatibility, Deprecation, and Migration Plan

...

For batch jobs that rely on the adaptive batch scheduler to infer the parallelism of sources, `execution.batch.adaptive.auto-parallelism.default-source-parallelism` now serves as an upper limit for the inferred parallelism rather than as the final parallelism. If `execution.batch.adaptive.auto-parallelism.default-source-parallelism` is not set, the global default parallelism is used as the upper limit instead.
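The fallback rule above can be illustrated with a minimal, self-contained sketch. It is not the scheduler's actual code: the class `SourceParallelismBound` and both method names are hypothetical, and the lower clamp to 1 is an assumption made only so the example never returns a non-positive parallelism.

```java
import java.util.OptionalInt;

// Hypothetical illustration only; none of these names exist in Flink.
public final class SourceParallelismBound {

    // Picks the upper limit for inferred source parallelism:
    // the configured default-source-parallelism if set, otherwise
    // the global default parallelism.
    public static int resolveUpperBound(
            OptionalInt defaultSourceParallelism, int globalDefaultParallelism) {
        return defaultSourceParallelism.orElse(globalDefaultParallelism);
    }

    // Caps the parallelism a source inferred at the upper limit.
    // Assumption: a result below 1 is raised to 1.
    public static int capInferredParallelism(int inferred, int upperBound) {
        return Math.max(1, Math.min(inferred, upperBound));
    }

    public static void main(String[] args) {
        // Option set to 8: an inferred value of 20 is capped at 8.
        int configured = resolveUpperBound(OptionalInt.of(8), 4);
        System.out.println(capInferredParallelism(20, configured)); // prints 8

        // Option not set: the global default (4) becomes the limit.
        int fallback = resolveUpperBound(OptionalInt.empty(), 4);
        System.out.println(capInferredParallelism(20, fallback)); // prints 4

        // An inferred value below the limit is left unchanged.
        System.out.println(capInferredParallelism(3, configured)); // prints 3
    }
}
```

The point of the sketch is that the option no longer fixes the source parallelism; it only bounds whatever value the source infers.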

For HiveSource, we may hold a dedicated discussion in the future on whether to change the default value of `table.exec.hive.infer-source-parallelism` to false. Until then, users can manually set `table.exec.hive.infer-source-parallelism` to false to enable dynamic parallelism inference, and can use `execution.batch.adaptive.auto-parallelism.default-source-parallelism` in place of `table.exec.hive.infer-source-parallelism.max` as the upper bound for parallelism inference.
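As a rough sketch of the manual opt-in for HiveSource, the two settings involved can be collected as plain key/value pairs. The class `HiveDynamicInferenceSettings` and the bound of 64 are hypothetical and chosen only for illustration; the option keys are the ones named in this section. How the pairs are handed to the job (configuration file, table config, and so on) is left open here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; only the option keys come from this proposal.
public final class HiveDynamicInferenceSettings {

    // Builds the settings that opt a Hive batch job into dynamic
    // source parallelism inference with the given upper bound.
    public static Map<String, String> build(int upperBound) {
        Map<String, String> settings = new LinkedHashMap<>();
        // Disable static inference so that dynamic inference is used.
        settings.put("table.exec.hive.infer-source-parallelism", "false");
        // Takes over the role of table.exec.hive.infer-source-parallelism.max,
        // which is intentionally not set here.
        settings.put(
                "execution.batch.adaptive.auto-parallelism.default-source-parallelism",
                String.valueOf(upperBound));
        return settings;
    }

    public static void main(String[] args) {
        // 64 is an arbitrary example value for the upper bound.
        build(64).forEach((key, value) -> System.out.println(key + ": " + value));
    }
}
```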

Limitations

Dynamic source parallelism inference only works for batch jobs that use the AdaptiveBatchScheduler.

...