...

  • “external-resource.gpu.amount”: Defines how many GPUs a task executor has. The default value should be 0.
  • “external-resource.gpu.param.discovery-script.path”: Defines the path of the discovery script. See the Discovery Script section.
  • “external-resource.gpu.param.discovery-script.args”: Defines the arguments passed to the discovery script. See the Discovery Script section.
  • “external-resource.gpu.param.vendor”: Defines the vendor of the GPU resource. In Kubernetes, the configuration key of the GPU resource is “<vendor>.com/gpu”[3]. Only “nvidia” and “amd” are accepted at the moment. Only valid in Kubernetes mode.
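The vendor rule above can be illustrated with a minimal sketch. The class and method names (GpuResourceKeys, kubernetesResourceName) are hypothetical and not part of Flink; only the “<vendor>.com/gpu” pattern and the two accepted vendors come from the option description.

```java
import java.util.Set;

/** Hypothetical helper, not a Flink class: maps a configured vendor to a Kubernetes resource name. */
final class GpuResourceKeys {
    // The only vendors the option description accepts at the moment.
    private static final Set<String> SUPPORTED_VENDORS = Set.of("nvidia", "amd");

    private GpuResourceKeys() {}

    /** Returns "<vendor>.com/gpu", or throws if the vendor is missing or unsupported. */
    static String kubernetesResourceName(String vendor) {
        // Check for null first: Set.of(...).contains(null) would throw a NullPointerException.
        if (vendor == null || !SUPPORTED_VENDORS.contains(vendor)) {
            throw new IllegalArgumentException("Unsupported GPU vendor: " + vendor);
        }
        return vendor + ".com/gpu";
    }
}
```

Rejecting unknown vendors up front keeps a typo in the configuration from turning into a resource request that Kubernetes can never satisfy.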

RestAPI / WebUI (Need to get the information of GPU resource through the RestAPI and WebUI)

Introduce the ExternalResourceInfo class, which contains the information of the external resources. Operators and functions could get that information from the RuntimeContext.

For GPU resources, we introduce the GPUInformation class, which contains the index of a GPU card.
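One possible shape for these two classes is sketched below. The text only states that ExternalResourceInfo holds information about an external resource and that GPUInformation holds a GPU index; the key/value accessors and the "index" property name are assumptions made for illustration.

```java
import java.util.Collection;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

/** Illustrative stand-in for the proposed ExternalResourceInfo; method names are assumptions. */
interface ExternalResourceInfo {
    /** Returns the value of the given property, if the resource exposes it. */
    Optional<String> getProperty(String key);

    /** Returns all property keys this resource exposes. */
    Collection<String> getKeys();
}

/** Illustrative GPUInformation that carries only the index of a GPU card. */
final class GPUInformation implements ExternalResourceInfo {
    static final String INDEX_KEY = "index"; // hypothetical property name

    private final String index;

    GPUInformation(String index) {
        this.index = Objects.requireNonNull(index, "index");
    }

    @Override
    public Optional<String> getProperty(String key) {
        return INDEX_KEY.equals(key) ? Optional.of(index) : Optional.empty();
    }

    @Override
    public Collection<String> getKeys() {
        return List.of(INDEX_KEY);
    }
}
```

A generic key/value view lets other resource types expose their own properties without changing the interface.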


Proposed Changes

  • We introduce the ExternalResourceDriver for external resource allocation and management.
  • The user sets the “taskmanager.resource.gpu.amount” and specifies the “external-resource.gpu.param.discovery-script.[path|args]” if needed.
  • For Yarn/Kubernetes mode, Flink maps the “taskmanager.resource.gpu.amount” to the corresponding field of resource requests to the external resource manager.
  • Introduce a GPUManager, which will execute the discovery script and get the available GPU resources from the output.
  • Operators and functions get the GPU resource information from the GPUManager.
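The discovery step in the list above could look roughly like the following sketch. The text does not specify the script's interface, so two things here are assumptions: the script is invoked as "<path> <amount> <args...>", and it prints the available GPU indices as one comma-separated line such as "0,1,3". The class name GpuDiscovery is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the discovery step; not Flink's GPUManager. */
final class GpuDiscovery {
    private GpuDiscovery() {}

    /** Parses output such as "0,1,3" into GPU indices (assumed output format). */
    static List<String> parseIndices(String output) {
        List<String> indices = new ArrayList<>();
        if (output == null) {
            return indices;
        }
        for (String token : output.trim().split(",")) {
            String index = token.trim();
            if (!index.isEmpty()) {
                indices.add(index);
            }
        }
        return indices;
    }

    /** Runs the script (assumed calling convention) and parses the first line of its stdout. */
    static List<String> discover(String scriptPath, long amount, List<String> args)
            throws IOException, InterruptedException {
        List<String> command = new ArrayList<>();
        command.add(scriptPath);
        command.add(Long.toString(amount));
        command.addAll(args);
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // keep stderr out of the parsed output
                .start();
        String firstLine;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            firstLine = reader.readLine();
            while (reader.readLine() != null) {
                // Drain the remaining output so the script is not blocked on a full pipe.
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Discovery script exited with code " + exitCode);
        }
        return parseIndices(firstLine);
    }
}
```

Keeping the parsing separate from process handling means the output format can be checked without running a script.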

...

On the TaskExecutor side, the ExternalResourceDriver is responsible for detecting external resources and providing information about them. The TaskExecutor does not need to manage a specific external resource by itself; operators and functions would get the ExternalResourceInfo from the RuntimeContext.
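The division of responsibility described here can be sketched as follows. This is an illustration under stated assumptions, not Flink's API: the driver method signature is a guess, ResourceInfoRegistry is a hypothetical stand-in for the lookup that the RuntimeContext would offer, and the resource information is reduced to a plain string (for example a GPU index) to keep the example short.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Illustrative driver contract; the method name and signature are assumptions. */
interface ExternalResourceDriver {
    /** Detects up to {@code amount} units of the resource and describes each one. */
    Set<String> retrieveResourceInfo(long amount);
}

/** Hypothetical stand-in for the context lookup that operators and functions would use. */
final class ResourceInfoRegistry {
    private final Map<String, Set<String>> infosByResource = new HashMap<>();

    /** Called once per configured resource; detection is delegated to the driver. */
    void register(String resourceName, long amount, ExternalResourceDriver driver) {
        infosByResource.put(resourceName, driver.retrieveResourceInfo(amount));
    }

    /** Returns the detected information, or an empty set for an unknown resource. */
    Set<String> getExternalResourceInfos(String resourceName) {
        return infosByResource.getOrDefault(resourceName, Collections.emptySet());
    }
}
```

In this arrangement the registry only stores what each driver reports, so supporting a new resource type means adding a driver rather than changing the executor.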

...