“taskmanager.resource.gpu.amount”: Defines the number of GPUs in a task executor. The default value is 0.
“taskmanager.resource.gpu.discovery-script.path”: Defines the path of the discovery script. See the Discovery Script section.
“taskmanager.resource.gpu.discovery-script.args”: Defines the arguments passed to the discovery script. See the Discovery Script section.
“kubernetes.taskmanager.resource.gpu.vendor”: Defines the vendor of the GPU resource. In Kubernetes, the configuration key of a GPU resource is “<vendor>.com/gpu”[3]. Only “nvidia” and “amd” are accepted at the moment. This option is only valid in Kubernetes mode.
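
As an illustration of the vendor option, the following is a minimal sketch of how the configured vendor could be validated and turned into the Kubernetes resource key “<vendor>.com/gpu”. The class name GpuResourceKey and the method forVendor are hypothetical and exist only for this example; they are not part of Flink.

```java
import java.util.Set;

/** Hypothetical helper for illustration only; not an existing Flink class. */
final class GpuResourceKey {

    // Vendors the proposal accepts at the moment.
    private static final Set<String> SUPPORTED_VENDORS = Set.of("nvidia", "amd");

    private GpuResourceKey() {}

    /** Builds the Kubernetes resource key "<vendor>.com/gpu" for a supported vendor. */
    static String forVendor(String vendor) {
        if (vendor == null || !SUPPORTED_VENDORS.contains(vendor)) {
            throw new IllegalArgumentException(
                    "Unsupported GPU vendor: " + vendor + "; expected one of " + SUPPORTED_VENDORS);
        }
        return vendor + ".com/gpu";
    }
}
```

Rejecting unknown vendors early keeps a misconfigured value from reaching the resource request, where it would otherwise ask Kubernetes for a resource that no node advertises.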

REST API / Web UI (the GPU resource information needs to be available through the REST API and the Web UI)

Introduce the GPUInformation class, which contains the information of a GPU card. UDFs can get it from RuntimeContext and FunctionContext.

Proposed Changes

Overview

The user sets “taskmanager.resource.gpu.amount” and, if needed, specifies “taskmanager.resource.gpu.discovery-script.[path|args]”.
In Yarn/Kubernetes mode, Flink maps “taskmanager.resource.gpu.amount” to the corresponding field of the resource requests sent to the external resource manager.
A GPUManager is introduced, which executes the discovery script and obtains the available GPU resources from its output.
Operators get the GPU resource information from the GPUManager.
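
The discovery step above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: the class GpuManagerSketch and its methods are hypothetical, the script is assumed to receive the requested GPU amount as its first argument, and its output is assumed to be a comma-separated list of GPU indexes such as "0,1". The actual contract is defined in the Discovery Script section.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the discovery step; not the proposed GPUManager itself. */
final class GpuManagerSketch {

    private GpuManagerSketch() {}

    /** Parses the ASSUMED output format: comma-separated GPU indexes, e.g. "0,1". */
    static List<String> parseIndexes(String output, int amount) {
        List<String> indexes = new ArrayList<>();
        for (String token : output.trim().split(",")) {
            String index = token.trim();
            if (!index.isEmpty()) {
                indexes.add(index);
            }
        }
        if (indexes.size() < amount) {
            throw new IllegalStateException(
                    "Discovery script reported " + indexes.size() + " GPUs, but " + amount + " are required");
        }
        return indexes;
    }

    /** Runs the discovery script and returns the GPU indexes it reports. */
    static List<String> discover(String scriptPath, String scriptArgs, int amount)
            throws IOException, InterruptedException {
        List<String> command = new ArrayList<>();
        command.add(scriptPath);
        // ASSUMPTION: the requested amount is passed as the first argument.
        command.add(String.valueOf(amount));
        if (scriptArgs != null && !scriptArgs.trim().isEmpty()) {
            command.addAll(Arrays.asList(scriptArgs.trim().split("\\s+")));
        }
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        // Read the whole output before waiting, so the script cannot block on a full pipe.
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Discovery script exited with code " + exitCode + ": " + output);
        }
        return parseIndexes(output, amount);
    }
}
```

Keeping the parsing separate from the process handling makes the output format easy to test without a real script or a GPU on the machine.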

...