...
- “taskmanager.resource.gpu.amount”: Defines the number of GPUs in a task executor. The default value should be 0.
- “taskmanager.resource.gpu.discovery-script.path”: Defines the path of the discovery script. See the Discovery Script section.
- “taskmanager.resource.gpu.discovery-script.args”: Defines the arguments passed to the discovery script. See the Discovery Script section.
- “kubernetes.taskmanager.resource.gpu.vendor”: Defines the vendor of the GPU resource. In Kubernetes, the configuration key of a GPU resource is “<vendor>.com/gpu”[3]. Only “nvidia” and “amd” are accepted at the moment. This option is only valid in Kubernetes mode.
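As an illustration of the vendor option, the following is a minimal sketch of how the configured vendor could be validated and turned into the Kubernetes resource key “<vendor>.com/gpu”. The class name GpuVendorKeys and the method kubernetesResourceName are hypothetical and only serve the example; they are not part of this proposal or of Flink.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration only; not an existing Flink class. */
final class GpuVendorKeys {

    // Vendors accepted at the moment, as listed for the vendor option above.
    private static final Set<String> SUPPORTED_VENDORS =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("nvidia", "amd")));

    private GpuVendorKeys() {}

    /** Returns "<vendor>.com/gpu" for a supported vendor, otherwise throws. */
    static String kubernetesResourceName(String vendor) {
        if (vendor == null || !SUPPORTED_VENDORS.contains(vendor)) {
            throw new IllegalArgumentException(
                    "Unsupported GPU vendor: " + vendor + " (expected one of " + SUPPORTED_VENDORS + ")");
        }
        return vendor + ".com/gpu";
    }
}
```

With this sketch, a vendor of “nvidia” maps to the key “nvidia.com/gpu”, and any other value than “nvidia” or “amd” is rejected early instead of producing a resource key that Kubernetes would not recognize.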
REST API / Web UI (the information of the GPU resource needs to be available through the REST API and Web UI)
Introduce the GPUInformation class, which contains the information of a GPU card. UDFs could get it from RuntimeContext and FunctionContext.
Proposed Changes
Overview
- The user sets “taskmanager.resource.gpu.amount” and, if needed, specifies “taskmanager.resource.gpu.discovery-script.[path|args]”.
- In Yarn/Kubernetes mode, Flink maps “taskmanager.resource.gpu.amount” to the corresponding field of the resource request sent to the external resource manager.
- Introduce a GPUManager, which will execute the discovery script and get the available GPU resources from the output.
- Operators get the GPU resource information from the GPUManager.
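The discovery step in the overview above can be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed GPUManager implementation: the class GpuDiscoverySketch and its methods are hypothetical, and the output format (a single line of comma-separated GPU indices on standard output) is assumed here because this section does not define it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch for illustration only; not the actual GPUManager. */
final class GpuDiscoverySketch {

    private GpuDiscoverySketch() {}

    /** ASSUMPTION: the script prints one line of comma-separated GPU indices, e.g. "0,1,3". */
    static List<String> parseIndices(String output) {
        List<String> indices = new ArrayList<>();
        if (output == null) {
            return indices;
        }
        for (String token : output.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                indices.add(trimmed);
            }
        }
        return indices;
    }

    /** Runs the discovery script with its arguments and parses the first line of its stdout. */
    static List<String> discover(String scriptPath, List<String> scriptArgs)
            throws IOException, InterruptedException {
        List<String> command = new ArrayList<>();
        command.add(scriptPath);
        command.addAll(scriptArgs);

        // Keep stderr separate so that diagnostics cannot be mistaken for GPU indices.
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();

        String firstLine;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            firstLine = reader.readLine();
            // Drain any remaining output so the script cannot block on a full pipe.
            while (reader.readLine() != null) {
                // ignored
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Discovery script exited with code " + exitCode);
        }
        return parseIndices(firstLine);
    }
}
```

Keeping the parsing separate from the process handling makes the assumed output format easy to change and lets the parsing be checked without a GPU or a script on the machine.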
...