
...

  • “taskmanager.resource.gpu.amount”: Defines the number of GPUs in a task executor. The default value is 0.
  • “taskmanager.resource.gpu.discovery-script.path”: Defines the path of the discovery script. See the Discovery Script section.
  • “taskmanager.resource.gpu.discovery-script.args”: Defines the arguments passed to the discovery script. See the Discovery Script section.
  • “kubernetes.taskmanager.resource.gpu.vendor”: Defines the vendor of the GPU resource. In Kubernetes, the configuration key of a GPU resource is “<vendor>.com/gpu”[3]. Only “nvidia” and “amd” are accepted at the moment. Only valid in Kubernetes mode.
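The vendor rule in the last option can be illustrated with a small sketch. The class name `GpuVendorResourceKey` and the method `forVendor` are hypothetical and are not part of Flink; the sketch only assumes what the option description states, namely that the Kubernetes resource key has the form “<vendor>.com/gpu” and that only “nvidia” and “amd” are accepted.

```java
import java.util.Set;

/** Hypothetical helper for illustration only; not an existing Flink class. */
final class GpuVendorResourceKey {

    // The two vendors the proposal accepts at the moment.
    private static final Set<String> SUPPORTED_VENDORS = Set.of("nvidia", "amd");

    private GpuVendorResourceKey() {}

    /** Maps a vendor name to the Kubernetes resource key "<vendor>.com/gpu". */
    static String forVendor(String vendor) {
        // Check for null first: Set.of(...).contains(null) would throw a NullPointerException.
        if (vendor == null || !SUPPORTED_VENDORS.contains(vendor)) {
            throw new IllegalArgumentException(
                    "Unsupported GPU vendor '" + vendor + "', expected \"nvidia\" or \"amd\"");
        }
        return vendor + ".com/gpu";
    }
}
```

Rejecting unknown vendors early, rather than passing them through to the resource request, would surface a misconfiguration at startup instead of as a pod that can never be scheduled.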

REST API / Web UI: the GPU resource information needs to be available through the REST API and the Web UI.

Introduce the GPUInformation class, which contains the information of a GPU card. UDFs can get it from RuntimeContext and FunctionContext.
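A minimal sketch of what such a class might look like is given below. The proposal does not list the fields here, so the single `index` field (the device index as reported by the discovery script) is an assumption made for illustration, as are the accessor name and the immutability of the class.

```java
import java.util.Objects;

/** Illustrative sketch only; the field set is an assumption, not the proposed API. */
final class GPUInformation {

    // Assumed field: the index of the GPU card on the host, e.g. "0".
    private final String index;

    GPUInformation(String index) {
        this.index = Objects.requireNonNull(index, "index");
    }

    String getIndex() {
        return index;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof GPUInformation
                && index.equals(((GPUInformation) other).index);
    }

    @Override
    public int hashCode() {
        return index.hashCode();
    }

    @Override
    public String toString() {
        return "GPU device: " + index;
    }
}
```

Keeping the class immutable would make it safe to hand the same instance to several operators in one task executor.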

Proposed Changes

  • The user sets “taskmanager.resource.gpu.amount” and, if needed, specifies “taskmanager.resource.gpu.discovery-script.[path|args]”.
  • In Yarn/Kubernetes mode, Flink maps “taskmanager.resource.gpu.amount” to the corresponding field of the resource requests sent to the external resource manager.
  • Introduce a GPUManager, which executes the discovery script and gets the available GPU resources from its output.
  • Operators get the GPU resource information from the GPUManager.
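The discovery step can be sketched roughly as follows. This is not the proposed implementation: the class name `GpuManagerSketch`, its methods, and the output format (a single comma-separated list of GPU indexes on standard output) are assumptions made for illustration, and the sketch assumes Java 11 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a GPU manager; names and output format are assumptions. */
final class GpuManagerSketch {

    private final List<String> availableGpuIndexes;

    GpuManagerSketch(List<String> availableGpuIndexes) {
        this.availableGpuIndexes = List.copyOf(availableGpuIndexes);
    }

    /** Runs the discovery script and builds a manager from its standard output. */
    static GpuManagerSketch fromDiscoveryScript(String scriptPath, String scriptArgs)
            throws IOException, InterruptedException {
        List<String> command = new ArrayList<>();
        command.add(scriptPath);
        if (scriptArgs != null && !scriptArgs.trim().isEmpty()) {
            // Naive whitespace split; quoted arguments are not handled in this sketch.
            for (String arg : scriptArgs.trim().split("\\s+")) {
                command.add(arg);
            }
        }
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // keep stderr out of the parsed output
                .start();
        String output =
                new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Discovery script exited with code " + exitCode);
        }
        return new GpuManagerSketch(parseIndexes(output));
    }

    /** Parses output such as "0,1,2" into a list of GPU indexes (assumed format). */
    static List<String> parseIndexes(String output) {
        List<String> indexes = new ArrayList<>();
        for (String token : output.trim().split(",")) {
            String index = token.trim();
            if (!index.isEmpty()) {
                indexes.add(index);
            }
        }
        return indexes;
    }

    /** What an operator would query to learn which GPUs are available. */
    List<String> getAvailableGpuIndexes() {
        return availableGpuIndexes;
    }
}
```

Separating `parseIndexes` from the process handling keeps the parsing logic testable without a real script or GPU on the machine.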

...