Authors: Wei Zhong, Dian Fu
Status
Current state: "Accepted"
Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-UDF-Environment-and-Dependency-Management-td33514.html
...
Dependencies and the Python environment must also be manageable from the command line, so that Python jobs with additional dependencies can be submitted via "flink run", the web UI, or other approaches in the future. The PythonDriver class will support the following new options:
Short Name | Full Name | Syntax | Description |
---|---|---|---|
-pyfs | --pyFiles | `-pyfs <filePaths>` | This option already exists, but currently it only appends the file to the client-side PYTHONPATH. Now it will upload the file to the cluster and append it to the Python worker's PYTHONPATH, which is equivalent to "add_python_file". |
-pyexec | --pyExecutable | `-pyexec <pythonInterpreterPath>` | This option is equivalent to `TableEnvironment#get_config().set_python_executable()`. |
-pyreq | --pyRequirements | `-pyreq <requirementsFile>#<requirementsCachedDir>` | This option is equivalent to "set_python_requirements". "#" is used as the separator if "requirementsCachedDir" is specified. |
-pyarch | --pyArchive | `-pyarch <archiveFile1>#<extractName>,<archiveFile2>#<extractName>` | This option is equivalent to "add_python_archive". "," is used as the separator between multiple archives, and "#" is used as the separator if "extractName" is specified. |
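To make the separator rules for "-pyreq" and "-pyarch" concrete, the following is a minimal sketch of how such option values could be split. It is an illustration only, not the PythonDriver implementation: the helper names `split_optional`, `parse_pyreq` and `parse_pyarch` are hypothetical, and the sketch assumes that file names themselves contain neither "#" nor ",".

```python
from typing import List, Optional, Tuple


def split_optional(value: str, sep: str = "#") -> Tuple[str, Optional[str]]:
    """Split 'left#right' into (left, right).

    The right part is None when the separator is absent or nothing follows it.
    Hypothetical helper, not part of Flink.
    """
    left, found, right = value.partition(sep)
    return left, (right if found and right else None)


def parse_pyreq(value: str) -> Tuple[str, Optional[str]]:
    # Value shape: <requirementsFile>#<requirementsCachedDir>
    # The cached directory part is optional.
    return split_optional(value)


def parse_pyarch(value: str) -> List[Tuple[str, Optional[str]]]:
    # Value shape: <archiveFile1>#<extractName>,<archiveFile2>#<extractName>
    # "," separates archives; "#" separates an archive from its optional extract name.
    return [split_optional(item) for item in value.split(",") if item]


if __name__ == "__main__":
    print(parse_pyreq("requirements.txt#cached_dir"))
    print(parse_pyarch("venv.zip#venv,data.zip"))
```

Each parsed pair would then map onto one call of the corresponding API method, for example one "add_python_archive" call per archive entry.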
Implementation
Implementation of SDK API
...