Authors:  Wei Zhong, Dian Fu

Status

Current state: "Accepted"

Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-UDF-Environment-and-Dependency-Management-td33514.html

...

It is necessary to support managing dependencies and the environment through the command line, so that Python jobs with additional dependencies can be submitted via "flink run", the web UI, or other approaches in the future. The PythonDriver class will support several new options as follows:

| Short Name | Full Name | Syntax | Description |
|---|---|---|---|
| -pyfs | --pyFiles | -pyfs <filePaths> | This option already exists, but currently it only appends the file to the client-side PYTHONPATH. Now it will upload the file to the cluster and append it to the Python worker's PYTHONPATH, which is equivalent to "add_python_file". |
| -pyexec | --pyExecutable | -pyexec <pythonInterpreterPath> | This option is equivalent to `TableEnvironment#get_config().set_python_executable()`. |
| -pyreq | --pyRequirements | -pyreq <requirementsFile>#<requirementsCachedDir> | This option is equivalent to "set_python_requirements". "#" can be used as the separator if "requirementsCachedDir" is specified. |
| -pyarch | --pyArchive | -pyarch <archiveFile1>#<extractName>,<archiveFile2>#<extractName> | This option is equivalent to "add_python_archive". "," can be used as the separator for multiple archives, and "#" can be used as the separator if "extractName" is specified. |
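The "#" and "," separators used by the -pyreq and -pyarch values can be illustrated with a small parsing sketch. This is only an illustration of the syntax described above, not the PythonDriver implementation; the function names `parse_py_requirements` and `parse_py_archives` and the file names in the usage lines are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_py_requirements(value: str) -> Tuple[str, Optional[str]]:
    """Split '<requirementsFile>#<requirementsCachedDir>' into its two parts.

    The cached directory is optional, so None is returned when it is absent.
    (Hypothetical helper, for illustration only.)
    """
    requirements_file, _, cached_dir = value.partition("#")
    return requirements_file, (cached_dir or None)


def parse_py_archives(value: str) -> List[Tuple[str, Optional[str]]]:
    """Split '<archiveFile1>#<extractName>,<archiveFile2>#<extractName>'.

    ',' separates archives; '#' separates an archive from its optional
    extract name. (Hypothetical helper, for illustration only.)
    """
    archives = []
    for item in value.split(","):
        archive_file, _, extract_name = item.partition("#")
        archives.append((archive_file, extract_name or None))
    return archives


# Example values following the syntax in the table above (file names are made up).
requirements = parse_py_requirements("requirements.txt#cached_dir")
archives = parse_py_archives("py_env.zip#myenv,data.zip")
```

Under these assumptions, each parsed pair corresponds to one call of "set_python_requirements" or "add_python_archive", with the second element omitted when it is None.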

Implementation

Implementation of SDK API

...