
...

Public Interfaces

Configuration

 

    @Experimental
    public static final ConfigOption<String> PYTHON_EXECUTION_MODE =
            ConfigOptions.key("python.execution-mode")
                    .stringType()
                    .defaultValue("process")
                    .withDescription(
                            "Specify the python runtime execution mode. The optional values are `process`, `multi-thread` and `sub-interpreter`. "
                                    + "The `process` mode means that the Python user-defined functions will be executed in a separate Python process. "
                                    + "The `multi-thread` mode means that the Python user-defined functions will be executed in the same thread as Java Operator, but it will be affected by GIL performance. "
                                    + "The `sub-interpreter` mode means that the Python user-defined functions will be executed in different Python sub-interpreters rather than different threads of one interpreter, "
                                    + "which can largely overcome the effects of the GIL, but it may fail with some CPython extension libraries, such as numpy, tensorflow. "
                                    + "Note that if the python operator does not support `multi-thread` and `sub-interpreter` mode, we will still use `process` mode.");

We will introduce a new Python configuration option, `python.execution-mode`, which specifies the Python runtime execution mode. The possible values are `process` and `thread`. In `process` mode, Python user-defined functions are executed in a separate Python process; this is the current PyFlink runtime execution mode. In `thread` mode, Python user-defined functions are executed in the same thread as the Java operator; this is the new execution mode discussed in this FLIP.
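The fallback rule stated at the end of the option description (an operator that does not support the non-process modes keeps using `process`) can be sketched as follows. This is a minimal illustration, not PyFlink code: the class `ExecutionModeResolver`, the enum `Mode`, and the flag `operatorSupportsEmbeddedModes` are hypothetical names, and the mode strings are taken from the option description above.

```java
import java.util.Locale;

// Hypothetical helper, not part of Flink: it only illustrates the fallback rule.
final class ExecutionModeResolver {

    // Mode names as listed in the option description (assumed set of values).
    enum Mode {
        PROCESS("process"),
        MULTI_THREAD("multi-thread"),
        SUB_INTERPRETER("sub-interpreter");

        final String configValue;

        Mode(String configValue) {
            this.configValue = configValue;
        }

        // Maps a configured string to a mode, rejecting unknown values.
        static Mode fromConfig(String value) {
            String normalized = value.trim().toLowerCase(Locale.ROOT);
            for (Mode mode : values()) {
                if (mode.configValue.equals(normalized)) {
                    return mode;
                }
            }
            throw new IllegalArgumentException("Unknown python.execution-mode: " + value);
        }
    }

    // Returns the mode an operator would actually run in.
    // A null setting stands for "not configured" and yields the default, `process`.
    static Mode resolve(String configured, boolean operatorSupportsEmbeddedModes) {
        Mode requested = configured == null ? Mode.PROCESS : Mode.fromConfig(configured);
        if (requested != Mode.PROCESS && !operatorSupportsEmbeddedModes) {
            return Mode.PROCESS; // fall back, as the description's note says
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(resolve("multi-thread", true).configValue);     // multi-thread
        System.out.println(resolve("sub-interpreter", false).configValue); // process
        System.out.println(resolve(null, true).configValue);               // process
    }
}
```

Resolving the mode per operator, rather than once per job, is a design assumption of this sketch; it matches the wording of the note, which speaks about individual Python operators.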

Proposed Changes

The architecture of Process Mode

...

Comparison with Other Proposals

| Framework | Principle | Limitations |
| --- | --- | --- |
| Jython | Python compiler implemented in Java | Supports Python 2 only |
| GraalVM | Truffle framework | Compatibility issues with various libraries in the Python ecosystem; requires replacing the HotSpot VM with GraalVM |
| JPype | JNI + Python/C API | Does not support Java calling Python; supports CPython only |
| Jep | JNI + Python/C API | Can only be installed from source; difficult to use as middleware; high framework overhead; supports CPython only |
| PEMJA | JNI + Python/C API | Supports CPython only |

The table above compares PEMJA with the other proposals. We analyze them one by one.

...

Comparison with Process Mode

| Execution Mode | Benefits | Limitations |
| --- | --- | --- |
| Process Mode | Better resource isolation | Usage restrictions; IPC overhead; high implementation complexity |
| Thread Mode | Better performance; no usage restrictions | Supports CPython only; multiple jobs cannot use different Python interpreters in session mode |

Process Mode: In Process Mode, the Java operator asynchronously sends batches of data to, and receives results from, the Python worker. As a result, a Python UDF cannot run in the same JobGraph node as other Java UDFs, which limits the use of Python UDFs in some scenarios, such as CEP and join. In terms of performance, inter-process communication adds a serialization/deserialization step.
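To make the extra serialization/deserialization step concrete, here is a minimal sketch of the round trip a batch takes when it has to cross a process boundary. It is illustrative only: the class `IpcRoundTripSketch` and the length-prefixed layout are assumptions made for this example and are not the wire format PyFlink uses.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch: the encoding below is an assumed layout, not PyFlink's format.
final class IpcRoundTripSketch {

    // Sender side: encode a batch as a 4-byte length followed by 8 bytes per value.
    static byte[] serialize(long[] batch) {
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES + Long.BYTES * batch.length);
        buffer.putInt(batch.length);
        for (long value : batch) {
            buffer.putLong(value);
        }
        return buffer.array();
    }

    // Receiver side: decode the payload back into a batch.
    static long[] deserialize(byte[] payload) {
        ByteBuffer buffer = ByteBuffer.wrap(payload);
        long[] batch = new long[buffer.getInt()];
        for (int i = 0; i < batch.length; i++) {
            batch[i] = buffer.getLong();
        }
        return batch;
    }

    public static void main(String[] args) {
        long[] batch = {1L, 2L, 3L};
        byte[] payload = serialize(batch);  // work done before the data leaves the JVM
        long[] copy = deserialize(payload); // work done again on the receiving side
        System.out.println(payload.length);             // 28
        System.out.println(Arrays.equals(batch, copy)); // true
    }
}
```

Every batch pays for both the encode and the decode pass, and the same cost applies again to the results on the way back. When the function runs inside the same process, this round trip is not needed.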

...