...
Public Interfaces
Configuration
@Experimental
ConfigOptions.key("python.execution-mode")
    .stringType()
    .withDescription(
        "Specify the python runtime execution mode. The optional values are `process`, `multi-thread` and `sub-interpreter`. "
            + "The `process` mode means that the Python user-defined functions will be executed in a separate Python process. "
            + "The `multi-thread` mode means that the Python user-defined functions will be executed in the same thread as Java Operator, but it will be affected by GIL performance. "
We will introduce a new Python configuration option, `python.execution-mode`, which specifies the Python runtime execution mode. The possible values are `process` and `thread`. In `process` mode, Python user-defined functions are executed in a separate Python process; this is the current PyFlink runtime execution mode. In `thread` mode, Python user-defined functions are executed in the same thread as the Java operator; this is the new execution mode discussed in this FLIP.
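As a rough illustration of how the two values named above might be interpreted, here is a minimal Java sketch. The `PythonExecutionMode` enum and its `fromConfig` helper are hypothetical names invented for this example and are not part of Flink. Falling back to `process` when no value is set is also an assumption, based only on `process` being the current execution mode.

```java
import java.util.Locale;

// Hypothetical helper, not a Flink class: models the two values of `python.execution-mode`.
enum PythonExecutionMode {
    PROCESS, // Python UDFs run in a separate Python process (the current mode)
    THREAD;  // Python UDFs run in the same thread as the Java operator

    // Parses a raw config value. Treating a missing value as PROCESS is an assumption.
    static PythonExecutionMode fromConfig(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return PROCESS;
        }
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "process":
                return PROCESS;
            case "thread":
                return THREAD;
            default:
                throw new IllegalArgumentException(
                        "Unsupported python.execution-mode: " + raw);
        }
    }
}

class ExecutionModeDemo {
    public static void main(String[] args) {
        System.out.println(PythonExecutionMode.fromConfig("thread")); // prints THREAD
        System.out.println(PythonExecutionMode.fromConfig(null));     // prints PROCESS
    }
}
```

Rejecting unknown values early, rather than silently falling back, is one possible design choice; it surfaces a mistyped mode when the configuration is read instead of at execution time.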
Proposed Changes
The architecture of Process Mode
...
Comparison with Other Proposals
Framework | Principle | Limitations |
Python compiler implemented in Java | | |
Truffle framework | | |
JNI + Python/C API | | |
JNI + Python/C API | | |
JNI + Python/C API | | |
The table above compares PEMJA with the other proposals. We analyze them one by one.
...
Comparison with Process Mode
Execution Mode | Benefits | Limitations |
Process Mode | | |
Thread Mode | | |
Process Mode: In process mode, the Java operator asynchronously sends batches of data to, and receives results from, the Python worker. This makes it impossible for a Python UDF to run in the same JobGraph node as other Java UDFs, which limits the use of Python UDFs in some scenarios, such as CEP and joins. In terms of performance, inter-process communication adds an extra serialization/deserialization step.
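The extra serialization/deserialization step can be pictured with a small, self-contained Java sketch. It is only an illustration under simplifying assumptions: `BatchTransferSketch` and its methods are hypothetical names, the batch is a plain `long[]`, and no real inter-process transport or PyFlink coder is involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration only; these names are not PyFlink APIs.
class BatchTransferSketch {

    // Sending side: the batch is encoded to bytes before it can cross a process boundary.
    static byte[] serialize(long[] batch) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(batch.length); // 4-byte length prefix
            for (long value : batch) {
                out.writeLong(value);   // 8 bytes per element
            }
        }
        return bytes.toByteArray();
    }

    // Receiving side: the bytes have to be decoded back into values.
    static long[] deserialize(byte[] payload) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            long[] batch = new long[in.readInt()];
            for (int i = 0; i < batch.length; i++) {
                batch[i] = in.readLong();
            }
            return batch;
        }
    }

    public static void main(String[] args) throws IOException {
        long[] batch = {1L, 2L, 3L};
        byte[] payload = serialize(batch);      // encode step
        long[] received = deserialize(payload); // decode step
        System.out.println(payload.length);     // prints 28 (4 + 3 * 8)
        System.out.println(received != batch);  // prints true: the receiver holds a copy
    }
}
```

Every batch pays for one encode and one decode in each direction, which is the overhead the paragraph above attributes to inter-process communication.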
...