Authors: Jincheng Sun, Dian Fu, Aljoscha Krettek

Status

Current state:   "Under Discussion"

Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-User-Defined-Function-for-Table-API-td31673.html

...

For Python Table API jobs, an operator that contains a Python user-defined function is given the operator's original resource plus the resource used by the Python process.
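The resource rule above amounts to a per-dimension sum. The following is a minimal sketch of that arithmetic only; `OperatorResource`, its fields, and `resource_for_operator` are hypothetical names chosen for illustration and are not Flink classes, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorResource:
    """Hypothetical resource description (illustration only, not a Flink class)."""
    cpu_cores: float
    memory_mb: int

    def plus(self, other: "OperatorResource") -> "OperatorResource":
        # Add the two resources dimension by dimension.
        return OperatorResource(
            cpu_cores=self.cpu_cores + other.cpu_cores,
            memory_mb=self.memory_mb + other.memory_mb,
        )


def resource_for_operator(original, python_process, has_python_udf):
    # Only operators that contain a Python UDF get the extra Python process resource.
    return original.plus(python_process) if has_python_udf else original


# Made-up example values.
original = OperatorResource(cpu_cores=1.0, memory_mb=512)
python_process = OperatorResource(cpu_cores=0.5, memory_mb=256)
total = resource_for_operator(original, python_process, has_python_udf=True)
print(total)
```

An operator without a Python user-defined function keeps its original resource unchanged in this sketch.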

Compatibility, Deprecation, and Migration Plan

  • This FLIP introduces a new feature, so there are no compatibility issues with previous versions.

Implementation Plan

  1. Support the basic functionality of Python ScalarFunction
  2. Support chaining Python ScalarFunctions
  3. Python Execution Environment Management. For example, multiple operators can reuse the same Python SDK Harness.
  4. Python Dependency Management. A Python UDF may depend on third-party libraries, so we should provide a proper way to handle them.
  5. Add a series of Java and Python Coders for all supported data types. Data encoded with a Java coder should be decodable with the corresponding Python coder, and vice versa.
  6. Add Cython support for UDF execution.
  7. Add validation checks for places where a Python ScalarFunction cannot be used
  8. Support decorator syntax for Python functions
  9. Support the basic functionality of Python TableFunction
  10. Add rules to push down the Python ScalarFunctions contained in the join condition of a Correlate node
  11. Add a merge rule for Python Correlate nodes
  12. Support the basic functionality of Python AggregateFunction without DataView support
  13. Add validation checks for places where a Python AggregateFunction cannot be used
  14. Add ListView support
  15. Add MapView support
  16. Add user-defined metrics support
  17. Add documentation for Python user-defined functions
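As an illustration of the coder step in the plan above, a Java coder and its Python counterpart only interoperate if they agree on the byte layout. The sketch below is one possible shape for such a coder, under the assumption (not stated in this FLIP) that the Java side writes a 64-bit integer with `DataOutputStream.writeLong`, which emits 8 bytes in big-endian order. `LongCoder` is a hypothetical name, not an existing PyFlink class.

```python
import struct


class LongCoder:
    """Hypothetical Python coder for 64-bit signed integers (illustration only)."""

    # ">q" = big-endian signed 64-bit integer, the layout assumed for the Java side.
    _FORMAT = ">q"

    def encode(self, value: int) -> bytes:
        return struct.pack(self._FORMAT, value)

    def decode(self, data: bytes) -> int:
        (value,) = struct.unpack(self._FORMAT, data)
        return value


coder = LongCoder()

# Bytes that Java's DataOutputStream.writeLong(1L) would produce.
java_bytes = bytes([0, 0, 0, 0, 0, 0, 0, 1])

decoded = coder.decode(java_bytes)               # Java -> Python
round_trip = coder.decode(coder.encode(-42))     # Python -> bytes -> Python
print(decoded, round_trip)
```

A cross-language test suite for each data type would compare bytes produced on one side with the values decoded on the other, in both directions.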