...

Before diving into the DDL SQL, we want to discuss the major requirements for defining a function within the Flink runtime, as raised by related FLIPs:

  • External library registration. This requirement comes from the Hive integration, which enhances the adoption of Flink batch. HQL supports syntax like:


CREATE FUNCTION addfunc AS 'com.example.hiveserver2.udf.add' USING JAR 'hdfs:///path/to/jar'


  • Language distinction. Due to the bytecode specifics of Scala, there are limitations in extracting type information from Scala functions.

...

           MySQL's CREATE FUNCTION syntax supports specifying a language in this way:

CREATE FUNCTION hello (s CHAR(20)) RETURNS CHAR(50) DETERMINISTIC RETURN CONCAT('Hello, ', s, '!') LANGUAGE SQL


  • Temporary Function Support. FLIP-57 proposes to distinguish temporary and non-temporary functions for both catalog and system functions.

           Since a temporary function is registered only for the current session, the DDL requires a flag to distinguish the function resolution order.

 CREATE TEMPORARY FUNCTION addfunc AS 'com.example.hiveserver2.udf.add' USING JAR 'hdfs:///path/to/jar'
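The resolution order implied above, where a session-scoped temporary function shadows a catalog function of the same name, can be illustrated with a small sketch (the class and registry structure are illustrative only, not Flink's actual resolver):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class FunctionResolver {
    // Temporary functions are session-scoped and shadow catalog functions.
    private final Map<String, String> temporaryFunctions = new HashMap<>();
    private final Map<String, String> catalogFunctions = new HashMap<>();

    void registerTemporary(String name, String className) {
        temporaryFunctions.put(name, className);
    }

    void registerCatalog(String name, String className) {
        catalogFunctions.put(name, className);
    }

    // Temporary functions are consulted first, then the catalog.
    Optional<String> resolve(String name) {
        if (temporaryFunctions.containsKey(name)) {
            return Optional.of(temporaryFunctions.get(name));
        }
        return Optional.ofNullable(catalogFunctions.get(name));
    }

    public static void main(String[] args) {
        FunctionResolver r = new FunctionResolver();
        r.registerCatalog("addfunc", "com.example.catalog.Add");
        r.registerTemporary("addfunc", "com.example.hiveserver2.udf.add");
        // The temporary registration wins for the current session.
        System.out.println(r.resolve("addfunc").get());
    }
}
```

This is why the TEMPORARY keyword must be part of the DDL: without it, the resolver cannot know which registry the new function belongs to.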


  • Function Qualifier. Function identifier resolution considers object scopes: a function may live in a particular catalog and database, or in the current catalog and database.

          Thus, all of the function DDL needs to support a 3-part path.

CREATE FUNCTION catalog1.addfunc AS 'com.example.hiveserver2.udf.add' LANGUAGE JVM
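Supporting a 3-part path amounts to filling in the missing catalog and database parts from the session defaults. A hypothetical sketch (the class and the 2-part interpretation here are assumptions for illustration, not Flink's actual identifier code):

```java
public class FunctionIdentifier {
    final String catalog;
    final String database;
    final String name;

    FunctionIdentifier(String catalog, String database, String name) {
        this.catalog = catalog;
        this.database = database;
        this.name = name;
    }

    // Expand a 1-, 2-, or 3-part identifier using the session's
    // current catalog and database as defaults.
    static FunctionIdentifier of(String path, String curCatalog, String curDb) {
        String[] parts = path.split("\\.");
        switch (parts.length) {
            case 1: return new FunctionIdentifier(curCatalog, curDb, parts[0]);
            case 2: return new FunctionIdentifier(curCatalog, parts[0], parts[1]);
            case 3: return new FunctionIdentifier(parts[0], parts[1], parts[2]);
            default: throw new IllegalArgumentException("Bad path: " + path);
        }
    }

    public static void main(String[] args) {
        FunctionIdentifier id =
                of("catalog1.db1.addfunc", "default_catalog", "default_database");
        System.out.println(id.catalog + "|" + id.database + "|" + id.name);
    }
}
```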


Function DDL SQL Syntax

Create Function Statement

...

In this case, the user can use a class that is not on the local classpath. In the example above, the function NestedOutput is contained in a jar that is released to an artifactory.

Using this type of model, we can split the user-level logic from the platform. Each team can write and own its own UDF library; the Flink platform is only responsible for loading it into the classpath and using it.

We will discuss how to achieve this in a later section. Basically, the resource URL will be added as a user library in the execution environment, added into the job graph, and shipped to the storage layer, such as HDFS, before job submission.
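Conceptually, consuming such a remote jar on the worker side boils down to adding its URL to a classloader before resolving the UDF class. A minimal sketch using the JDK's URLClassLoader (the jar path is a placeholder; in Flink the URL would come from the USING JAR clause):

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class UdfClassLoading {
    // Build a classloader that can resolve classes from a user-provided jar.
    // Classes not found in the jar are delegated to the parent classloader,
    // so platform classes remain visible to the UDF.
    static ClassLoader userLibLoader(String jarPath) throws Exception {
        URL jarUrl = new File(jarPath).toURI().toURL();
        return new URLClassLoader(new URL[]{jarUrl},
                UdfClassLoading.class.getClassLoader());
    }

    public static void main(String[] args) throws Exception {
        // "udf.jar" is hypothetical; after shipping, the URL could
        // equally point to a distributed store such as hdfs://.
        ClassLoader loader = userLibLoader("udf.jar");
        // Parent delegation still resolves platform classes:
        Class<?> c = loader.loadClass("java.lang.String");
        System.out.println(c.getName());
    }
}
```

The key design point is parent-first delegation: user jars add classes without being able to replace the platform's own.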

...

From an implementation perspective, we want to provide function syntax aligned with multiple language support, but limit the scope to Java and Scala.

The Python UDF related support will be discussed and implemented in the scope of FLIP-78. The concrete action items include:

...


As FLIP-65 (new type inference for Table API UDFs) is a blocker for adding Scala functions into TableEnvImpl, items 1), 2) and 3) will only support the Java language. Item 4) is for adding a function

into the table environment with remote resources. Once FLIP-65 is done, we can continue the work of supporting the Scala language and the corresponding function registration into TableEnvImpl.

...