State

[progress record] :

Proposed time : 2022/03/01

Discussion time : 2022/04/01

Acceptance/Rejection time : 2022/04/30

Completion time : 2022/05/21

[issue] : 

[email] : At present, a discussion must be initiated in the WeChat group [Apache Linkis Community Development Group]; the discussion minutes can then be sent to the official Linkis dev mailing list

[release] : linkis 1.1.0

[proposer]: 

Motivation & Background

At present, the Linkis management console does not provide a management entrance for UDF functions; it relies on the Scriptis component in DSS, so creating and modifying a UDF function depends on that external system. Second, multiple versions of a UDF cannot currently be controlled, so historical versions of a UDF function can be neither viewed nor rolled back. Finally, UDF jar packages and script materials are stored on the server host; if the server is replaced, the jar packages and script materials have to be migrated along with it.

Basic concepts

Several types of UDF

UDF functions (require registration before use)
  • Universal UDF function: a UDF that can be used by both Hive HQL and Spark SQL, generally compiled into a jar package
  • Spark-type UDF function: a Spark-specific UDF, registered through a scala or python function; after registration it can also be used in SQL

Custom functions (can be used like normal functions in scripts without registration; python and scala custom functions can only be used in the Spark engine)
  • python custom function: a function written in python
  • scala custom function: a function written in scala
Type ID:
UDF_JAR = 0;
UDF_PY = 1;
UDF_SCALA = 2;
FUNCTION_PY = 3;
FUNCTION_SCALA = 4;
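
For reference, these identifiers can be pictured as a small set of Scala constants. This is only an illustration of the mapping above, not the actual Linkis definition:

```scala
// Illustrative Scala constants mirroring the type identifiers above; the actual
// definitions live in the Linkis UDF module and may be structured differently.
object UdfType {
  val UDF_JAR: Int        = 0 // universal UDF packaged as a jar (Hive HQL / Spark SQL)
  val UDF_PY: Int         = 1 // Spark UDF registered from a python function
  val UDF_SCALA: Int      = 2 // Spark UDF registered from a scala function
  val FUNCTION_PY: Int    = 3 // python custom function (Spark engine only)
  val FUNCTION_SCALA: Int = 4 // scala custom function (Spark engine only)
}
```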

Expected goals

  • Ability to add/modify/delete and version-control UDF functions compiled into jar packages through the Linkis management console
  • Ability to add/modify/delete and version-control custom functions written as python/scala scripts through the Linkis console
  • The jar package/script corresponding to a UDF function can be persistently stored in BML
  • Support for administrator users to share a user's personal UDF functions with other users
  • Support for selecting an existing category or entering a new one when creating a personal UDF function

Implementation plan

  • A linkis_ps_udf_version table needs to be added to support storing UDF functions in BML and recording multi-version information. Relying on the existing BML material management service of Linkis, the jar/script material of a UDF function is uploaded to the HDFS file system managed by BML.
  • When a UDF is used, its material needs to be downloaded from BML to a local directory according to the resourceId and version; the latest version (the maximum version number) is used by default (see the sketch below).
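
As a rough sketch of the download step, assuming a BML client that exposes a download method keyed by user, resourceId and version (the client trait and method names here are placeholders, not the real BML client API):

```scala
import java.io.InputStream
import java.nio.file.{Files, Paths, StandardCopyOption}

// Hypothetical stand-in for the existing BML material client; the real
// client class and method names in Linkis may differ from this sketch.
trait BmlDownloadClient {
  def download(user: String, resourceId: String, version: String): InputStream
}

object UdfMaterialDownloader {
  // Download the udf material (jar/script) from BML into a local directory.
  // If no version is given, the latest (maximum) version is used by default.
  def downloadUdfMaterial(client: BmlDownloadClient,
                          user: String,
                          resourceId: String,
                          version: Option[String],
                          localDir: String,
                          fileName: String): Unit = {
    // Placeholder: real code would resolve the maximum version from linkis_ps_udf_version.
    val resolvedVersion = version.getOrElse("latest")
    val in = client.download(user, resourceId, resolvedVersion)
    try {
      Files.createDirectories(Paths.get(localDir))
      Files.copy(in, Paths.get(localDir, fileName), StandardCopyOption.REPLACE_EXISTING)
    } finally in.close()
  }
}
```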


Things to Consider & Note:

  • A UDF function name cannot be repeated under the same user; the combination of user_name and udf_name is unique
  • A classification (category) field needs to be added for UDF functions
  • When a UDF is modified, the file also needs to be uploaded to BML (resource_id remains unchanged), and a new version (version + 1) is added to the udf_version table with the modified information. Renaming a UDF function is not supported.
  • For shared functions, the new UDF version must be published manually after modification before it takes effect for the users the UDF is shared with. This keeps the impact of modifying a UDF under control: only after the change has been verified does the new version take effect for shared users.
  • A shared UDF that has been loaded by users cannot be deleted; it can only be expired. It can be deleted if no user has loaded it.
  • A personal UDF can be deleted but cannot be expired
  • Administrator users can share a user's UDF with other users; an input box is provided to enter the user names to share with
  • If a target user already has a UDF with the same name, sharing fails with the prompt: 'xx user has a UDF with the same name!'
  • Rolling back to a certain version adds a new version (version + 1, other information unchanged) by copying that version's record in the udf_version table; the latest version will then be used the next time the UDF is loaded
  • UDF handover (see the sketch after this list)
  • Transferring a UDF to another user: change the owner of the UDF to the target user, check whether the target user already has the UDF's category, create the category for the target user if it does not exist, and then update the category id of the UDF to the target user's category id
  • A user departure handover is equivalent to handing over all of the user's UDFs to the new user
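
A minimal sketch of the handover steps above, using hypothetical repository traits (none of these type or method names come from Linkis; they only illustrate the category check/creation followed by the owner update):

```scala
// Hypothetical data holder and repository traits, used only to illustrate
// the handover steps described above; they do not exist in Linkis under these names.
case class Udf(id: Long, userName: String, categoryId: Long, categoryName: String)

trait CategoryRepo {
  def findId(userName: String, categoryName: String): Option[Long]
  def create(userName: String, categoryName: String): Long
}

trait UdfRepo {
  def listByUser(userName: String): Seq[Udf]
  def updateOwnerAndCategory(udfId: Long, newUser: String, newCategoryId: Long): Unit
}

object UdfHandover {
  // Hand a single UDF over to the target user: reuse the target user's category
  // with the same name, or create it first if it does not exist.
  def handover(udf: Udf, targetUser: String, categories: CategoryRepo, udfs: UdfRepo): Unit = {
    val targetCategoryId = categories
      .findId(targetUser, udf.categoryName)
      .getOrElse(categories.create(targetUser, udf.categoryName))
    udfs.updateOwnerAndCategory(udf.id, targetUser, targetCategoryId)
  }

  // Departure handover: hand over all of a user's UDFs to the new user.
  def handoverAll(user: String, targetUser: String, categories: CategoryRepo, udfs: UdfRepo): Unit =
    udfs.listByUser(user).foreach(u => handover(u, targetUser, categories, udfs))
}
```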

Changes


Modification Detail
1. Modification of maven module

2. Modification of HTTP interface
  • UDF add: POST /api/rest_j/v1/udf/add
  • UDF update: POST /api/rest_j/v1/udf/update
  • UDF delete: POST /api/rest_j/v1/udf/delete/{id}
  • UDF share: POST /api/rest_j/v1/udf/shareUDF
  • UDF handover: POST /api/rest_j/v1/udf/handover
  • UDF version publish: POST /api/rest_j/v1/udf/publish
  • Version rollback: POST /api/rest_j/v1/udf/rollback
  • View version list: GET /api/rest_j/v1/udf/versionList
  • UDF management pages: POST /api/rest_j/v1/udf/managerPages (note: only UDFs created by the user are visible)
  • UDF shared user list: POST /api/rest_j/v1/udf/getSharedUsers
  • UDF expire: POST /api/rest_j/v1/udf/setExpire
  • UDF view source code: POST /api/rest_j/v1/udf/downloadUdf
  • Get the list of UDF users: GET /api/rest_j/v1/udf/allUdfUsers
  • Get the first-level classification of the user's personal functions: GET /api/rest_j/v1/udf/userDirectory
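
As an illustration only, a call to the add interface might look like the sketch below; the gateway address and the JSON field names are assumptions made for the example, and the actual request schema is defined by the UDF service:

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object UdfAddRequestExample {
  def main(args: Array[String]): Unit = {
    // Assumed gateway address and illustrative JSON fields -- the real request
    // schema is defined by the UDF service; authentication headers/cookies are omitted.
    val body = """{"udfName":"my_udf","udfType":0,"description":"example udf"}"""

    val request = HttpRequest.newBuilder()
      .uri(URI.create("http://linkis-gateway:9001/api/rest_j/v1/udf/add"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(body))
      .build()

    val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    println(s"status=${response.statusCode()} body=${response.body()}")
  }
}
```
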
3. Modification of the client interface

Some new interfaces are added to the UDFClient client class

  • Query UDF information by user name, category, and UDF type

def getUdfInfosByUdfType(userName: String, category: String, udfType: BigInt): ArrayBuffer[UDFInfoVo]

  • Query the list of UDF functions based on jar packages by user name

def getJarUdf(userName: String): ArrayBuffer[UDFInfoVo]

  • Query the list of UDF functions based on py scripts by user name

def getPyUdf(userName: String): ArrayBuffer[UDFInfoVo]

  • Query the list of UDF functions based on scala scripts by user name

def getScalaUdf(userName: String): ArrayBuffer[UDFInfoVo]

  • Query the list of custom functions based on py scripts by user name

def getPyFuncUdf(userName: String): ArrayBuffer[UDFInfoVo]

  • Query the list of custom functions based on scala scripts by user name

def getScalaFuncUdf(userName: String): ArrayBuffer[UDFInfoVo]
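
Assuming these methods are exposed on a UDFClient object as the signatures above suggest, a caller might combine them like this sketch (import paths for UDFClient and UDFInfoVo are omitted; only the calls listed in this proposal are used):

```scala
import scala.collection.mutable.ArrayBuffer

// Usage sketch for the new UDFClient interfaces; UDFClient and UDFInfoVo
// come from the Linkis UDF client module.
object UdfClientUsageExample {
  def printUserUdfs(userName: String): Unit = {
    val jarUdfs: ArrayBuffer[UDFInfoVo]   = UDFClient.getJarUdf(userName)
    val pyUdfs: ArrayBuffer[UDFInfoVo]    = UDFClient.getPyUdf(userName)
    val scalaUdfs: ArrayBuffer[UDFInfoVo] = UDFClient.getScalaUdf(userName)

    println(s"$userName has ${jarUdfs.size} jar udfs, ${pyUdfs.size} python udfs " +
      s"and ${scalaUdfs.size} scala udfs")
  }
}
```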

4. Modification of database table structure

5. Modification of configuration items

6. Modification of error codes

7. Modifications for third-party dependencies

Compatibility, Deprecation, and Migration Plan

  • What impact (if any) will there be on existing users?

Before the new version goes online, the old udf data needs to be migrated to the new table structure

  • If we are changing behavior, how will we phase out the older behavior?

The old and new behaviors are not kept compatible at the same time. After the upgrade is completed, only the new version's data is used, and the old mechanism is phased out automatically.

  • If we require special migration tools, describe them here.

Reference upgrade scripts will be provided for a one-time migration and upgrade of the UDF data.

  • When will we remove the existing behavior?

It is a one-time upgrade, so there is nothing to remove separately. After the upgrade is verified, the old table data can be backed up and the unused tables cleaned up.