State
[progress record] :
Proposed time : 2022/03/01
Discussion time : 2022/04/01
Acceptance/Rejection time : 2022/04/30
Completion time : 2022/05/21
[issue] :
[email] : At present, you must initiate a discussion in the WeChat group [Apache Linkis Community Development Group]; the discussion minutes can then be sent to the official Linkis dev mailing list
[release] : Linkis 1.1.0
[proposer]:
Motivation & Background
At present, the Linkis management console does not provide a management entry for UDF functions; it relies on the Scriptis component in DSS, so creating and modifying UDF functions depends on the external system Scriptis. Second, multiple UDF versions currently cannot be managed, so historical versions of a UDF function can be neither viewed nor rolled back. Finally, UDF function jar packages and script materials are currently stored on the server host; if the server is replaced, the jar packages and script materials must be migrated manually in step.
Basic concept
Several types of UDF
UDF functions (require registration before use):
- General UDF function: can be used by both Hive HQL and Spark SQL; generally compiled into a jar package
- Spark-type UDF function: Spark-specific UDF registered through a Scala or Python function; can also be used in SQL after registration

Custom functions (can be used like normal functions in scripts without registration; Python and Scala custom functions can only be used in the Spark engine):
- Python custom function: a function written in Python
- Scala custom function: a function written in Scala

Type IDs: UDF_JAR = 0; UDF_PY = 1; UDF_SCALA = 2; FUNCTION_PY = 3; FUNCTION_SCALA = 4
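The type IDs above can be modeled as a small enumeration. The following Python sketch is illustrative only: the `UdfType` class and `needs_registration` helper are hypothetical, though the constant names and values mirror the list above.

```python
from enum import IntEnum

class UdfType(IntEnum):
    """Type IDs for the five kinds of UDF listed above (illustrative)."""
    UDF_JAR = 0         # general UDF, compiled jar, usable from Hive HQL and Spark SQL
    UDF_PY = 1          # Spark-specific UDF registered via a Python function
    UDF_SCALA = 2       # Spark-specific UDF registered via a Scala function
    FUNCTION_PY = 3     # Python custom function (Spark engine only)
    FUNCTION_SCALA = 4  # Scala custom function (Spark engine only)

def needs_registration(t: UdfType) -> bool:
    # UDF functions require registration before use; custom functions do not.
    return t in (UdfType.UDF_JAR, UdfType.UDF_PY, UdfType.UDF_SCALA)
```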
Expected goals
- Ability to add/modify/delete/version-control compiled jar-package UDF functions through the Linkis management console
- Ability to add/modify/delete/version-control Python/Scala script custom functions through the Linkis console
- The jar package/script corresponding to a UDF function can be persistently stored in BML
- Administrator users can share a user's personal UDF functions with other users
- Support for choosing an existing category or entering a new one when creating personal UDF functions
Implementation plan
- A linkis_ps_udf_version table needs to be added to store the BML location of UDF function material along with multi-version information. Relying on Linkis's existing BML material management service, the jar/script material of a UDF function is uploaded to the HDFS file system managed by BML.
- When a UDF is used, it is downloaded from BML to a local directory according to its resourceId and version. The latest version (the maximum version number) is used by default.
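The default-version rule above can be sketched as follows. This is a minimal illustration: the record fields `resource_id` and `version` are stand-ins for the actual linkis_ps_udf_version columns, which this proposal does not fully specify.

```python
def pick_version(versions, requested=None):
    """Given the version records for one resource_id, return the one to
    download from BML. `versions` is a list of dicts with at least a
    'version' (int) field; field names are illustrative only."""
    if requested is not None:
        for v in versions:
            if v["version"] == requested:
                return v
        raise ValueError(f"version {requested} not found")
    # Default: the latest version, i.e. the maximum version number.
    return max(versions, key=lambda v: v["version"])
```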
Things to Consider & Note:
- The UDF function name cannot be repeated under the same user; the combination of user_name and udf_name must be unique
- A classification (category) field needs to be added for UDF functions
- When modifying a UDF, the file also needs to be uploaded to BML (resource_id remains unchanged), and a new version (version+1) is added to the udf_version table with the modified information. Renaming a UDF function is not supported.
- For shared functions, a modification takes effect for shared users only after the owner manually publishes the new UDF version. This keeps the impact of modifying a shared UDF under control: shared users receive and use the new version only after it has been verified.
- A shared UDF that has been loaded by other users cannot be deleted, only expired; it can be deleted once no user has it loaded
- A personal UDF can be deleted but cannot be expired
- Administrator users can share a user's UDF with other users; an input box is provided to enter the user names to share with
- If a target user already has a UDF with the same name, sharing fails with the prompt: 'User xx already has a UDF with the same name!'
- Rolling back to a given version copies that version's record in the udf_version table into a new version (version+1, other information unchanged); the latest version is then used the next time the UDF is loaded
- UDF handover
  - Transfer the UDF to another user: change the UDF's user to the target user, check whether the target user already has the UDF's category, create the category for the target user if it does not exist, and then update the UDF's category id to the target user's category id
  - User-departure handover is equivalent to handing over all of the departing user's UDFs to the new user
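Note that the modify and rollback rules above both append a new row with version+1 rather than mutating an existing one. A minimal Python sketch of that bookkeeping, using a list of dicts as a stand-in for the udf_version table (field names are illustrative, not the actual schema):

```python
def modify_udf(version_rows, udf_id, new_material):
    """Modification: resource_id stays the same; a new row with
    version+1 carries the updated material."""
    rows = [r for r in version_rows if r["udf_id"] == udf_id]
    latest = max(rows, key=lambda r: r["version"])
    new_row = {**latest, "version": latest["version"] + 1, "material": new_material}
    version_rows.append(new_row)
    return new_row

def rollback_udf(version_rows, udf_id, target_version):
    """Rollback: copy the target version's info into a new row with
    version+1, so the rolled-back content becomes the latest version."""
    rows = [r for r in version_rows if r["udf_id"] == udf_id]
    target = next(r for r in rows if r["version"] == target_version)
    latest = max(r["version"] for r in rows)
    new_row = {**target, "version": latest + 1}
    version_rows.append(new_row)
    return new_row
```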
Changes
1. Modification of Maven modules
2. Modification of HTTP interfaces. New endpoints:
   - UDF add: POST /api/rest_j/v1/udf/add
   - UDF modification: POST /api/rest_j/v1/udf/update
   - UDF delete: POST /api/rest_j/v1/udf/delete/{id}
   - UDF share: POST /api/rest_j/v1/udf/shareUDF
   - UDF handover: POST /api/rest_j/v1/udf/handover
   - UDF version publish: POST /api/rest_j/v1/udf/publish
   - Version rollback: POST /api/rest_j/v1/udf/rollback
   - View version list: GET /api/rest_j/v1/udf/versionList
   - UDF management pages: POST /api/rest_j/v1/udf/managerPages (note: only UDFs created by the user are visible)
   - UDF shared-user list: POST /api/rest_j/v1/udf/getSharedUsers
   - UDF expire: POST /api/rest_j/v1/udf/setExpire
   - UDF view source code: POST /api/rest_j/v1/udf/downloadUdf
   - Get the list of UDF users: GET /api/rest_j/v1/udf/allUdfUsers
   - Get the first-level categories of the user's personal functions: GET /api/rest_j/v1/udf/userDirectory
3. Modification of the client interface. New methods added to the UDFClient class:

       def getUdfInfosByUdfType(userName: String, category: String, udfType: BigInt): ArrayBuffer[UDFInfoVo]
       def getJarUdf(userName: String): ArrayBuffer[UDFInfoVo]
       def getPyUdf(userName: String): ArrayBuffer[UDFInfoVo]
       def getScalaUdf(userName: String): ArrayBuffer[UDFInfoVo]
       def getPyFuncUdf(userName: String): ArrayBuffer[UDFInfoVo]
       def getScalaFuncUdf(userName: String): ArrayBuffer[UDFInfoVo]

4. Modification of database table structure
5. Modification of configuration items
6. Modification of error codes
7. Modifications to third-party dependencies
Compatibility, Deprecation, and Migration Plan
- What impact (if any) will there be on existing users?
Before the new version goes online, existing UDF data needs to be migrated to the new table structure.
- If we are changing behavior, how will we phase out the older behavior?
The old and new behaviors are not kept compatible side by side. After the upgrade is completed, all data uses the new version's structure, and the old mechanism is retired automatically.
- If we require special migration tools, describe them here.
Reference upgrade scripts will be provided for a one-time migration of UDF data.
- When will we remove the existing behavior?
The upgrade is one-time, so nothing needs to be removed separately. After the upgrade is verified, the old table data can be backed up and the unused tables cleaned up.
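As a sketch of what the one-time migration script must produce, assuming the old table stored the material path directly on each UDF row (field names here are illustrative, not the actual old or new schema):

```python
def migrate_old_udf_rows(old_rows):
    """One-time migration sketch: for each old UDF row, emit an initial
    record for the new linkis_ps_udf_version table. The on-host material
    path is carried over so the material can be re-uploaded to BML."""
    version_rows = []
    for r in old_rows:
        version_rows.append({
            "udf_id": r["id"],
            "version": 1,        # every migrated UDF starts at version 1
            "path": r["path"],   # old on-host material path, pending BML upload
        })
    return version_rows
```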