State

[progress record]:

Proposed time: 2022/10/10

Discussion time:

Accept/Reject Time:

Complete time:

[issues]: To be added

[email]: After creating the LKIP and writing the preliminary content, start a discussion of the proposal. Currently, the discussion must be initiated in the WeChat group [Apache Linkis Community Development Group], and the minutes can then be sent to the official Linkis dev mailing list. The address of the minutes email can be placed here.

[release]: The (planned) release version of Linkis

[proposer]: peacewong

Motivation & Background

1. Users want to run fuzzy searches over the code of their historical Linkis tasks.
2. Permissions must be controlled: ordinary users can only search their own code.

Basic concept

  • The management console supports fuzzy search over code and highlights the matching content.

Expected goals

  • 1. Support fuzzy search over historical code with strict permission control.
  • 2. Phase 1 supports searching T-1 code only, that is, code from tasks run up to the previous day.
  • 3. Support highlighting of the matched code.

Implementation plan

  • 1. Implement a batch export task that imports the previous day's historical tasks into ES through Exchange on a daily schedule. Note that code whose length exceeds 50,000 is temporarily not searchable.
  • 2. Implement the back-end search interface through ESClient. Only fuzzy search by code is supported.
  • 3. Provide the code fuzzy search function as a plug-in, which is not enabled by default.

Remark: The limit is 50,000 because, when Linkis stores code that exceeds 50,000, the code itself is written to the file system and only the corresponding file path is stored in the database. Such records therefore have no code to import into ES in the batch task. The limit also reduces the load on ES during searches as much as possible.
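
The selection rule described above (only T-1 tasks, and only code within the 50,000 limit) can be sketched as follows. This is a minimal illustration and not the Exchange job itself: the record fields `created_time` and `code`, both function names, and the reading of the limit as a character count are assumptions made for the example.

```python
from datetime import date, datetime, time, timedelta
from typing import Dict, List, Tuple

# Limit stated in the proposal; treating it as a character count is an assumption.
MAX_CODE_LENGTH = 50_000


def t_minus_1_window(run_date: date) -> Tuple[datetime, datetime]:
    """Return the half-open interval [start, end) covering the day before run_date."""
    end = datetime.combine(run_date, time.min)
    start = end - timedelta(days=1)
    return start, end


def select_for_import(records: List[Dict], run_date: date) -> List[Dict]:
    """Keep records created on T-1 whose code is within the length limit.

    The field names `created_time` and `code` are hypothetical.
    """
    start, end = t_minus_1_window(run_date)
    return [
        r for r in records
        if start <= r["created_time"] < end
        and len(r["code"]) <= MAX_CODE_LENGTH
    ]
```

Using a half-open window means a task created exactly at midnight belongs to one day only, so consecutive daily runs neither skip nor duplicate records.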

Technology Architecture

As shown in the figure below, the technical architecture is mainly divided into two parts:
1. A scheduled batch task imports the table linkis_ps_job_history_group_history into ES through Exchange (datax), creating a linkis index and a type for each cluster.
2. Front-end search is served by a back-end interface that runs the matching query through the ES Client.
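
To make the second part more concrete, here is a rough sketch of the request body the back end might send to ES. It only builds the query as a dictionary and does not call ES. The field names `code` and `submit_user`, the function name, the `is_admin` flag, and the choice of a `match` query with `fuzziness` are all assumptions for illustration. The proposal does not specify the query type or the index mapping.

```python
from typing import Any, Dict


def build_code_search_body(keyword: str, login_user: str,
                           is_admin: bool = False, size: int = 20) -> Dict[str, Any]:
    """Build an ES search body: fuzzy match on code, per-user filter, highlighting.

    `code` and `submit_user` are hypothetical field names.
    """
    bool_query: Dict[str, Any] = {
        "must": [{"match": {"code": {"query": keyword, "fuzziness": "AUTO"}}}]
    }
    if not is_admin:
        # Ordinary users may only search their own code.
        bool_query["filter"] = [{"term": {"submit_user": login_user}}]
    return {
        "size": size,
        "query": {"bool": bool_query},
        # Ask ES to return highlighted fragments of the matched code.
        "highlight": {"fields": {"code": {}}},
    }
```

Putting the user restriction in the `filter` clause on the server side, rather than trusting a parameter from the front end, keeps the permission check independent of what the client sends.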

Changes


No.  Modification                                  Detail
1    Modification of maven module
2    Modification of HTTP interface
3    Modification of the client interface
4    Modification of database table structure
5    Modification of configuration item
6    Modification of error code
7    Modifications for third-party dependencies    Introduce ES Client

Compatibility, Deprecation, and Migration Plan

  • What impact (if any) will there be on existing users?
  • If we are changing behavior, how will we phase out the older behavior?
  • If we require special migration tools, describe them here.
  • When will we remove the existing behavior?