State

[progress record]:

Proposed time: 2022/05/05

Discussion time:

Acceptance time:

Completion time:

[issues]:

[email]:

[release]:

[proposer]: 

Motivation & Background

Linkis integrates multiple engines, but front-end application developers still have to specify which engine to use. SQL and scripts are not portable across engines, so the learning cost remains high. In addition, the engines differ in capability: for example, the Spark engine integrates Hudi tables far better than the Hive engine, yet users who are unaware of this will still submit such statements to Hive, which puts the stability of the Linkis system in doubt. To let users work with Linkis engines more intelligently and conveniently, it is proposed to develop engine routing, which automatically selects the appropriate engine to execute a statement based on the characteristics of the SQL or script.

Basic concept

  • Engine routing: automatically selecting the appropriate engine based on the characteristics of the statement submitted by the user.

Expected goals

  • Phase 1: implement automatic routing for the commonly used computing engines Spark, Hive, and Presto;
  • Phase 2: add automatic routing for other major engines, such as ClickHouse, whose metadata is not stored in HiveMetaStore.

Implementation plan

1. Add a new service, Engine-Router-Service, to Linkis, located under Public-Service:

  • The service provides asynchronous REST interfaces: Commit (submit a statement), Status (get the analysis status), GetResult (if the analysis succeeded, get the result, including the suggested engine list and the statement converted for each engine), and GetLog (if the analysis failed, get the error log); see the interface sketch after this list.
  • After engine-router-service receives a statement, it puts it into a pending queue and waits for the Analyze Service to process it. The statement is first parsed according to Hive syntax (the front end is uniformly required to submit HiveQL), and the parser, combined with the metadata, extracts the basic engine-related information contained in the statement, such as the execution type, the number of joins, the number of unioned tables, and the size of the data set.
  • Analyze the extracted engine information against the registered Rule Listeners, then select and sort the candidate engines (see the rule sketch below).
  • Convert the statement into the format of each selected engine, for example Hive SQL into Presto SQL, or Hive SQL into Spark SQL.
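
To make the interface contract concrete, the sketch below shows what the four asynchronous interfaces could look like as a Spring-style controller. The class name, URL path, and payload types (EngineRouterRestApi, /api/rest_j/v1/engine-router, CommitRequest, RouteResult) are hypothetical and not an existing Linkis API.

```java
// Hypothetical sketch only: illustrates the four asynchronous REST interfaces
// of Engine-Router-Service; none of these names exist in Linkis today.
import java.util.List;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/rest_j/v1/engine-router")
public class EngineRouterRestApi {

    /** Commit: accept a statement, enqueue it for the Analyze Service, and return a task id. */
    @PostMapping("/commit")
    public String commit(@RequestBody CommitRequest request) {
        return "task-id-placeholder";
    }

    /** Status: poll the analysis state, e.g. PENDING / RUNNING / SUCCEEDED / FAILED. */
    @GetMapping("/status")
    public String status(@RequestParam("taskId") String taskId) {
        return "PENDING";
    }

    /** GetResult: on success, return the suggested engines (sorted by priority)
     *  together with the statement converted for each engine. */
    @GetMapping("/result")
    public RouteResult getResult(@RequestParam("taskId") String taskId) {
        return new RouteResult();
    }

    /** GetLog: on failure, return the error log of the analysis. */
    @GetMapping("/log")
    public String getLog(@RequestParam("taskId") String taskId) {
        return "";
    }

    public static class CommitRequest {
        public String statement;      // SQL/script submitted by the user (HiveQL)
    }

    public static class RouteResult {
        public List<String> suggestedEngines;    // e.g. ["spark", "presto", "hive"]
        public List<String> convertedStatements; // the statement rewritten per engine
    }
}
```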

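The extracted engine information and the registered Rule Listeners mentioned above could be modeled roughly as follows. StatementEngineInfo, EngineRuleListener, and HudiTableRule are hypothetical names used only to illustrate the idea of rules that reorder the candidate engine list.

```java
import java.util.ArrayList;
import java.util.List;

/** Facts the Analyze Service extracts from a statement (Hive syntax) plus metadata. */
class StatementEngineInfo {
    String executionType;     // e.g. SELECT, INSERT, CREATE
    int joinCount;            // number of join clauses
    int unionCount;           // number of unioned tables
    long estimatedInputBytes; // data set size looked up from the metastore
    boolean usesHudiTable;    // example of an engine-specific capability hint
}

/** A registered rule inspects the extracted info and filters/reorders candidate engines. */
interface EngineRuleListener {
    List<String> apply(StatementEngineInfo info, List<String> candidates);
}

/** Example rule: prefer Spark over Hive when a Hudi table is involved. */
class HudiTableRule implements EngineRuleListener {
    @Override
    public List<String> apply(StatementEngineInfo info, List<String> candidates) {
        if (info.usesHudiTable && candidates.contains("spark")) {
            List<String> ordered = new ArrayList<>(candidates);
            ordered.remove("spark");
            ordered.add(0, "spark"); // move Spark to the front of the priority list
            return ordered;
        }
        return candidates;
    }
}
```
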
2. Add an interception point in Linkis Entrance: requests that do not specify an engine are forwarded to Engine-Router-Service to obtain the suggested engine list, and the statement is executed in priority order. If execution on a higher-priority engine fails, the next engine is tried (see the sketch below). The original engine execution logic should be kept unchanged as far as possible. Architecture diagram:


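A minimal sketch of the priority-based fallback described in step 2, assuming a hypothetical executeWithEngine entry point into the existing execution path; it shows only the intended control flow, not actual Linkis Entrance code.

```java
import java.util.List;

/** Hypothetical interception logic in Linkis Entrance for requests without an
 *  explicitly specified engine: try the engines suggested by Engine-Router-Service
 *  in priority order and fall back to the next one on failure. */
class RoutedExecutionInterceptor {

    Object executeByPriority(String statement, List<String> enginesByPriority) {
        Exception lastError = null;
        for (String engine : enginesByPriority) {
            try {
                // delegate to the unchanged, original engine execution logic
                return executeWithEngine(engine, statement);
            } catch (Exception e) {
                lastError = e; // remember the failure and try the next engine
            }
        }
        throw new IllegalStateException("All suggested engines failed", lastError);
    }

    /** Placeholder for the existing submit-to-engine path (not changed by this plan). */
    Object executeWithEngine(String engine, String statement) throws Exception {
        throw new UnsupportedOperationException("sketch only");
    }
}
```
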
Things to Consider & Note:

  • Do we need to keep compatibility with the original way of specifying the engine via parameters?

Changes


Modification Detail

1. Modification of the Maven modules
2. Modification of the HTTP interfaces
3. Modification of the client interface
4. Modification of the database table structure
5. Modification of configuration items
6. Modification of error codes
7. Modification of third-party dependencies

Compatibility, Deprecation, and Migration Plan

  • What impact (if any) will there be on existing users?
  • If we are changing behavior, how will we phase out the older behavior?
  • If we require special migration tools, describe them here.
  • When will we remove the existing behavior?