State

[progress record] :

Proposed time : 2022/03/01

Discussion time : 2022/04/01 (preliminary discussion of the proposal in the community development WeChat group)

Acceptance/Rejection time : 2022/04/30

Completion time : 2022/05/21

[issue] : 

[email] : At present, discussions must be initiated in the WeChat group [Apache Linkis Community Development group]; the discussion minutes can then be sent to the official Linkis dev mailing list

[release] : linkis 1.1.0  

[proposer]: 

Motivation & Background

As a computing middleware, Linkis provides a unified entrance for data computing and is responsible for connecting to the underlying data. In some application scenarios, upper-layer applications need basic metadata, such as databases and tables, before they can perform subsequent operations. Currently, Linkis supports metadata queries only for Hive, and only through the single MySQL database configured as the Hive metadata store. Metadata queries across multiple databases, or against non-MySQL databases, are not supported.

To support metadata queries across multiple data sources, this proposal adds to Linkis the management of the configuration information required to connect to databases, together with metadata queries for different types of data sources.

Basic concept

● Data source: a database service that provides data storage, such as mysql/hive/kafka, is called a data source. A data source definition is the configuration information used to connect to the actual database, mainly the connection address, user authentication information, and connection parameters.

● Metadata: here this refers specifically to database metadata, that is, the data that defines the structure of the data and of the various objects in the database. Examples are database names, table names, column names, field lengths, and types.
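The two concepts above can be sketched as plain data holders. This is a minimal illustration only: the class names, field names, and the masked display format are assumptions made for this example and do not reflect the actual Linkis data structures or table schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a data source definition. Field names are
// illustrative assumptions, not the Linkis schema.
final class DataSourceDefinition {
    final String type;                       // e.g. "mysql", "hive", "kafka"
    final String host;                       // connection address
    final int port;
    final String username;                   // user authentication information
    final String password;
    final Map<String, String> connectParams; // extra connection parameters

    DataSourceDefinition(String type, String host, int port, String username,
                         String password, Map<String, String> connectParams) {
        this.type = type;
        this.host = host;
        this.port = port;
        this.username = username;
        this.password = password;
        this.connectParams = new LinkedHashMap<>(connectParams);
    }

    // Renders the definition for display, masking the password.
    String describe() {
        return type + "://" + username + ":***@" + host + ":" + port + " " + connectParams;
    }
}

// Hypothetical model of one piece of metadata: a column and where it lives.
final class ColumnMetadata {
    final String database;
    final String table;
    final String column;
    final String dataType;
    final int length;

    ColumnMetadata(String database, String table, String column, String dataType, int length) {
        this.database = database;
        this.table = table;
        this.column = column;
        this.dataType = dataType;
        this.length = length;
    }

    String qualifiedName() {
        return database + "." + table + "." + column;
    }
}

final class BasicConceptSketch {
    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("useSSL", "false");
        DataSourceDefinition ds =
            new DataSourceDefinition("mysql", "127.0.0.1", 3306, "reader", "secret", params);
        ColumnMetadata col = new ColumnMetadata("sales", "orders", "amount", "decimal", 10);
        System.out.println(ds.describe());
        System.out.println(col.qualifiedName() + " " + col.dataType + "(" + col.length + ")");
    }
}
```

The data source definition describes how to reach a database, while the metadata describes what is inside it; the proposal manages the former and only reads the latter.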

Expected goals

● Manage the configuration information of different types of data sources through the Linkis management console (create / modify / switch version / set expired)

● Version-control data source configuration information and test basic connectivity through the Linkis management console

● Provide an HTTP interface to query the basic metadata information of a data source by data source identifier and other parameters

● Provide a Java SDK to query the basic metadata information of a data source by data source identifier and other parameters

● Only query the basic metadata of the database behind a data source; metadata modification and other change operations are not provided

Implementation plan

【New】 Data source management service: linkis-datasource-manager-server, the data source management module, service name ps-data-source-manager. It handles the basic management of data sources and exposes external HTTP interfaces for creating, querying, modifying, and connection-testing data sources. It also provides an internal RPC service so that the metadata query module can obtain, through RPC calls, the information needed to establish a connection to the database.

【New】 Metadata query service: linkis-metadata-query-server, service name ps-metadata-query. It provides the basic query functions for database metadata through external HTTP interfaces, and an internal RPC service that the data source management service calls to run data source connection tests. (In version 1.1.0 the module is named linkis-metadata-manager-server, which is not appropriate; it is to be renamed linkis-metadata-query-server.)

1. The service is registered with the Linkis-eureka-Service service and managed in a unified way with the other Linkis microservices. Clients reach the data source management service by connecting to the linkis-gateway-service service and using the service name data-source-manager.

2. The interface layer exposes RESTful interfaces that let other applications create, delete, query, and update data sources and data source environments, test data source connections, manage data source versions, and set data sources to expired.

3. The service layer mainly manages the database and material warehouse services and persists data source related information.

4. Connection tests of data sources are performed through the linkis metastore server service, which now also provides the corresponding metadata query service.
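The interaction between the two services described above can be sketched as follows: the metadata query side asks the data source management side for connection parameters, then dispatches to a type-specific connector. All interface and class names here (DataSourceInfoRpc, MetadataConnector, MetadataQuerySketch) are hypothetical stand-ins invented for this example; the real services communicate over Linkis RPC, and the "type" and "host" keys and the canned results are assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the internal RPC of the data source management service.
interface DataSourceInfoRpc {
    // Returns connection parameters, including a "type" entry, for a data source id.
    Map<String, String> connectParams(long dataSourceId);
}

// Hypothetical per-type connector (one per mysql/hive/kafka/elasticsearch module).
interface MetadataConnector {
    boolean testConnection(Map<String, String> params);
    List<String> getDatabases(Map<String, String> params);
}

final class MetadataQuerySketch {
    private final DataSourceInfoRpc rpc;
    private final Map<String, MetadataConnector> connectors;

    MetadataQuerySketch(DataSourceInfoRpc rpc, Map<String, MetadataConnector> connectors) {
        this.rpc = rpc;
        this.connectors = connectors;
    }

    // Step 1: fetch connection info via RPC. Step 2: query through the matching connector.
    List<String> getDatabases(long dataSourceId) {
        Map<String, String> params = rpc.connectParams(dataSourceId);
        return connectorFor(params).getDatabases(params);
    }

    // The connection test follows the same path as a metadata query.
    boolean testConnection(long dataSourceId) {
        Map<String, String> params = rpc.connectParams(dataSourceId);
        return connectorFor(params).testConnection(params);
    }

    private MetadataConnector connectorFor(Map<String, String> params) {
        String type = params.get("type");
        MetadataConnector connector = connectors.get(type);
        if (connector == null) {
            throw new IllegalArgumentException("No metadata connector for type: " + type);
        }
        return connector;
    }

    // Builds an in-memory demo instance with one stored data source and one connector.
    static MetadataQuerySketch demo() {
        final Map<Long, Map<String, String>> stored = new HashMap<>();
        Map<String, String> mysqlParams = new HashMap<>();
        mysqlParams.put("type", "mysql");
        mysqlParams.put("host", "127.0.0.1");
        stored.put(1L, mysqlParams);

        DataSourceInfoRpc rpc = id -> {
            Map<String, String> params = stored.get(id);
            if (params == null) {
                throw new NoSuchElementException("Unknown data source id: " + id);
            }
            return params;
        };

        Map<String, MetadataConnector> connectors = new HashMap<>();
        connectors.put("mysql", new MetadataConnector() {
            @Override
            public boolean testConnection(Map<String, String> params) {
                // A real connector would open a connection; this stub only checks the config.
                return params.containsKey("host");
            }

            @Override
            public List<String> getDatabases(Map<String, String> params) {
                return Arrays.asList("db_a", "db_b"); // canned result for the demo
            }
        });
        return new MetadataQuerySketch(rpc, connectors);
    }

    public static void main(String[] args) {
        MetadataQuerySketch service = demo();
        System.out.println(service.testConnection(1L));
        System.out.println(service.getDatabases(1L));
    }
}
```

Keeping connection details behind the RPC boundary means callers of the metadata query service only ever pass a data source identifier, never credentials.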

Changes


1. Modification of maven module

  • A new module, linkis-datasource-query-common, is added, containing the data source data structures, exception classes, and utility classes
  • A new module, linkis-datasource-manager-server, is added to manage data sources. It provides functions such as creating, deleting, querying, updating, and testing data sources through RESTful interfaces
  • A new module, linkis-metadata-manager-common, is added, containing the metadata data structures, exception classes, and utility classes
  • A new module, linkis-metadata-manager-server, is added to provide metadata management services and to query the databases, tables, and columns of a data source through RESTful interfaces
  • A new module, linkis-metadata-manager-service-es, is added to provide the metadata query service for elasticsearch
  • A new module, linkis-metadata-manager-service-hive, is added to provide the metadata query service for hive
  • A new module, linkis-metadata-manager-service-kafka, is added to provide the metadata query service for kafka
  • A new module, linkis-metadata-manager-service-mysql, is added to provide the metadata query service for mysql
  • A new data source management Java client module, linkis-datasource-client, is added to make data source management available through an SDK
2. Modification of HTTP interface
  • Added the interface for querying metadata d
  • Added interfaces for creating, deleting, updating, and querying data sources
3. Modification of the client interface

LinkisDataSourceRemoteClient interface

  • GetAllDataSourceTypesResult getAllDataSourceTypes(GetAllDataSourceTypesAction): Queries all data source types
  • QueryDataSourceEnvResult queryDataSourceEnv(QueryDataSourceEnvAction): Queries the cluster configurations that can be used by the data source
  • GetInfoByDataSourceIdResult getInfoByDataSourceId(GetInfoByDataSourceIdAction): Queries data source information by data source id
  • QueryDataSourceResult queryDataSource(QueryDataSourceAction): Queries data sources
  • GetConnectParamsByDataSourceIdResult getConnectParams(GetConnectParamsByDataSourceIdAction): Gets the connection configuration parameters
  • CreateDataSourceResult createDataSource(CreateDataSourceAction): Creates a data source
  • DataSourceTestConnectResult getDataSourceTestConnect(DataSourceTestConnectAction): Tests whether the data source connection can be established properly
  • DeleteDataSourceResult deleteDataSource(DeleteDataSourceAction): Deletes a data source
  • ExpireDataSourceResult expireDataSource(ExpireDataSourceAction): Sets the data source to the expired state
  • GetDataSourceVersionsResult getDataSourceVersions(GetDataSourceVersionsAction): Queries the list of configuration versions of a data source
  • PublishDataSourceVersionResult publishDataSourceVersion(PublishDataSourceVersionAction): Publishes a data source configuration version
  • UpdateDataSourceResult updateDataSource(UpdateDataSourceAction): Updates a data source
  • UpdateDataSourceParameterResult updateDataSourceParameter(UpdateDataSourceParameterAction): Updates the data source configuration parameters
  • GetKeyTypeDatasourceResult getKeyDefinitionsByType(GetKeyTypeDatasourceAction): Queries the configuration properties required by a data source type

LinkisMetaDataRemoteClient interface

  • MetadataGetDatabasesResult getDatabases(MetadataGetDatabasesAction): Queries the database list
  • MetadataGetTablesResult getTables(MetadataGetTablesAction): Queries the table list
  • MetadataGetTablePropsResult getTableProps(MetadataGetTablePropsAction): Queries the properties of a table
  • MetadataGetPartitionsResult getPartitions(MetadataGetPartitionsAction): Queries the partitions of a table
  • MetadataGetColumnsResult getColumns(MetadataGetColumnsAction): Queries the columns of a table
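As an illustration of how the metadata query methods compose, the sketch below walks databases, then tables, then columns for one data source. It deliberately does not use the real client types: MetadataClientSketch is a hypothetical, simplified interface whose methods take plain arguments instead of the Action objects and return plain lists instead of the Result objects listed above, and the in-memory catalog is invented sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplification of the metadata client: plain arguments and
// lists instead of the Action/Result types of LinkisMetaDataRemoteClient.
interface MetadataClientSketch {
    List<String> getDatabases(long dataSourceId);
    List<String> getTables(long dataSourceId, String database);
    List<String> getColumns(long dataSourceId, String database, String table);
}

final class MetadataWalk {
    // Returns "database.table.column" for every column of the data source.
    static List<String> listAllColumns(MetadataClientSketch client, long dataSourceId) {
        List<String> result = new ArrayList<>();
        for (String db : client.getDatabases(dataSourceId)) {
            for (String table : client.getTables(dataSourceId, db)) {
                for (String column : client.getColumns(dataSourceId, db, table)) {
                    result.add(db + "." + table + "." + column);
                }
            }
        }
        return result;
    }

    // In-memory client backed by invented sample data, used only for the demo.
    static MetadataClientSketch inMemoryClient() {
        final Map<String, Map<String, List<String>>> catalog = new LinkedHashMap<>();
        Map<String, List<String>> sales = new LinkedHashMap<>();
        sales.put("orders", Arrays.asList("id", "amount"));
        sales.put("customers", Arrays.asList("id", "name"));
        catalog.put("sales", sales);

        return new MetadataClientSketch() {
            @Override
            public List<String> getDatabases(long dataSourceId) {
                return new ArrayList<>(catalog.keySet());
            }

            @Override
            public List<String> getTables(long dataSourceId, String database) {
                Map<String, List<String>> tables = catalog.get(database);
                if (tables == null) {
                    return new ArrayList<>();
                }
                return new ArrayList<>(tables.keySet());
            }

            @Override
            public List<String> getColumns(long dataSourceId, String database, String table) {
                Map<String, List<String>> tables = catalog.get(database);
                if (tables == null || !tables.containsKey(table)) {
                    return new ArrayList<>();
                }
                return tables.get(table);
            }
        };
    }

    public static void main(String[] args) {
        for (String name : listAllColumns(inMemoryClient(), 1L)) {
            System.out.println(name);
        }
    }
}
```

Each level of the walk needs only the data source identifier plus the names returned by the previous level, which matches the read-only scope stated in the goals.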
4. Modification of database table structure
  • No existing tables are modified
  • The new table structure is as follows:

5. Modification of configuration items
6. Modification of error codes
7. Modifications for third-party dependencies

Compatibility, Deprecation, and Migration Plan

  • What impact (if any) will there be on existing users?
  • If we are changing behavior, how will we phase out the older behavior?
  • If we require special migration tools, describe them here.
  • When will we remove the existing behavior?