...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation & Background

As computing middleware, Linkis provides a unified entry point for data computation and handles the connection to the underlying data. In some scenarios, upper-layer applications need basic metadata, such as databases and tables, before they can perform subsequent operations. Linkis currently supports metadata queries for Hive only, and only through the single MySQL database configured for Hive metadata. It cannot query metadata from multiple databases or from non-MySQL databases.

To support metadata queries across multiple data sources, this proposal adds two capabilities to Linkis: management of the configuration needed to connect to each database, and metadata queries for different types of data sources.

Basic concepts

● Data source: a service that provides data storage, such as mysql/hive/kafka, is referred to here as a database. A data source definition holds the configuration needed to connect to an actual database, mainly the connection address, user authentication information, and connection parameters.

● Metadata: here this refers only to database metadata, that is, the data that describes the structure of the data and of the objects in the database. Examples include database names, table names, column names, field lengths, and types.
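
The two concepts above can be pictured as plain data types. The following is a minimal sketch, not the Linkis data model: every type and field name (`DataSourceConfig`, `ColumnInfo`, `TableInfo`, `connectParams`, and so on) is hypothetical and chosen only to mirror the wording of the definitions.

```java
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only; these are not the Linkis classes.

/** A data source: the configuration needed to connect to one actual database. */
record DataSourceConfig(
        String name,
        String type,                       // e.g. "mysql", "hive", "kafka"
        String address,                    // connection address, e.g. host:port
        String username,                   // user authentication information
        String password,
        Map<String, String> connectParams  // extra connection parameters
) {}

/** Metadata: describes the structure of the objects stored in the database. */
record ColumnInfo(String name, String type, int length) {}

record TableInfo(String database, String name, List<ColumnInfo> columns) {}

class DataModelSketch {
    /** Builds a sample table description to show how the pieces nest. */
    static TableInfo sampleTable() {
        return new TableInfo("sales", "orders", List.of(
                new ColumnInfo("id", "BIGINT", 20),
                new ColumnInfo("customer", "VARCHAR", 64)));
    }

    public static void main(String[] args) {
        DataSourceConfig source = new DataSourceConfig(
                "orders-db", "mysql", "127.0.0.1:3306", "reader", "secret",
                Map.of("useSSL", "false"));
        TableInfo table = sampleTable();
        System.out.println(source.type() + " data source at " + source.address());
        for (ColumnInfo column : table.columns()) {
            System.out.println(table.database() + "." + table.name() + "." + column.name()
                    + " " + column.type() + "(" + column.length() + ")");
        }
    }
}
```

The split matters for the rest of the proposal: the data source record is what gets managed (created, modified, versioned), while the metadata records are read-only results of a query.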


Expected goals

● Manage the configuration of different types of data sources through the Linkis management console (create/modify/switch version/set expired)

...

● Provide only queries of basic metadata for the database behind each data source; modifying metadata and other change operations are not provided

Implementation plan

【New】 Datasource management service: linkis-datasource-manager-server, the data source management module (ps-data-source-manager). It handles basic data source management and exposes HTTP interfaces for creating, querying, and modifying data sources and for testing connections. It also provides an internal RPC service so that the metadata query module can fetch, through RPC calls, the information it needs to establish a connection to the database.

...

4. Connection tests for data sources are performed through the linkis metastore server service, which now provides the corresponding metadata query service


Changes

1. Code module changes

Added the linkis-datasource-query-common module, which holds the data source data structures, exception classes, and utility classes

Added the linkis-datasource-query-server module to manage data sources. It provides functions for adding, deleting, querying, modifying, and testing data sources through RESTful interfaces

Added the linkis-metadata-manager-common module, which holds the metadata data structures, exception classes, and utility classes

Added the linkis-metadata-manager-server module to provide metadata management services and to query the databases, tables, and columns of a data source through RESTful interfaces

Added the linkis-metadata-manager-service-es module to provide the metadata management service for elasticsearch

Added the linkis-metadata-manager-service-hive module to provide the metadata query service for hive

Added the linkis-metadata-manager-service-kafka module to provide the metadata query service for kafka

Added the linkis-metadata-manager-service-mysql module to provide the metadata query service for mysql

Added the linkis-datasource-client module, a Java client for data source management, so that data sources can be managed through an SDK
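
The module list pairs one server module with one service module per data source type. One way such a layout could be wired together is a shared interface plus a registry keyed by type, sketched below. This is an assumption about the design, not the code in these modules: `MetadataQueryService`, `MetadataServiceRegistry`, and `InMemoryMetadataService` are hypothetical names, and the in-memory implementation stands in for a real mysql/hive/kafka/elasticsearch connector.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: names and signatures are illustrative, not the Linkis API.

/** What each per-type service module would implement. */
interface MetadataQueryService {
    List<String> getDatabases();
    List<String> getTables(String database);
    List<String> getColumns(String database, String table);
}

/** Stand-in for a real connector; serves metadata from a fixed in-memory map. */
class InMemoryMetadataService implements MetadataQueryService {
    // database -> table -> column names
    private final Map<String, Map<String, List<String>>> catalog;

    InMemoryMetadataService(Map<String, Map<String, List<String>>> catalog) {
        this.catalog = catalog;
    }

    @Override
    public List<String> getDatabases() {
        return sortedKeys(catalog);
    }

    @Override
    public List<String> getTables(String database) {
        return sortedKeys(catalog.getOrDefault(database, Map.of()));
    }

    @Override
    public List<String> getColumns(String database, String table) {
        return catalog.getOrDefault(database, Map.of()).getOrDefault(table, List.of());
    }

    private static List<String> sortedKeys(Map<String, ?> map) {
        return new ArrayList<>(new TreeSet<>(map.keySet()));
    }
}

/** The server side: picks the service that matches the data source type. */
class MetadataServiceRegistry {
    private final Map<String, MetadataQueryService> servicesByType = new HashMap<>();

    void register(String dataSourceType, MetadataQueryService service) {
        servicesByType.put(dataSourceType, service);
    }

    MetadataQueryService forType(String dataSourceType) {
        MetadataQueryService service = servicesByType.get(dataSourceType);
        if (service == null) {
            throw new IllegalArgumentException("Unsupported data source type: " + dataSourceType);
        }
        return service;
    }
}

class MetadataRegistrySketch {
    static MetadataServiceRegistry sampleRegistry() {
        MetadataServiceRegistry registry = new MetadataServiceRegistry();
        registry.register("mysql", new InMemoryMetadataService(
                Map.of("sales", Map.of("orders", List.of("id", "customer")))));
        return registry;
    }

    public static void main(String[] args) {
        MetadataQueryService mysql = sampleRegistry().forType("mysql");
        System.out.println(mysql.getDatabases());
        System.out.println(mysql.getTables("sales"));
        System.out.println(mysql.getColumns("sales", "orders"));
    }
}
```

With this shape, supporting a further data source type would mean adding one more implementation and registering it, without touching the server or the existing services.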

2. HTTP interface changes

Added interfaces for querying metadata

Added interfaces for creating, deleting, updating, and querying data sources

3. Client interface changes

LinkisDataSourceRemoteClient interface

...

● MetadataGetColumnsResult getColumns(MetadataGetColumnsAction) Queries the columns of the data table
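
The client methods follow a request-object in, result-object out pattern. The sketch below shows how a caller might use `getColumns` under that pattern. Only the two type names and the method signature are taken from the list above; the types defined here are local stand-ins rather than the classes shipped in linkis-datasource-client, and their fields (`dataSourceId`, `database`, `table`, `columns`) as well as `ColumnsClient` and `FakeColumnsClient` are assumptions made so the example compiles on its own.

```java
import java.util.List;
import java.util.Map;

// Local stand-ins so the sketch compiles on its own. Only the type names and the
// getColumns signature come from the proposal; every field below is an assumption.

/** Request object: identifies the table whose columns are wanted. */
record MetadataGetColumnsAction(String dataSourceId, String database, String table) {}

/** Result object: the column names of that table. */
record MetadataGetColumnsResult(List<String> columns) {}

/** Hypothetical interface holding the one client method shown in the proposal. */
interface ColumnsClient {
    MetadataGetColumnsResult getColumns(MetadataGetColumnsAction action);
}

/** Fake client that answers from a fixed map instead of calling a server. */
class FakeColumnsClient implements ColumnsClient {
    private final Map<String, List<String>> columnsByTable =
            Map.of("sales.orders", List.of("id", "customer"));

    @Override
    public MetadataGetColumnsResult getColumns(MetadataGetColumnsAction action) {
        String key = action.database() + "." + action.table();
        return new MetadataGetColumnsResult(columnsByTable.getOrDefault(key, List.of()));
    }
}

class ClientSketch {
    public static void main(String[] args) {
        ColumnsClient client = new FakeColumnsClient();
        MetadataGetColumnsResult result =
                client.getColumns(new MetadataGetColumnsAction("1", "sales", "orders"));
        System.out.println(result.columns());
    }
}
```

Wrapping the parameters in an action object keeps the method signature stable if further parameters are needed later.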

4. Database table structure changes

No existing tables are modified.

The new table structure is as follows:

...