Motivation

The current JDBC connector unsupport reading multiple tables, but user cases indicate that this feature is needed. Missing this feature leads to the following problems:

  • a table requires single read task, and management and maintenance are very complex.
  • resource waste

The purpose of this project is to solve these two main problems.

Public Interfaces

Rely on the public interface provided by Catalog API to publish metadata of multiple tables.

Proposed Changes

Single table reading supports table_path option replacement/enhancement query

Single table reading supports table_path option replacement/enhancement query

Added table_path optional option, used to specify the table path to be read, and use table_path to automatically generate query.

Of course, the query parameter is still retained. On the one hand, it is compatible with the use of older versions, and on the other hand, the query can be customized in special situations such as complex query scenarios.

The combination of table_path and query is as follows:

  • Only configure query: keep the current logic unchanged, automatically generate table_path to null.null.default
  • Only configure table_path: automatically generate query as select * from xxx, automatically obtain primary key/unique key as partition_column
  • Configure table_path and query: check the query is a complex sql. If it is not a complex sql, use table_path to get metadata (primary key/unique key/partition_column)

Before the STIP:

source{
    Jdbc {
        url = "jdbc:mysql://localhost:3306/test?serverTimezone=GMT%2b8"
        driver = "com.mysql.cj.jdbc.Driver"
        connection_check_timeout_sec = 100
        user = "root"
        password = "123456"

        query = "select * from test-1"
        partition_column = "id"
    }
}

After the STIP:

source{
    Jdbc {
        url = "jdbc:mysql://localhost:3306/test?serverTimezone=GMT%2b8"
        driver = "com.mysql.cj.jdbc.Driver"
        connection_check_timeout_sec = 100
        user = "root"
        password = "******"

        table_path = 'test.test-1'
    }
}

Supports multi-table configuration reading

Add the configuration parameter table-list, the configuration structure is list, in which each element is the configuration of a table, and the configuration items of each table are consistent with those of a single table.

example

source{
    Jdbc {
        url = "jdbc:mysql://localhost:3306/test?serverTimezone=GMT%2b8"
        driver = "com.mysql.cj.jdbc.Driver"
        connection_check_timeout_sec = 100
        user = "root"
        password = "******"

        table-list = [
            {
                table_path = 'test-1'

                query = "select * from test-1"
                partition_column = "id"
            },
            {
                table_path = 'test-2'

                query = "select * from test-2"
                partition_column = "id"
            }
        ]
    }
}

JDBC multiple database adaptation

Since JDBC internally contains processing of multiple databases, table_path parsing needs to be adapted by the database dialect.

Compatibility, Deprecation, and Migration Plan

Compatibility

  • It does not affect single table reading and remains compatible with users of older versions.

Deprecation

Migration Plan

  • Compatibility upgrade, users can upgrade as needed

Test Plan

Test coverage scenarios

  • Connector:JDBC single table reading
  • Connector:JDBC multiple table reading

Rejected Alternatives

Risk

Missing metadata in complex query scenarios

If you use complex query SQL, such as Join/Union, the primary key information may be missing when Sink automatically creates tables.

  • No labels