Status

Current state: Under Discussion

...

Vote thread: 

JIRA: [Umbrella] Pluggable dialect and decouple Hive connector

Released: 1.16

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Currently, with FLIP-123 and FLIP-152, we have supported Hive dialect. But the implementation depends heavily on the Flink planner, and the interfaces involved are mostly internal, which makes them inconvenient and unavailable for implementing other dialects. It also brings much maintenance burden and complexity, for the Hive connector has to depend on flink-table-planner, which sometimes slows down development in flink-table-planner. Also, we expect to move the Hive connector out of the Flink repository in release-1.16. So, it's necessary to decouple the Hive connector from the Flink planner while still supporting Hive dialect in the Hive connector.

So, this FLIP is to introduce a pluggable dialect with some public interfaces that make it convenient to support other dialects. At the same time, it intends to solve the legacy problem brought by supporting Hive dialect: the Hive connector is coupled to the Flink planner, which brings much complexity and maintenance burden.

Proposed Changes

The idea

Introduce a slim module, which may be called flink-table-planner-spi, containing the interface ParserFactory. Then, to support a new dialect, one needs to include this module and implement the ParserFactory. All the existing dialects are to follow this way.


ParserFactory

Code Block
languagejava
/**
 * Factory that creates {@link Parser}.
 */
@Public
public interface ParserFactory extends Factory {

    /** Creates a new parser. */
    Parser create(Context context);
    
    /** Context provided when a parser is created. */
    interface Context {
        // ...
    }
}


Then, for example, if you would like to support MySQL dialect, you need to provide an implementation of ParserFactory, which may be called MySQLParserFactory. The factory is responsible for creating the corresponding MySQLParser. The interface Parser already exists in Flink and looks as follows:

Code Block
languagejava
/** Provides methods for parsing SQL objects from a SQL string. */
public interface Parser {    
    List<Operation> parse(String statement);
}

In the MySQLParser, you should provide the method that converts a SQL statement to the Operation that Flink expects.

Finally, specify the fully qualified class name of MySQLParserFactory in the resource file META-INF/services/org.apache.flink.table.factories.Factory so that it can be discovered by the Java SPI mechanism.

After that, you can switch to the MySQL dialect by setting table.sql-dialect while executing the SQL.
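
Putting the steps together, a minimal sketch might look as follows. MySQLParserFactory and MySQLParser are hypothetical names used for illustration, and the sketch assumes the dialect name is matched against the factory identifier, as Flink's Factory SPI does:

Code Block
languagejava
import java.util.Collections;
import java.util.List;
import java.util.Set;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.table.operations.Operation;

/** Hypothetical factory that wires a MySQL dialect into the SPI. */
public class MySQLParserFactory implements ParserFactory {

    @Override
    public String factoryIdentifier() {
        // Matched against the value of 'table.sql-dialect'.
        return "mysql";
    }

    @Override
    public Set<ConfigOption<?>> requiredOptions() {
        return Collections.emptySet();
    }

    @Override
    public Set<ConfigOption<?>> optionalOptions() {
        return Collections.emptySet();
    }

    @Override
    public Parser create(Context context) {
        return new MySQLParser();
    }
}

/** Hypothetical parser that converts MySQL-syntax SQL into Flink Operations. */
class MySQLParser implements Parser {

    @Override
    public List<Operation> parse(String statement) {
        // Parse the statement with a MySQL parser and translate the AST into
        // the Operations Flink expects (e.g. a QueryOperation for a SELECT).
        throw new UnsupportedOperationException("Translation logic omitted in this sketch");
    }
}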


Decouple Hive connector

When it comes to decoupling the Hive connector, it's a little complex. As FLIP-152 described, for Hive syntax, the current implementation converts the SQL to Calcite's RelNode, which is consistent with Hive's implementation when using CBO in Hive, and then wraps the RelNode into a PlannerQueryOperation. So what the Hive connector really needs is the ability to create Calcite RelNode, which currently requires flink-table-planner.

The better way would be to convert the Hive AST to an Operation tree, but that would take much effort, since the codebase for Hive dialect would have to be totally rewritten; it's quite different from the current implementation. It's hard to migrate to the Operation tree in one shot.


So the temporary way is to introduce a slim module called flink-table-planner-spi that provides the Calcite dependency and exposes limited interfaces like #getCluster and #createRelBuilder to enable creating RelNode; these abilities are currently provided by PlannerContext. Then the Hive connector will only depend on this slim module. Specifically:

1. Move the interface ParserFactory from flink-table-planner to flink-table-planner-spi so that the Hive parser can implement ParserFactory. But it's internal and only used by the Hive connector; anyway, in the end, the Calcite dependency should be removed and the Hive dialect should be migrated to the Operation tree.

2. Introduce an interface, which may be called RelNodeContext, for creating RelNode:

Code Block
languagejava
// Context for creating RelNode
public interface RelNodeContext {

    CalciteCatalogReader createCatalogReader(
            boolean lenientCaseSensitivity, String currentCatalog, String currentDatabase);

    RelOptCluster getCluster();

    FrameworkConfig createFrameworkConfig();

    RelDataTypeFactory getTypeFactory();

    RelBuilder createRelBuilder(String currentCatalog, String currentDatabase);
}


These interfaces have already been implemented in PlannerContext, but we need to expose them so that the Hive connector can use them.


Then, the Hive connector can use RelNodeContext to create RelNode and will work without any dependency on flink-table-planner in its pom.
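
For illustration, here is a minimal sketch of how dialect code could use the proposed RelNodeContext to build a RelNode with Calcite's RelBuilder; the catalog, database, and table names are placeholders:

Code Block
languagejava
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.tools.RelBuilder;

/** Sketch: hypothetical translation of "SELECT * FROM orders WHERE user_id = 42". */
public class RelNodeSketch {

    public static RelNode toRelNode(RelNodeContext context) {
        RelBuilder builder = context.createRelBuilder("default_catalog", "default_database");
        return builder
                .scan("orders")                           // FROM orders
                .filter(
                        builder.equals(
                                builder.field("user_id"), // WHERE user_id = 42
                                builder.literal(42)))
                .build();
    }
}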

New or Changed Public Interfaces

...

With the decoupling of the Hive connector, the public interfaces look like:

Code Block
languagejava
@Internal
public interface ParserFactory extends Factory {

    /** Creates a new parser. */
    Parser create(Context context);

    /** Context provided when a parser is created. */
    interface Context {
        CatalogManager getCatalogManager();

        RelNodeContext getPlannerContext();
    }
}
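
For illustration, a rough sketch of how the Hive connector could implement this interface; the HiveParser constructor shown here is an assumption for this sketch, not the actual class signature:

Code Block
languagejava
import java.util.Collections;
import java.util.Set;

import org.apache.flink.configuration.ConfigOption;

/** Sketch: the Hive connector's factory only needs what the Context provides. */
public class HiveParserFactory implements ParserFactory {

    @Override
    public String factoryIdentifier() {
        return "hive";
    }

    @Override
    public Set<ConfigOption<?>> requiredOptions() {
        return Collections.emptySet();
    }

    @Override
    public Set<ConfigOption<?>> optionalOptions() {
        return Collections.emptySet();
    }

    @Override
    public Parser create(Context context) {
        // Everything the parser needs comes from the Context, so the Hive
        // connector no longer has to depend on flink-table-planner.
        return new HiveParser(context.getCatalogManager(), context.getPlannerContext());
    }
}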


Compatibility, Deprecation, and Migration Plan

N/A

Test Plan

It's just refactoring work, which can be verified by the existing tests.

...

Convert the Hive AST to an Operation tree. Actually, it's more Flink-friendly, as the Table API works in this way. But it would take much effort, since the codebase for Hive dialect would have to be totally rewritten, and some new operations may need to be created. It's a huge amount of work and hard to do in one shot. As we want to move out the Hive connector in 1.16, it's more practical to decouple the planner first and migrate to the Operation tree step by step. More discussion can be found in the original design doc:
https://docs.google.com/document/d/1LMQ_mWfB_mkYkEBCUa2DgCO2YdtiZV7YRs2mpXyjdP4/edit?usp=sharing