
Status

Current state: Under Discussion

Discussion thread: https://lists.apache.org/thread/66g79w5zlod2ylyv8k065j57pjjmv1jo

Vote thread: 

JIRA: [Umbrella] Decouple Hive with Flink planner

Released: 1.16

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

To support the Hive dialect in Flink, we have implemented FLIP-123 and FLIP-152. But this also brings a heavy maintenance burden and complexity, because the Hive connector depends on flink-table-planner and thus sometimes slows down development in flink-table-planner. Also, we expect to move the Hive connector out of the Flink repository in release-1.16. So, it's necessary to decouple the Hive connector from the Flink planner while still supporting the Hive dialect in the Hive connector.

Proposed Changes

The idea

As FLIP-152 describes, for Hive syntax, the parser converts the SQL to Calcite's RelNode, which is consistent with Hive's own implementation when CBO is enabled in Hive, and then wraps the RelNode in a PlannerQueryOperation. So what we really need in the Hive connector is just the ability to create RelNode, which involves accessing the RelOptCluster, RelBuilder, etc., provided by PlannerContext.

So the main idea is to introduce a slim module called flink-table-planner-spi that provides the Calcite dependency and exposes a limited public interface, like #getCluster and #createRelBuilder, to enable creating RelNode. The Hive connector will then depend only on this slim module.
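The dependency inversion behind the slim module can be illustrated with a minimal plain-Java sketch. All class names below are illustrative stand-ins, not actual Flink or Calcite API: the slim module owns only the interface, the planner supplies the implementation at runtime, and the connector compiles against the interface alone.

```java
// Sketch of the proposed dependency layout; none of these classes are real
// Flink code, they only mirror which module owns which piece.

// --- would live in flink-table-planner-spi (the connector depends on this) ---
interface PlannerSpi {
    // Stand-in for the real factory methods such as createRelBuilder
    String createRelBuilder(String catalog, String database);
}

// --- would live in flink-table-planner (the connector does NOT depend on this) ---
class PlannerImpl implements PlannerSpi {
    @Override
    public String createRelBuilder(String catalog, String database) {
        return "RelBuilder(" + catalog + "." + database + ")";
    }
}

// --- would live in the Hive connector; it sees only the interface ---
class HiveDialectSupport {
    private final PlannerSpi spi;

    HiveDialectSupport(PlannerSpi spi) {
        this.spi = spi;
    }

    String plan() {
        // The connector obtains planner facilities purely through the SPI.
        return spi.createRelBuilder("hive", "default");
    }
}

public class DecouplingSketch {
    public static void main(String[] args) {
        // The planner hands its implementation across the module boundary.
        HiveDialectSupport hive = new HiveDialectSupport(new PlannerImpl());
        System.out.println(hive.plan()); // prints RelBuilder(hive.default)
    }
}
```

Because the connector only names the interface, flink-table-planner can evolve (or the connector can move to another repository) without breaking the compile-time dependency.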

1. Move the interface ParserFactory from flink-table-planner to flink-table-planner-spi so that the Hive parser can implement ParserFactory

2. Introduce an interface, tentatively called RelNodeContext, for creating RelNode:


import org.apache.calcite.plan.RelOptCluster;
import org.apache.calcite.prepare.CalciteCatalogReader;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.RelBuilder;

/** Context for creating RelNode. */
public interface RelNodeContext {

    /** Creates a catalog reader scoped to the given catalog and database. */
    CalciteCatalogReader createCatalogReader(
            boolean lenientCaseSensitivity, String currentCatalog, String currentDatabase);

    /** Returns the cluster that RelNodes are created in. */
    RelOptCluster getCluster();

    FrameworkConfig createFrameworkConfig();

    RelDataTypeFactory getTypeFactory();

    /** Creates a RelBuilder scoped to the given catalog and database. */
    RelBuilder createRelBuilder(String currentCatalog, String currentDatabase);
}


These methods are already implemented in PlannerContext; we only need to expose them so that other modules can use them.

Then, the Hive connector can use RelNodeContext to create RelNode without depending on flink-table-planner at all.
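The translation flow the connector would follow (parse the Hive SQL to an AST, translate the AST to a RelNode via the exposed context, wrap the result, as FLIP-152 already does) can be sketched with stand-in types. Everything below is a placeholder, not real Flink or Calcite API; the translator interface plays the role that RelNodeContext's RelBuilder would play.

```java
// Stand-in types sketching the translation pipeline only; the real code
// would use Calcite's RelNode and Flink's PlannerQueryOperation.
class FakeRelNode {
    final String digest;

    FakeRelNode(String digest) {
        this.digest = digest;
    }
}

// Placeholder for the RelBuilder-based translation RelNodeContext would enable.
interface AstToRelTranslator {
    FakeRelNode toRelNode(String hiveAst);
}

// Placeholder for PlannerQueryOperation, which wraps a RelNode.
class FakeQueryOperation {
    final FakeRelNode rel;

    FakeQueryOperation(FakeRelNode rel) {
        this.rel = rel;
    }
}

class HiveParserSketch {
    private final AstToRelTranslator translator;

    HiveParserSketch(AstToRelTranslator translator) {
        this.translator = translator;
    }

    FakeQueryOperation parse(String hiveSql) {
        String ast = "AST(" + hiveSql + ")";             // 1. Hive parse (simulated)
        FakeRelNode rel = translator.toRelNode(ast);     // 2. AST -> RelNode via the context
        return new FakeQueryOperation(rel);              // 3. wrap into a query operation
    }
}
```

The key point the sketch captures is that every planner facility the parser touches arrives through the injected interface, so the connector's compile-time surface stays limited to the slim module.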

New or Changed Public Interfaces

The RelNodeContext referred to above.

Compatibility, Deprecation, and Migration Plan

N/A

Test Plan

This is purely refactoring work, which can be covered by the existing tests.


Other Alternatives

Convert the Hive AST to an Operation tree. This is actually more Flink-friendly, as it is what the Table API does. But it would take considerable effort, because we would need to rewrite the Hive dialect codebase entirely and may need to introduce some new operations. It's a huge amount of work and hard to do in one shot. Since we want to move the Hive connector out in 1.16, it's more practical to decouple from the planner first and migrate to operations step by step.

More discussion can be found in the original design doc:
https://docs.google.com/document/d/1LMQ_mWfB_mkYkEBCUa2DgCO2YdtiZV7YRs2mpXyjdP4/edit?usp=sharing

