
Status

Current state: "Under Discussion"

Discussion thread: here (<- link to https://mail-archives.apache.org/mod_mbox/flink-dev/)

JIRA: here (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)

Released: <Flink Version>

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

TableEnvironment provides two methods, `Table sqlQuery(String sql)` and `void sqlUpdate(String sql)`, to create a table (actually a view here) or describe an update action from one SQL string. But as more use cases come up, some fatal shortcomings in the current API design have surfaced.

  1. `Table sqlQuery(String sql)` actually returns a temporary view created from one SELECT statement, and that view must be registered again before it can be used in a following SQL statement. According to FLIP-64[3], it is natural to deprecate `Table sqlQuery(String sql)` and provide a new method `void createTemporaryView(String path, String sql)` in TableEnvironment.
  2. Inconsistent execution semantics for `sqlUpdate()`. Currently, a DDL statement passed to this method is executed immediately, while an `INSERT INTO` statement is not actually executed until the `execute()` method is called, which confuses users a lot.
  3. No support for obtaining a return value from SQL execution. FLIP-69[1] introduces many common DDLs such as `SHOW TABLES`, which require TableEnvironment to have an interface for obtaining the result of executing a statement. The SQL CLI also has a strong demand for this feature, so that the execution paths of the SQL CLI and TableEnvironment can easily be unified. Besides, the method name `sqlUpdate` does not fit statements like `SHOW TABLES`.
  4. Unclear and buggy support for buffered SQL execution[2]. The Blink planner provides the ability to optimize multiple statements together, but there is no clear mechanism in the TableEnvironment API to control the whole flow.
  5. No ability to deal with multiple statements. It is a very common scenario for a SQL CLI user to execute an external SQL script file. If TableEnvironment does not want to support multiple statements itself, it (or the SQL parser) should expose the ability to parse multiple statements, so that the SQL CLI can leverage it.
  6. No support for executing SQL asynchronously. In streaming mode, an `INSERT INTO xxx` statement may never end. It is also possible that one ETL task takes too long to finish in a batch environment. So it is natural and necessary to support executing SQL asynchronously.

Buffered SQL execution problem

  • TableEnvironment shouldn’t buffer SQL statements or execution plans. For example, TableEnvironment currently supports writing code like the following:

tEnv.sqlUpdate("CREATE TABLE test (...) with (path = '/tmp1')");
tEnv.sqlUpdate("INSERT INTO test SELECT ...");
tEnv.sqlUpdate("DROP TABLE test");
tEnv.sqlUpdate("CREATE TABLE test (...) with (path = '/tmp2')");
tEnv.execute();


    1. Users are confused about which kinds of statements are executed at once and which are buffered until triggered by the `execute()` method.
    2. Buffering statements leads to undefined behavior. We may want to insert into the `test` table with the `/tmp1` path but wrongly end up writing to `/tmp2`.
  • The catalog is now the only main state management entry point of TableEnvironment, and buffering statements and plans violates this design.
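To make the hazard concrete, here is a self-contained toy model of the buffering behavior described above (plain Java, no Flink dependencies; all class and method names here are invented for illustration, not Flink API). DDL is applied to the catalog eagerly while INSERTs are buffered until `execute()`, so the buffered INSERT is resolved against whatever definition of `test` exists at execution time:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: mimics an environment that runs DDL eagerly
// but buffers INSERT statements until execute(). Not Flink code.
class BufferingEnv {
    private final Map<String, String> catalog = new HashMap<>(); // table name -> path
    private final List<String> bufferedInserts = new ArrayList<>();

    void createTable(String table, String path) { catalog.put(table, path); } // eager DDL
    void dropTable(String table) { catalog.remove(table); }                   // eager DDL
    void insertInto(String table) { bufferedInserts.add(table); }             // buffered!

    // INSERTs are resolved against the catalog *now*, not when they were submitted.
    List<String> execute() {
        List<String> sinkPaths = new ArrayList<>();
        for (String table : bufferedInserts) {
            sinkPaths.add(catalog.get(table));
        }
        bufferedInserts.clear();
        return sinkPaths;
    }
}

public class BufferingDemo {
    public static void main(String[] args) {
        BufferingEnv env = new BufferingEnv();
        env.createTable("test", "/tmp1"); // CREATE TABLE test ... path=/tmp1
        env.insertInto("test");           // user expects data in /tmp1
        env.dropTable("test");            // DROP TABLE test
        env.createTable("test", "/tmp2"); // CREATE TABLE test ... path=/tmp2
        // The buffered INSERT now sees the new definition and writes to /tmp2.
        System.out.println(env.execute()); // prints [/tmp2], not [/tmp1]
    }
}
```

Because the catalog state has already moved on by the time `execute()` is called, the user's intent (write to `/tmp1`) is silently lost. Executing each statement at the point it is submitted avoids this mismatch.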

Public Interfaces

Briefly list any new interfaces that will be introduced as part of this proposal or any existing interfaces that will be removed or changed. The purpose of this section is to concisely call out the public contract that will come along with this feature.

A public interface is any change to the following:

  • Binary log format

  • The network protocol and api behavior

  • Any class in the public packages under clients; configuration, especially client configuration

    • org/apache/kafka/common/serialization

    • org/apache/kafka/common

    • org/apache/kafka/common/errors

    • org/apache/kafka/clients/producer

    • org/apache/kafka/clients/consumer (eventually, once stable)

  • Monitoring

  • Command line tools and arguments

  • Anything else that will likely break existing users in some way when they upgrade

Proposed Changes

Describe the new thing you want to do in appropriate detail. This may be fairly extensive and have large subsections of its own. Or it may be a few sentences. Use judgement based on the scope of the change.

Compatibility, Deprecation, and Migration Plan

  • What impact (if any) will there be on existing users?
  • If we are changing behavior how will we phase out the older behavior?
  • If we need special migration tools, describe them here.
  • When will we remove the existing behavior?

Test Plan

Describe in few sentences how the FLIP will be tested. We are mostly interested in system tests (since unit-tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?

Rejected Alternatives

If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.
