Reason: Subsumed by the bigger vision described in FLIP-32.


Status

Current state: "Discarded"

Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Long-term-goal-of-making-flink-table-Scala-free-td22761.html

...

flink-table [moved out of flink-libraries as a top-level parent module]


  • flink-table-common
    Contains interfaces and common classes that need to be shared across different Flink modules. This module was introduced as `flink-table-common` in Flink 1.7 and its name integrates nicely with the existing Flink naming scheme.

    Connectors, formats, and UDFs can use this module without depending on the entire Table API stack or on Scala. The module contains interface classes such as descriptors, table sinks, and table sources. It will also contain the table factory discovery service so that connectors can discover formats.

    The module should only contain Java classes and should have no dependencies on other modules, with the exception of `flink-core`, which it needs for common classes such as `TypeInformation`.

    In the future, we might need to add some basic expression representations (for <, >, ==, !=, field references, and literals) in order to push down filter predicates into sources without adding a dependency on `flink-table-api-base` or Calcite; see the sketch at the end of this item.

    Currently, we cannot add interfaces for connectors to this module because classes such as `StreamTableSource` or `BatchTableSource` require a dependency on `DataStream` or `DataSet` API classes. This might change in the future once other modules such as `flink-streaming-java` have been reworked. For now, extension points for connectors are located in `flink-table-api-*` and integrate with the target API.
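
    To illustrate the basic expression representation mentioned above, a minimal, dependency-free sketch in Java could look as follows. All names are hypothetical and only show the general direction; none of these classes exist in the code base:

    // Hypothetical predicate representation for filter push-down. A source
    // could receive a list of such expressions without seeing Calcite or
    // `flink-table-api-base` classes.
    interface ResolvedExpression {}

    class FieldReference implements ResolvedExpression {
        final String name;
        FieldReference(String name) { this.name = name; }
    }

    class Literal implements ResolvedExpression {
        final Object value;
        Literal(Object value) { this.value = value; }
    }

    class Comparison implements ResolvedExpression {
        enum Op { LT, GT, EQ, NEQ }
        final Op op;
        final ResolvedExpression left;
        final ResolvedExpression right;
        Comparison(Op op, ResolvedExpression left, ResolvedExpression right) {
            this.op = op; this.left = left; this.right = right;
        }
    }

    // Example: the predicate `a > 10` would become
    //   new Comparison(Comparison.Op.GT, new FieldReference("a"), new Literal(10))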


  • flink-table-api-base
    Contains API classes such as expressions, Calcite configuration, `TableConfig`, and base classes for `Table` and `TableEnvironment`. It contains most classes from `org.apache.flink.table.api.*` plus some additional classes. It contains subclasses of `org.apache.flink.table.plan.logical.LogicalNode`.

    This module will be used by language-specific modules. If at all, it will have only Calcite as an external dependency (plus shaded Calcite dependencies), since expressions need to be converted into `RexCall`s and nodes need to be converted into `RelNode`s. However, we should aim to not expose Calcite through the API; only `flink-table-planner` should require Calcite. A sketch of this encapsulation is shown at the end of this item.

    Additionally, the module will depend on `flink-table-common`, as well as on `flink-table-runtime` and `flink-table-planner` to run queries in an IDE. As a long-term goal, this module should only depend on `flink-table-common`.

    This module should only contain Java classes, but it will contain Scala classes until we have ported all expressions, the expression parser, and the logical nodes. This might take a while.
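
    As a sketch of how Calcite could stay an internal concern of `flink-table-api-base`: an expression's conversion to a `RexNode` can be hidden behind a package-private method, so that no Calcite type appears in a public signature. The class layout and method names below are assumptions for illustration only:

    import org.apache.calcite.rex.RexBuilder;
    import org.apache.calcite.rex.RexNode;
    import org.apache.calcite.sql.fun.SqlStdOperatorTable;

    // Public API surface: no Calcite types in any public signature.
    public abstract class Expression {
        public Expression isEqual(Expression other) {
            return new EqualTo(this, other);
        }

        // Internal bridge used only by the planner (hypothetical method name).
        abstract RexNode toRexNode(RexBuilder rexBuilder);
    }

    // Leaf expressions (field references, literals) are omitted for brevity.
    class EqualTo extends Expression {
        final Expression left;
        final Expression right;

        EqualTo(Expression left, Expression right) {
            this.left = left;
            this.right = right;
        }

        @Override
        RexNode toRexNode(RexBuilder rexBuilder) {
            return rexBuilder.makeCall(
                SqlStdOperatorTable.EQUALS,
                left.toRexNode(rexBuilder),
                right.toRexNode(rexBuilder));
        }
    }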

  • flink-table-api-java
    Contains API classes with interfaces targeted to Java users, i.e. `BatchTableEnvironment` and `StreamTableEnvironment` extending some base class.

    The module should only contain Java classes. It will only depend on `flink-table-api-base` and `flink-streaming-java`. A rough sketch of this layering follows below.
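
    The sketch below shows the shared logic in a base class from `flink-table-api-base` with a thin Java-specific subclass on top; the exact class shape is an assumption of this proposal, only the method names follow the existing API:

    import org.apache.flink.streaming.api.datastream.DataStream;

    // In flink-table-api-base: language-agnostic logic shared by both APIs.
    abstract class AbstractTableEnvironment {
        protected void registerTableInternal(String name) {
            // Shared catalog registration and validation logic would live here.
        }
    }

    // In flink-table-api-java: a thin, Java-friendly surface on top of the base.
    class StreamTableEnvironment extends AbstractTableEnvironment {
        public void registerDataStream(String name, DataStream<?> dataStream) {
            // Bridge the DataStream API into the table ecosystem.
            registerTableInternal(name);
        }
    }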

  • flink-table-api-scala
    Contains API classes with interfaces targeted to Scala users, i.e. `BatchTableEnvironment` and `StreamTableEnvironment` extending some base class.

    The module should only contain Scala classes. It will only depend on `flink-table-api-base` and `flink-streaming-scala`.

    There were opinions about letting `flink-table-api-scala` depend on `flink-table-api-java` and removing the base module. However, the problem with this approach is that classes such as `BatchTableEnvironment` or `Tumble` would exist twice in the classpath. In the past, this led to confusion because people were not always paying attention to their imports and were mixing the Java API with the Scala API. The current module structure avoids this ambiguity.

  • flink-table-planner
    Contains the main logic for converting a logical representation into a DataStream/DataSet program, relying only on `flink-table-runtime`. The planner module bridges the `api` and `runtime` modules, similar to how it is done in the DataSet API of Flink. A user has to add `flink-table-api-scala/java` and `flink-table-planner` in order to execute a program in an IDE.

    This module contains the original `flink-table` module classes. It will gradually be converted into Java and some classes will be distributed to their future location in `flink-table-runtime` or `flink-table-api-*`. This might take a while because it contains a large set of rules, code generation, and translation logic and would be the biggest migration effort.

    For example, code generation currently uses Scala features such as multiline strings and string interpolation. Doing this in Java might not be as convenient. Either we wait until Java has better support (e.g. raw string literals look promising) or (better) we change the code generation to a programmatic approach, e.g. `Expr.if(Expr.assign(var, value), Expr.throw(exception))` or something similar; see the sketch below.
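
    Since `if` and `throw` are reserved words in Java, such a builder would need slightly different method names. A minimal sketch of the idea, assuming hypothetical names like `ifThen` and `throwException` (none of these classes exist yet):

    // Hypothetical expression-tree builder replacing Scala string interpolation.
    final class Expr {
        private final String code;
        private Expr(String code) { this.code = code; }

        static Expr literal(String value) {
            return new Expr(value);
        }

        static Expr assign(String variable, Expr value) {
            return new Expr(variable + " = " + value.code);
        }

        static Expr throwException(String exceptionClass) {
            return new Expr("throw new " + exceptionClass + "();");
        }

        static Expr ifThen(Expr condition, Expr body) {
            return new Expr("if (" + condition.code + ") { " + body.code + " }");
        }

        String render() { return code; }
    }

    // Mirrors the Expr.if(Expr.assign(var, value), Expr.throw(exception)) example:
    //   Expr.ifThen(Expr.assign("var", Expr.literal("value")),
    //               Expr.throwException("IllegalStateException")).render()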

    We could make this module pretend to be Scala-free by only loading Scala dependencies into a separate classloader. A dedicated `Planner` class could be the interface between API and planning modules. Such a signature could look similar to:

    <OUT> DataStream<OUT> translateStream(PlannerContext context, RelNode plan);
    <OUT> DataSet<OUT> translateBatch(PlannerContext context, RelNode plan);

    The module will depend on Calcite as an external dependency. Internally, it will also require `flink-streaming-java` and `flink-table-runtime`. A sketch of the proposed classloader isolation is shown below.
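
    A sketch of the classloader isolation, assuming the planner implementation is discovered via Java's standard `ServiceLoader` mechanism and `Planner` is the interface proposed above (the JAR path is a placeholder):

    import java.net.URL;
    import java.net.URLClassLoader;
    import java.util.ServiceLoader;

    class PlannerLoader {
        static Planner loadPlanner() throws Exception {
            // The Scala-dependent planner JAR is only visible to this child
            // classloader; the API modules stay Scala-free on the classpath.
            URLClassLoader plannerClassLoader = new URLClassLoader(
                new URL[] { new URL("file:///path/to/flink-table-planner.jar") },
                Planner.class.getClassLoader());
            return ServiceLoader.load(Planner.class, plannerClassLoader)
                .iterator()
                .next();
        }
    }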

  • flink-table-runtime
    Contains the main logic for executing a table program. It aims to make JAR files that need to be submitted to the cluster small.

    The module will be a mixed Scala/Java project until we have converted all classes to Java. However, compared to `flink-table-planner`, this should be an achievable goal as runtime classes don’t use a lot of Scala magic and are usually pretty compact.

    The module will depend on the Janino compiler and `flink-streaming-java`; the sketch at the end of this item shows the essence of the Janino usage.

    Currently, we use some Calcite functions during runtime. Either we have to find alternatives (e.g. for time conversion) or we need to add a Calcite dependency to this module as well.
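
    For reference, the Janino usage mentioned above boils down to compiling generated source code at runtime, roughly like this (the generated class name and body are placeholders for real generated code):

    import org.codehaus.janino.SimpleCompiler;

    class GeneratedCodeCompiler {
        static Class<?> compile() throws Exception {
            SimpleCompiler compiler = new SimpleCompiler();
            // The source string would come from the planner's code generator.
            compiler.cook(
                "public class GeneratedMapper {"
                + "  public int eval(int x) { return x + 1; }"
                + "}");
            return compiler.getClassLoader().loadClass("GeneratedMapper");
        }
    }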

  • flink-sql-client
    The SQL Client logically belongs to `flink-table` and should be moved under this module.

...

  1. Set up the new module structure
    Move all files to their corresponding modules as they are. No migration happens at this stage. Modules might contain both Scala and Java classes. Classes that should be placed in `flink-table-common` but are written in Scala remain in `flink-table-planner` for now.

  2. Migrate UDF classes to `flink-table-common`
    All UDF interfaces have few dependencies on other classes.

  3. Migrate `flink-table-runtime` classes
    All runtime classes have few dependencies on other classes.

  4. Migrate main Table API classes to `flink-table-api-base`
    The most important API classes, such as the table environments and `Table`, currently expose a lot of protected methods in Java. Migrating those classes makes the API clean and the implementation ready for a major refactoring for the new catalog support. We can also think about a separation of interface and implementation, e.g. `Table` & `TableImpl` (see the sketch after this list). However, the current API design makes this difficult as users instantiate these classes directly via constructors, e.g. `new Table(...)`.

  5. Migrate connector classes to `flink-table-api-*`
    Once we have implemented the planned improvements to the unified connector interface, we can also migrate these classes. Among others, this requires a refactoring of the timestamp extractors, which are the biggest blockers because they transitively depend on expressions.

  6. Migrate remaining `flink-table-common` classes
    While doing tasks for the new external catalog integration or improvements to the unified connector interfaces, we can migrate the remaining classes.

  7. Migrate remaining `flink-table-api-base` classes
    This includes expressions, logical nodes, etc.

  8. Load Scala in `flink-table-planner` into a separate classloader
    After this stage, `flink-table` would be Scala-free from a dependency perspective.

  9. Migrate `flink-table-planner` classes
    This is the final goal of a Scala-free `flink-table`.
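
As a sketch of the `Table` & `TableImpl` separation mentioned in step 4 (assuming users can be migrated from `new Table(...)` to factory methods on the table environment; the class shapes below are illustrative only):

    // Public interface exposed to users; no public constructor to call.
    public interface Table {
        Table select(String fields);
    }

    // Internal implementation class created by the table environment.
    class TableImpl implements Table {
        private final String projectedFields;

        TableImpl(String projectedFields) {
            this.projectedFields = projectedFields;
        }

        @Override
        public Table select(String fields) {
            // A real implementation would create a new logical projection node here.
            return new TableImpl(fields);
        }

        @Override
        public String toString() {
            return "TableImpl(" + projectedFields + ")";
        }
    }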

...