...

FLIP-123 implemented HiveQL-compatible DDL so that users can process metadata in HiveQL. This FLIP aims to provide syntax compatibility for queries. Like FLIP-123, it will improve interoperability with Hive and reduce migration effort. It also makes it possible to extend HiveQL to support streaming features, so users can write HiveQL not only for batch jobs but also for streaming jobs, which gives migrating users a more unified batch-streaming experience.

With this FLIP, the following typical use cases can be supported:

  1. Users can migrate their batch Hive jobs to Flink, without needing to modify the SQL scripts.
  2. Users can write HiveQL to integrate streaming features with Hive tables, e.g. streaming data from Kafka to Hive.
  3. Users can write HiveQL to process non-Hive tables, either in batch or in streaming jobs.

For migrating users, we believe it is desirable to be able to continue writing Hive syntax. This not only makes migration easier, but also helps them adopt Flink for new scenarios more quickly, and thus provides a unified batch-streaming experience.

Proposed Changes

The Idea

...

  1. HiveQL syntax is generally backward compatible, so we can use a newer version to support older versions.
  2. The process of generating the RelNode plan is tightly coupled with ASTNode and semantic analysis. While it is theoretically possible to make HiveParserCalcitePlanner support different versions, doing so would make the logic much more complicated and error-prone.
  3. The copied code gives us more flexibility to support new features in the future. For example, we can adapt the code to support writing HiveQL for generic tables, or support querying tables across multiple catalogs.

Go Beyond Hive

To support HiveQL on non-Hive tables, we need to:

  1. Extend Hive syntax, so that it supports identifiers like "catalog.db.table", and streaming features like Group Windows.
  2. Leverage Flink Catalog to retrieve metadata.
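
As a rough illustration of the first point, the sketch below expands a partial identifier into a full "catalog.db.table" name using the session's current catalog and database. It is only a sketch under stated assumptions: the `QualifiedName` class, its `resolve` method, and the sample catalog and database names are hypothetical and are not part of Flink or Hive. Quoted identifiers that contain dots are not handled.

```java
import java.util.Objects;

/**
 * Hypothetical helper, not part of Flink or Hive: expands a possibly
 * partial table identifier into catalog, database, and table parts.
 */
final class QualifiedName {
    final String catalog;
    final String database;
    final String table;

    QualifiedName(String catalog, String database, String table) {
        this.catalog = Objects.requireNonNull(catalog);
        this.database = Objects.requireNonNull(database);
        this.table = Objects.requireNonNull(table);
    }

    /**
     * Resolves "table", "db.table", or "catalog.db.table" against the
     * session's current catalog and database. Quoted identifiers that
     * contain dots are deliberately out of scope for this sketch.
     */
    static QualifiedName resolve(String identifier, String currentCatalog, String currentDatabase) {
        // Limit -1 keeps trailing empty parts so that "db." is rejected below.
        String[] parts = identifier.split("\\.", -1);
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty name part in: " + identifier);
            }
        }
        switch (parts.length) {
            case 1:
                return new QualifiedName(currentCatalog, currentDatabase, parts[0]);
            case 2:
                return new QualifiedName(currentCatalog, parts[0], parts[1]);
            case 3:
                return new QualifiedName(parts[0], parts[1], parts[2]);
            default:
                throw new IllegalArgumentException("Too many name parts in: " + identifier);
        }
    }

    @Override
    public String toString() {
        return catalog + "." + database + "." + table;
    }

    public static void main(String[] args) {
        // "hive" and "default" stand in for the session's current catalog and database.
        System.out.println(resolve("orders", "hive", "default"));                  // hive.default.orders
        System.out.println(resolve("kafka_cat.events.clicks", "hive", "default")); // kafka_cat.events.clicks
    }
}
```

Once a name is fully qualified in this way, metadata lookup can be delegated to whichever catalog the first part names, which is what makes the dialect independent of the tables being queried.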

Ultimately, we'll make this feature a pure SQL dialect that is orthogonal to the tables being queried.

New or Changed Public Interfaces

...

The following limitations apply when using this feature.

...

  1. HiveModule should be used in order to use Hive built-in functions.
  2. Some features are not supported because the underlying functionality is missing. For example, Hive’s UNION type is not supported.

Appendix

...