Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Result Maintenance section improved

...

While batch queries have bounded results, streaming queries are potentially never-ending and, therefore, require special treatment for keeping the results up to date and consistent. The streaming query can be considered as a view and the running streaming application as the view maintenance. Results might need to be supplied to systems that were not made for streaming queries, e.g., Java applications that read from a JDBC API. In those cases, every requested result set must be a snapshot (materialized view) of the running query at a point in time. The snapshot must be immutable until all result rows have been consumed by the application or a new result set is requested. We distinguish between two types of results that will require different materialization semantics: a materialized view and a continuously updated result stream.materialized result stream.

Materialized View

A consistent materialized view of results for production use cases. Materialized views are not part of this FLIP but might be added in future versions. It requires another design document for the DDL statement and execution but here are some properties we aim for:

SQL: CREATE MATERIALIZED VIEW ...CREATE MATERIALIZED VIEW [in future versions]

  • intended for long running materialization queries that are updated periodically (e.g. every hour or on successful checkpoints)

  • retractions are not visible directly, only the materialized result

  • a result can be accessed by JDBC connections or the REST API (e.g. for powering dashboards)

  • materialization operators can cooperate with Flink's checkpointing (e.g. only checkpointed results are exposed through the APIs)

  • a user can specify different parameters for how and how often the view is maintained
    (see create_mv_refresh)

  • it requires another design document for the DDL statement and execution

  • runs detached from the CLI client

Materialized Result Stream

A materialized stream of results for getting immediate insights into the running SQL query.

SQL: SELECT ...SELECT

  • indended for debugging during query creation and initial show cases

  • retractions are shown as streams of deletion and insertion

  • no guarantees about checkpointed results

  • the executor abstracts the underlying representation and supplies the interfaces for accessing the materialized stream in a FIFO fashion

  • only one running query per CLI session

  • cancelled if cancelled in CLI or CLI is closed

...

We focus on simple SELECT queries first that are materialized on the heap of the executor (internal materialization mode).

Compatibility, Deprecation, and Migration Plan

...

  • Add CLI component that reads the configuration files
    • "Pre-registered table sources"
    • "Job parameters"
  • Add executor for retrieving pre-flight information and corresponding CLI SQL parser
    • SHOW TABLES
    • DESCRIBE TABLE
    • EXPLAIN
  • Add streaming append query submission to executor

    • Submit jars and run SELECT query using the ClusterClient

    • Collect results on heap and serve them on the CLI side (Internal Mode with SELECT)

    • EXECUTE (for executing a SQL statement stored in a local file)

...