Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Improve Result Maintenance

...

The supported materialization mode also depends on the query type:

 

**, *

Query TypeMaterialization Backends

Internal Mode

External Mode*

Batch

collect() -> Heap/Database, External

File table sink

Append Stream

collect() -> Heap/Database, Kafka*, File*

Kafka/file table sink

Retract/Upsert Stream

collect() -> Heap/Database,

(Compacted Kafka)/Cassandra table sink

 

We might use usual heap space at the beginning. The internal database can be any JDBC database. External materialization modes (*) are not included in the first version. In the future, Kafka would be read by general Kafka utility functions. Files as well with support for different file systems.

Result Maintenance

While batch queries have bounded results, streaming queries are potentially never-ending and, therefore, require special treatment for keeping the results up to date and consistent. The streaming query can be considered as a view and the running streaming application as the view maintenance. Results might need to be supplied to systems that were not made for streaming queries, e.g., Java applications that read from a JDBC API. In those cases, every requested result set must be a snapshot (materialized view) of the running query at a point in time. The snapshot must be immutable until all result rows have been consumed by the application or a new result set is requested. We distinguish between two types of results that will require different materialization semantics: a materialized view and a continuously updated result stream.

CREATE MATERIALIZED VIEW [in future versions]

 

  • intended for long running materialization queries that are updated periodically (e.g. , for powering dashboards)

  • this could power JDBC connections

  • checkpointed

  • every hour or on successful checkpoints)

  • retractions are not visible directly, only the materialized result

  • a result can be accessed by JDBC connections or the REST API (e.g. for powering dashboards)

  • materialization operators can cooperate with Flink's checkpointing (e.g. only checkpointed results are exposed through the APIs)

  • a user can specify different parameters for how and how often the view is maintained
    (see create_mv_refresh)

  • it requires another design document for the DDL statement and execution

SELECT

 

  • indended for debugging during query creation and initial show cases

  • retractions are shown as streams of deletion and insertion

  • no guarantees about checkpointed results

  • the executor abstracts the underlying representation and supplies the interfaces for accessing the materialized stream in a FIFO fashion

  • only one running query per CLI session

  • cancelled if cancelled in CLI or CLI is closed


We focus on simple SELECT queries first that are materialized on the heap of the executor.

...

Add the basic features to play around with Flink's streaming SQL. 

  • Add CLI component that reads the configuration files
    • "Pre-registered table sources"
    • "Job parameters"
  • Add executor for retrieving pre-flight information and corresponding CLI SQL parser
    • SHOW TABLES
    • DESCRIBE TABLE
    • EXPLAIN
  • Add streaming append query submission to executor

    • Submit jars and run SELECT query using the ClusterClient

    • Collect results on heap and serve them on the CLI side

    • EXECUTE (for executing a SQL statement stored in a local file)

...