...
While batch queries have bounded results, streaming queries are potentially never-ending and, therefore, require special treatment for keeping the results up to date and consistent. The streaming query can be considered as a view and the running streaming application as the view maintenance. Results might need to be supplied to systems that were not made for streaming queries, e.g., Java applications that read from a JDBC API. In those cases, every requested result set must be a snapshot (materialized view) of the running query at a point in time. The snapshot must be immutable until all result rows have been consumed by the application or a new result set is requested. We distinguish between two types of results that will require different materialization semantics: a materialized view and a continuously updated result stream.materialized result stream.
Materialized View
A consistent materialized view of results for production use cases. Materialized views are not part of this FLIP but might be added in future versions. It requires another design document for the DDL statement and execution but here are some properties we aim for:
SQL: CREATE MATERIALIZED VIEW ...
CREATE MATERIALIZED VIEW
[in future versions]
intended for long running materialization queries that are updated periodically (e.g. every hour or on successful checkpoints)
retractions are not visible directly, only the materialized result
a result can be accessed by JDBC connections or the REST API (e.g. for powering dashboards)
materialization operators can cooperate with Flink's checkpointing (e.g. only checkpointed results are exposed through the APIs)
a user can specify different parameters for how and how often the view is maintained
(see create_mv_refresh)it requires another design document for the DDL statement and execution
runs detached from the CLI client
Materialized Result Stream
A materialized stream of results for getting immediate insights into the running SQL query.
SQL: SELECT ...
SELECT
indended for debugging during query creation and initial show cases
retractions are shown as streams of deletion and insertion
no guarantees about checkpointed results
the executor abstracts the underlying representation and supplies the interfaces for accessing the materialized stream in a FIFO fashion
only one running query per CLI session
cancelled if cancelled in CLI or CLI is closed
...
We focus on simple SELECT queries first that are materialized on the heap of the executor (internal materialization mode).
Compatibility, Deprecation, and Migration Plan
...
- Add CLI component that reads the configuration files
- "Pre-registered table sources"
- "Job parameters"
- Add executor for retrieving pre-flight information and corresponding CLI SQL parser
SHOW TABLES
DESCRIBE TABLE
EXPLAIN
Add streaming append query submission to executor
Submit jars and run
SELECT
query using the ClusterClientCollect results on heap and serve them on the CLI side (Internal Mode with SELECT)
EXECUTE
(for executing a SQL statement stored in a local file)
...