...
The supported materialization mode also depends on the query type:
| Query Type | Materialization Backends (Internal Mode) | Materialization Backends (External Mode*) |
| --- | --- | --- |
| Batch | File | table sink* |
| Append Stream | Kafka/file | table sink* |
| Retract/Upsert Stream | (Compacted Kafka)*/Cassandra | table sink* |
We might use regular heap space at the beginning. The internal database can be any JDBC database. External materialization modes (*) are not included in the first version. In the future, Kafka would be read by general Kafka utility functions, and files likewise, with support for different file systems.
Result Maintenance
While batch queries have bounded results, streaming queries are potentially never-ending and, therefore, require special treatment for keeping the results up to date and consistent. The streaming query can be considered as a view and the running streaming application as the view maintenance. Results might need to be supplied to systems that were not made for streaming queries, e.g., Java applications that read from a JDBC API. In those cases, every requested result set must be a snapshot (materialized view) of the running query at a point in time. The snapshot must be immutable until all result rows have been consumed by the application or a new result set is requested. We distinguish between two types of results that will require different materialization semantics: a materialized view and a continuously updated result stream.
CREATE MATERIALIZED VIEW
[in future versions]
- intended for long-running materialization queries that are updated periodically (e.g., for powering dashboards)
- this could power JDBC connections
- checkpointed (e.g., every hour or on successful checkpoints)
- retractions are not visible directly, only the materialized result
- a result can be accessed by JDBC connections or the REST API (e.g., for powering dashboards)
- materialization operators can cooperate with Flink's checkpointing (e.g., only checkpointed results are exposed through the APIs)
- a user can specify different parameters for how and how often the view is maintained (see create_mv_refresh)
- it requires another design document for the DDL statement and execution; a possible shape of the statement is sketched below
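To make this concrete, here is a minimal sketch of how such a statement could look. The `CREATE MATERIALIZED VIEW` syntax, the `REFRESH` clause, and the `TaxiRides` table are assumptions for illustration only; the actual DDL is left to the separate design document mentioned above.

```sql
-- Hypothetical DDL sketch; the final syntax (incl. refresh parameters,
-- cf. create_mv_refresh) is left to a separate design document.
CREATE MATERIALIZED VIEW RidesPerHour
REFRESH EVERY INTERVAL '1' HOUR  -- assumed maintenance parameter
AS
SELECT
  TUMBLE_START(rideTime, INTERVAL '1' HOUR) AS windowStart,
  COUNT(*) AS rideCnt
FROM TaxiRides
GROUP BY TUMBLE(rideTime, INTERVAL '1' HOUR);
```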
SELECT
- intended for debugging during query creation and initial showcases
- retractions are shown as streams of deletions and insertions
- no guarantees about checkpointed results
- the executor abstracts the underlying representation and supplies the interfaces for accessing the materialized stream in a FIFO fashion
- only one running query per CLI session
- a query is cancelled when the user cancels it in the CLI or closes the CLI
We focus first on simple SELECT queries that are materialized on the heap of the executor.
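As a minimal sketch, such a query could look like the following (the `TaxiRides` table is assumed); because of the grouped aggregation, updates reach the CLI as pairs of deletions and insertions:

```sql
-- Continuous query over an assumed 'TaxiRides' table; the result is
-- materialized on the executor's heap and updated via retractions.
SELECT driverId, COUNT(*) AS rideCnt
FROM TaxiRides
GROUP BY driverId;
```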
...
Add the basic features to play around with Flink's streaming SQL.
- Add CLI component that reads the configuration files
  - "Pre-registered table sources"
  - "Job parameters"
- Add executor for retrieving pre-flight information and corresponding CLI SQL parser (see the example session below)
  - SHOW TABLES
  - DESCRIBE TABLE
  - EXPLAIN
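For illustration, a pre-flight session could look like the following; the `TaxiRides` table is an assumed example and the exact output format is not specified here:

```sql
SHOW TABLES;          -- lists the pre-registered table sources
DESCRIBE TaxiRides;   -- prints the schema of the assumed 'TaxiRides' source
EXPLAIN               -- shows the plan without executing the query
SELECT driverId, COUNT(*) FROM TaxiRides GROUP BY driverId;
```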
- Add streaming append query submission to executor
  - Submit jars and run SELECT query using the ClusterClient
  - Collect results on heap and serve them on the CLI side
  - EXECUTE (for executing a SQL statement stored in a local file; a hypothetical invocation is sketched below)
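Since the command is only outlined here, the following invocation is purely hypothetical; both the syntax and the file path are illustrative assumptions:

```sql
-- Hypothetical: executes the SQL statement stored in the given local file.
EXECUTE '/tmp/rides_per_driver.sql';
```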
...