Page History

...

The supported materialization mode also depends on the query type:

**, *

Query TypeMaterialization Backends	Internal Mode	External Mode*
Batch	`collect()` -> Heap/Database, External	File	table sink
Append Stream	`collect()` -> Heap/Database, Kafka, File	Kafka/file table sink
Retract/Upsert Stream	`collect()` -> Heap/Database,	(Compacted Kafka)	/Cassandra	table sink

We might use usual heap space at the beginning. The internal database can be any JDBC database. External materialization modes (*) are not included in the first version. In the future, Kafka would be read by general Kafka utility functions. Files as well with support for different file systems.

Result Maintenance

While batch queries have bounded results, streaming queries are potentially never-ending and, therefore, require special treatment for keeping the results up to date and consistent. The streaming query can be considered as a view and the running streaming application as the view maintenance. Results might need to be supplied to systems that were not made for streaming queries, e.g., Java applications that read from a JDBC API. In those cases, every requested result set must be a snapshot (materialized view) of the running query at a point in time. The snapshot must be immutable until all result rows have been consumed by the application or a new result set is requested. We distinguish between two types of results that will require different materialization semantics: a materialized view and a continuously updated result stream.

CREATE MATERIALIZED VIEW [in future versions]

intended for long running materialization queries that are updated periodically (e.g. , for powering dashboards)
this could power JDBC connections
checkpointed
every hour or on successful checkpoints)
retractions are not visible directly, only the materialized result
a result can be accessed by JDBC connections or the REST API (e.g. for powering dashboards)
materialization operators can cooperate with Flink's checkpointing (e.g. only checkpointed results are exposed through the APIs)
a user can specify different parameters for how and how often the view is maintained
(see create_mv_refresh)
it requires another design document for the DDL statement and execution

SELECT

indended for debugging during query creation and initial show cases
retractions are shown as streams of deletion and insertion
no guarantees about checkpointed results
the executor abstracts the underlying representation and supplies the interfaces for accessing the materialized stream in a FIFO fashion
only one running query per CLI session
cancelled if cancelled in CLI or CLI is closed

We focus on simple SELECT queries first that are materialized on the heap of the executor.

...

Add the basic features to play around with Flink's streaming SQL.

Add CLI component that reads the configuration files
- "Pre-registered table sources"
- "Job parameters"
Add executor for retrieving pre-flight information and corresponding CLI SQL parser
- ```
SHOW TABLES
```
- ```
DESCRIBE TABLE
```
- ```
EXPLAIN
```
Add streaming append query submission to executor
- Submit jars and run SELECT query using the ClusterClient
- Collect results on heap and serve them on the CLI side
- EXECUTE (for executing a SQL statement stored in a local file)

...

Page tree

Versions Compared

Old Version 5

New Version 6

Key

Result Maintenance