Status
Current state: "Under Discussion"
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
With the efforts in FLIP-24 and FLIP-91, Flink SQL client supports submitting SQL jobs but lacks further support for their lifecycles afterward, which is crucial for streaming use cases. That means Flink SQL client users have to turn to other clients (e.g. CLI) or APIs (e.g. REST API) to manage the jobs, like triggering savepoints or canceling jobs, which makes the user experience of SQL client incomplete.
Therefore, this proposal aims to complete the capability of SQL client by adding job lifecycle statements. With these statements, users could manage SQL jobs and savepoints through pure SQL in SQL client.
Public Interfaces
- New Flink SQL Statements
Proposed Changes
Architecture Overview
The overall architecture of Flink SQL client/gateway would be as follows:
...
Most parts remain unchanged. Only the SQL Parser and Planner need to be modified to support the new statements, and a new component, ClusterClientFactory, is introduced in Executor to enable direct access to Flink clusters.
...
SQL Job Lifecycle Statements
SQL job lifecycle statements mainly interact with deployments (clusters and jobs) and have few connections with Table/SQL concepts, thus it'd be better to keep them SQL-client-only like jar statements.
...
- The keyword for Flink SQL jobs was `QUERY` and has been updated to `JOB`.
- All the <job_id> and <savepoint_path> should be string literals (wrapped in single quotes), otherwise it's hard to parse them.
SHOW RUNNING FLINK SQL JOBS
This statement lists the jobs in the Flink cluster, which is similar to `flink list` in CLI.
Code Block | ||||
---|---|---|---|---|
| ||||
SHOW JOBS |
The result contains seven columns: job_id (namely Flink job ID), job_name, status, start_time, end_time, duration, and web_url (a link to the job's web UI address).
Code Block | ||||
---|---|---|---|---|
| ||||
+----------------------------------+----------+---------+---------------------+---------------------+------------+-----------------------+
| job_id                           | job_name | status  | start_time          | end_time            | duration   | web_url               |
+----------------------------------+----------+---------+---------------------+---------------------+------------+-----------------------+
| cca7bc1061d61cf15238e92312c2fc20 | query1   | RUNNING | 2022-05-01 10:20:33 | 2022-05-01 20:45:35 | 10h 25m 2s | http://127.0.0.1:8081 |
| 0f6413c33757fbe0277897dd94485f04 | query2   | FAILED  | 2022-05-01 14:04:24 | 2022-05-01 19:09:47 | 5h 5m 23s  | http://127.0.0.1:8081 |
+----------------------------------+----------+---------+---------------------+---------------------+------------+-----------------------+ |
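The duration values in the sample output match end_time minus start_time. A minimal Java sketch of that arithmetic follows, assuming a hypothetical helper `JobDurationSketch` (not part of this FLIP) and the timestamp and duration formats shown in the sample table:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the FLIP: derives the `duration` column
// from the `start_time` and `end_time` columns of the SHOW JOBS result.
class JobDurationSketch {
    // Assumed timestamp layout, taken from the sample table.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static String format(String startTime, String endTime) {
        LocalDateTime start = LocalDateTime.parse(startTime, FORMAT);
        LocalDateTime end = LocalDateTime.parse(endTime, FORMAT);
        Duration elapsed = Duration.between(start, end);
        long hours = elapsed.toHours();
        long minutes = elapsed.toMinutes() % 60;
        long seconds = elapsed.getSeconds() % 60;
        // Renders e.g. "5h 5m 23s", the style used in the sample table.
        return hours + "h " + minutes + "m " + seconds + "s";
    }
}
```

Calling `JobDurationSketch.format("2022-05-01 14:04:24", "2022-05-01 19:09:47")` yields `5h 5m 23s`, matching the second sample row.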
STOP A RUNNING FLINK SQL JOB
This statement stops a non-terminated SQL job, which is similar to `flink stop` and `flink cancel` in CLI.
Code Block | ||||
---|---|---|---|---|
| ||||
STOP JOB '<job_id>' [WITH SAVEPOINT] [WITH DRAIN] |
There are two related options to control the fine-grained behavior:
1. WITH SAVEPOINT
If specified, the stop statement stops a SQL job with a savepoint. The result would be the savepoint path.
Code Block | ||||
---|---|---|---|---|
| ||||
+-----------------------------------------------------------------|
| savepoint_path                                                  |
+-----------------------------------------------------------------|
| hdfs://mycluster/flink-savepoints/savepoint-cca7bc-bb1e257f0dab |
+-----------------------------------------------------------------| |
Otherwise, the stop statement stops a SQL job ungracefully, just like `flink cancel` in CLI. Since an ungraceful stop doesn't trigger a savepoint, the result would be a simple OK, like the one returned by DDL.
2. WITH DRAIN
If specified, the stop statement stops a SQL job and increases the watermark to MAX_WATERMARK to trigger all the timers, which is similar to `flink stop .. --drain` in CLI.
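To make the statement shape concrete, here is a minimal Java sketch that recognizes the stop statement and its two optional clauses with a regular expression. The class `StopJobStatementSketch` is hypothetical and only illustrates the grammar. The FLIP itself proposes extending the real SQL parser, which this sketch does not attempt to reproduce.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy recognizer for: STOP JOB '<job_id>' [WITH SAVEPOINT] [WITH DRAIN]
// Hypothetical helper for illustration, not the parser proposed by the FLIP.
class StopJobStatementSketch {
    // The job id must be a string literal wrapped in single quotes.
    private static final Pattern STOP_JOB = Pattern.compile(
            "^\\s*STOP\\s+JOB\\s+'([^']+)'(\\s+WITH\\s+SAVEPOINT)?(\\s+WITH\\s+DRAIN)?\\s*;?\\s*$",
            Pattern.CASE_INSENSITIVE);

    final String jobId;
    final boolean withSavepoint;
    final boolean withDrain;

    private StopJobStatementSketch(String jobId, boolean withSavepoint, boolean withDrain) {
        this.jobId = jobId;
        this.withSavepoint = withSavepoint;
        this.withDrain = withDrain;
    }

    // Returns an empty Optional when the text is not a well-formed stop statement.
    static Optional<StopJobStatementSketch> parse(String sql) {
        Matcher m = STOP_JOB.matcher(sql);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new StopJobStatementSketch(
                m.group(1), m.group(2) != null, m.group(3) != null));
    }
}
```

An unquoted job id such as `STOP JOB abc` is rejected by this sketch, which mirrors the requirement that <job_id> be a string literal.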
CREATE A SAVEPOINT
This statement triggers a savepoint for the specified SQL job, which is similar to `flink savepoint` in CLI.
Code Block | ||||
---|---|---|---|---|
| ||||
CREATE SAVEPOINT FOR JOB '<job_id>' |
The result would be the savepoint path.
Code Block | ||||
---|---|---|---|---|
| ||||
+-----------------------------------------------------------------|
| savepoint_path                                                  |
+-----------------------------------------------------------------|
| hdfs://mycluster/flink-savepoints/savepoint-cca7bc-bb1e257f0dab |
+-----------------------------------------------------------------| |
...
SHOW SAVEPOINTS
This statement shows all savepoints in a best-effort manner (since the savepoints are managed by users and outlive Flink clusters, the job manager may not know about all savepoints).
Code Block | |||||
---|---|---|---|---|---|
| |||||
SHOW SAVEPOINTS |
The result would be the savepoint paths.
Code Block |
---|
+------------------------------------------------------------------|
| savepoint_path |
+------------------------------------------------------------------|
| hdfs://mycluster/flink-savepoints/savepoint-cca7bc-bb1e257f0dab |
+------------------------------------------------------------------|
| hdfs://mycluster/flink-savepoints/savepoint-ca62ea-ce73f92adba2 |
+------------------------------------------------------------------| |
DROP A SAVEPOINT
This statement deletes the specified savepoint, which is similar to `flink savepoint --dispose` in CLI.
Code Block | ||||
---|---|---|---|---|
| ||||
DROP SAVEPOINT '<savepoint_path>' |
The result would be a simple OK.
COMPLETE USAGE EXAMPLE
Code Block | ||
---|---|---|
| ||
Flink SQL> INSERT INTO tbl_a SELECT * FROM tbl_b;
[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:
Job ID: 6b1af540c0c0bb3fcfcad50ac037c862
Flink SQL> SHOW JOBS;
+----------------------------------+---------------------+---------+---------------------+---------------------+-----------+-----------------------+
| job_id                           | job_name            | status  | start_time          | end_time            | duration  | web_url               |
+----------------------------------+---------------------+---------+---------------------+---------------------+-----------+-----------------------+
| 6b1af540c0c0bb3fcfcad50ac037c862 | INSERT INTO tbl_a.. | RUNNING | 2022-05-01 10:20:33 | 2022-05-01 10:20:53 | 0h 0m 20s | http://127.0.0.1:8081 |
+----------------------------------+---------------------+---------+---------------------+---------------------+-----------+-----------------------+
Flink SQL> CREATE SAVEPOINT FOR JOB '6b1af540c0c0bb3fcfcad50ac037c862';
+-----------------------------------------------------------------|
| savepoint_path                                                  |
+-----------------------------------------------------------------|
| hdfs://mycluster/flink-savepoints/savepoint-cca7bc-bb1e257f0dab |
+-----------------------------------------------------------------|
Flink SQL> STOP JOB '6b1af540c0c0bb3fcfcad50ac037c862';
[INFO] The specified job is stopped.
Flink SQL> DROP SAVEPOINT 'hdfs://mycluster/flink-savepoints/savepoint-cca7bc-bb1e257f0dab';
[INFO] The specified savepoint is dropped. |
SQL Parser & Planner
To support the new statements, we need to introduce new SQL operators for SQL parser and new SQL operations for the planner.
SQL operator | SQL operation |
SqlShowJobs | ShowJobsOperation |
SqlStopJob | StopJobOperation |
SqlShowSavepoints | ShowSavepointsOperation |
SqlCreateSavepoint | CreateSavepointOperation |
SqlDropSavepoint | DropSavepointOperation |
Executor
Executor would need to convert the job lifecycle operations into ClusterClient commands.
SQL operation | Cluster Client Command |
ShowJobsOperation | ClusterClient#listJobs |
StopJobOperation | ClusterClient#stopWithSavepoint or ClusterClient#cancel |
ShowSavepointsOperation | ClusterClient |
CreateSavepointOperation | ClusterClient#triggerSavepoint |
DropSavepointOperation | ClusterClient#disposeSavepoint |
In addition, to interact with the clusters, Executor should be able to create ClusterClient through ClusterClientFactory, thus a ClusterClientServiceLoader would be added to Executor.
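The StopJobOperation row maps to two commands, so the Executor has to choose between them. A minimal Java sketch of that choice follows, under stated assumptions: `ClusterCommands` is a hypothetical stand-in for the two ClusterClient calls (the real methods have different signatures and are asynchronous), `RecordingClient` is a fake used only for illustration, the returned savepoint path is made up, and WITH DRAIN is assumed to matter only when a savepoint is requested.

```java
import java.util.ArrayList;
import java.util.List;

class StopJobDispatchSketch {
    // Hypothetical stand-in for the two ClusterClient calls named in the table.
    // The real ClusterClient methods take different parameters and return futures.
    interface ClusterCommands {
        String stopWithSavepoint(String jobId, boolean drain);

        void cancel(String jobId);
    }

    // Fake client that records which command was issued.
    static class RecordingClient implements ClusterCommands {
        final List<String> calls = new ArrayList<>();

        @Override
        public String stopWithSavepoint(String jobId, boolean drain) {
            calls.add("stopWithSavepoint(" + jobId + ", drain=" + drain + ")");
            // Made-up path, for illustration only.
            return "hdfs://mycluster/flink-savepoints/savepoint-example";
        }

        @Override
        public void cancel(String jobId) {
            calls.add("cancel(" + jobId + ")");
        }
    }

    // Assumption of this sketch: WITH SAVEPOINT selects the graceful stop and
    // returns the savepoint path; otherwise the job is cancelled and the result is OK.
    static String executeStop(
            ClusterCommands client, String jobId, boolean withSavepoint, boolean withDrain) {
        if (withSavepoint) {
            return client.stopWithSavepoint(jobId, withDrain);
        }
        client.cancel(jobId);
        return "OK";
    }
}
```

Keeping the decision in one place means the parser and planner only need to carry the two flags, and the Executor alone knows which cluster command they translate to.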
Implementation Plan
The implementation plan would be simple:
- Support the new statements and operations in SQL parser and Planner.
- Extend Executor to support the new operations.
Compatibility, Deprecation, and Migration Plan
This FLIP introduces new SQL keywords, which may cause trouble for existing SQL statements. Users need to escape the new keywords if they use them as SQL identifiers.
The new keywords are:
- JOB (new)
- JOBS (new)
- STOP (new)
- DRAIN (new)
- SAVEPOINT (already reserved)
- SAVEPOINTS (already reserved)
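For tools that generate SQL, the escaping can be automated. The sketch below is a hypothetical Java helper (`KeywordEscapeSketch`, not part of this FLIP) that wraps an identifier in backticks, the quoting style Flink SQL uses for identifiers, when it collides with one of the keywords listed above. It assumes identifiers contain no backticks themselves.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the FLIP: quotes identifiers that clash
// with the keywords listed in this section.
class KeywordEscapeSketch {
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "JOB", "JOBS", "STOP", "DRAIN", "SAVEPOINT", "SAVEPOINTS"));

    // Assumes the identifier itself contains no backtick characters.
    static String escapeIfNeeded(String identifier) {
        if (KEYWORDS.contains(identifier.toUpperCase(Locale.ROOT))) {
            return "`" + identifier + "`";
        }
        return identifier;
    }
}
```

For example, a column named `job` would be emitted with backticks, while a column named `orders` would be left unchanged.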
Rejected Alternatives
Book Keep Query Status in SQL Gateway
An alternative approach to query monitoring is that the SQL client or gateway keeps a record of every query and is responsible for updating the query status through polling or callbacks. In that way, the query status is better maintained, and we wouldn't lose track of the queries in cases where they're cleaned up by the cluster or the cluster is unavailable.
However, there are two major concerns:
- Table/SQL API should provide the same capabilities as its peer DataStream API, thus the implementation of the show jobs statement should be aligned with `flink list` in CLI as well.
- Maintaining query status at the client/gateway side requires additional work but brings little extra user value, since the client/gateway doesn’t persist metadata at the moment.
Savepoint Syntax: SAVEPOINT / RELEASE SAVEPOINT
An alternative syntax for savepoints would be:
Code Block | ||
---|---|---|
| ||
SAVEPOINT '<job_id>'
RELEASE SAVEPOINT '<savepoint_path>' |
But there are mainly two concerns:
- Generally speaking, SAVEPOINT is more appropriate to be followed by a savepoint identifier instead of a job identifier.
- The statements are often used within database transaction blocks, so using them alone would be somewhat unnatural.