...
- Custom Resource to express a Flink application in a Kubernetes-native way (for details see the CR example section below)
  - Access to native Flink properties and native Kubernetes pod settings
    - Minimal shorthand (proxy) settings that the operator translates to the underlying native settings (memory, CPU)
    - shorthand settings override the underlying native settings
  - Supports all Flink configuration properties
  - Docker image
  - Upgrade policy (savepoint, stateless)
  - Restore policy (savepoint, latest externalized checkpoint state, stateless)
  - Pod template for jobmanager and taskmanager
    - full control over the k8s pod template (no mapping/whitelisting)
    - layering/merging of pod templates (the operator itself could also apply cluster-wide defaults)
    - similar to https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#pod-template
  - External jar artifact fetcher support (s3, https etc.) via init container
- Support both an explicit session cluster (no job management) and application mode
  - the session cluster can be used to control jobs externally (e.g. job submission via the REST API)
- Create & deploy new Flink application
  - Empty state
  - From savepoint
- Upgrade Flink application with or without a savepoint on any CR change, including:
  - Flink configuration change
  - Job jar change
  - Docker image change
- Pause/Resume Flink application
  - the job will not continue its data processing
  - the job will not be deleted from the cluster
  - the job will release its resources back to the cluster (so they can be used by other jobs)
  - Stops the job with a savepoint and tracks the savepoint/last checkpoint in the CR status for resuming
- Delete Flink application
- Integrate with Flink Kubernetes HA module [4]
  - When selected, the operator can obtain the latest checkpoint from the HA config map and does not depend on a potentially unavailable Flink job REST API
  - This should be the default, but not a hard dependency
- Support Flink UI ingress
- CI/CD that builds the operator Docker image artifact and publishes the image to Docker Hub
- Error handling
  - Retry based on exception classifiers
  - Propagation of job submission errors through k8s event and/or status
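
The shorthand-settings bullets describe the operator translating a few proxy settings (memory, CPU) into native Flink configuration, with the shorthand winning on conflict. The following is a minimal sketch of that precedence rule, assuming a hypothetical shorthand schema (`jobManager`/`taskManager` entries with `memory` and `cpu` fields). The target keys `jobmanager.memory.process.size`, `taskmanager.memory.process.size`, `kubernetes.jobmanager.cpu` and `kubernetes.taskmanager.cpu` are existing Flink options, but which keys the operator would actually map is an assumption.

```python
from typing import Dict, Mapping

# Hypothetical mapping from shorthand (component, field) pairs to native Flink keys.
SHORTHAND_TO_NATIVE = {
    ("jobManager", "memory"): "jobmanager.memory.process.size",
    ("jobManager", "cpu"): "kubernetes.jobmanager.cpu",
    ("taskManager", "memory"): "taskmanager.memory.process.size",
    ("taskManager", "cpu"): "kubernetes.taskmanager.cpu",
}


def effective_flink_config(
    flink_configuration: Mapping[str, str],
    shorthand: Mapping[str, Mapping[str, object]],
) -> Dict[str, str]:
    """Merge native settings with shorthand settings; shorthand wins on conflict."""
    merged = dict(flink_configuration)  # start from the user's native properties
    for component, fields in shorthand.items():
        for field, value in fields.items():
            key = SHORTHAND_TO_NATIVE.get((component, field))
            if key is None:
                raise ValueError(f"unsupported shorthand setting {component}.{field}")
            merged[key] = str(value)  # overrides any native value for the same key
    return merged


if __name__ == "__main__":
    native = {"taskmanager.memory.process.size": "1024m", "parallelism.default": "2"}
    short = {"taskManager": {"memory": "2048m", "cpu": 1}}
    print(effective_flink_config(native, short))
```

Rejecting unknown shorthand fields, rather than silently ignoring them, keeps the proxy layer deliberately small. Anything beyond it stays in the native Flink configuration.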
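
The pod-template bullets mention layering a user template on top of cluster-wide defaults applied by the operator. A minimal sketch of such layering as a recursive dictionary merge follows. It is an assumed strategy, not the operator's specified behavior. Nested mappings are merged key by key, while lists (for example `containers`) are replaced wholesale, which a real implementation would likely refine, e.g. by merging containers by name. The label and node-selector values in the usage are hypothetical.

```python
import copy
from typing import Any, Dict


def merge_pod_templates(base: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively layer `overlay` on top of `base` without mutating either input.

    Nested dicts are merged key by key; any other value (scalars, lists) from
    the overlay replaces the base value wholesale.
    """
    result = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_pod_templates(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


if __name__ == "__main__":
    cluster_defaults = {
        "metadata": {"labels": {"team": "platform"}},
        "spec": {"serviceAccountName": "flink", "nodeSelector": {"pool": "default"}},
    }
    user_template = {
        "metadata": {"labels": {"app": "wordcount"}},
        "spec": {"nodeSelector": {"pool": "high-mem"}},
    }
    print(merge_pod_templates(cluster_defaults, user_template))
```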
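
The upgrade bullets state that any CR change (configuration, job jar, Docker image) triggers an upgrade governed by the upgrade policy (savepoint or stateless). A minimal sketch of that decision follows, assuming a flat spec dictionary. `UpgradeMode`, `plan_upgrade` and the step descriptions are hypothetical names chosen for illustration. Real reconciliation would act on the cluster rather than return strings.

```python
from enum import Enum
from typing import List, Mapping, Tuple


class UpgradeMode(Enum):
    """Hypothetical names for the two upgrade policies listed above."""
    SAVEPOINT = "savepoint"
    STATELESS = "stateless"


def plan_upgrade(
    old_spec: Mapping[str, object],
    new_spec: Mapping[str, object],
    mode: UpgradeMode,
) -> Tuple[List[str], List[str]]:
    """Return (changed spec fields, ordered upgrade steps) for a CR change."""
    changed = sorted(
        key for key in set(old_spec) | set(new_spec)
        if old_spec.get(key) != new_spec.get(key)
    )
    if not changed:
        return [], []  # spec unchanged: nothing to reconcile
    if mode is UpgradeMode.SAVEPOINT:
        # Preserve state: take a savepoint while stopping, then restore from it.
        steps = ["stop job with savepoint", "redeploy new spec from savepoint"]
    else:
        # Stateless: discard state and start the new spec from scratch.
        steps = ["cancel job", "redeploy new spec from empty state"]
    return changed, steps


if __name__ == "__main__":
    old = {"image": "flink:1.14", "jarURI": "local:///opt/job.jar"}
    new = {"image": "flink:1.15", "jarURI": "local:///opt/job.jar"}
    print(plan_upgrade(old, new, UpgradeMode.SAVEPOINT))
```

Every field change is treated identically here, which matches "upgrade on any CR change". The spec field names in the usage are illustrative only.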
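
The pause/resume bullets describe stopping the job with a savepoint, recording where the state went in the CR status, and later resuming from it. A minimal sketch of that bookkeeping follows. `JobStatus` is a hypothetical stand-in for the CR status. `stop_with_savepoint` and `deploy` are hypothetical callables representing the actual Flink and Kubernetes operations, which are not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class JobStatus:
    """Hypothetical stand-in for the CR status sub-resource."""
    state: str = "RUNNING"
    last_savepoint: Optional[str] = None


def suspend(status: JobStatus, stop_with_savepoint: Callable[[], str]) -> None:
    """Stop the job with a savepoint and remember its location for resuming."""
    if status.state != "RUNNING":
        return  # already suspended: keep the recorded savepoint
    status.last_savepoint = stop_with_savepoint()  # job resources are released afterwards
    status.state = "SUSPENDED"


def resume(status: JobStatus, deploy: Callable[[Optional[str]], None]) -> None:
    """Redeploy the job, restoring from the savepoint recorded at suspend time."""
    if status.state != "SUSPENDED":
        return  # nothing to resume
    deploy(status.last_savepoint)
    status.state = "RUNNING"


if __name__ == "__main__":
    st = JobStatus()
    suspend(st, lambda: "s3://bucket/savepoints/sp-1")  # hypothetical savepoint path
    resume(st, lambda path: print("restoring from", path))
    print(st)
```

Keeping the savepoint path in the status, rather than only in the running cluster, is what lets the job's resources be released while it is paused.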
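
The HA bullets make the Kubernetes HA config map the preferred source for the latest checkpoint, without making HA a hard dependency. A minimal sketch of that precedence follows, assuming that without HA (or without an HA entry) the operator falls back to the job's REST API. `from_ha_configmap` and `from_rest_api` are hypothetical callables standing in for the actual config-map parsing and REST query.

```python
from typing import Callable, Optional

Lookup = Callable[[], Optional[str]]


def latest_checkpoint(
    ha_enabled: bool,
    from_ha_configmap: Lookup,
    from_rest_api: Lookup,
) -> Optional[str]:
    """Prefer HA metadata; use the REST API only when HA is off or has no entry."""
    if ha_enabled:
        path = from_ha_configmap()
        if path is not None:
            return path  # the REST API is never contacted on this path
    try:
        return from_rest_api()
    except ConnectionError:
        return None  # the job's REST endpoint may be unavailable
```

With HA enabled and a checkpoint recorded, the lookup succeeds even while the JobManager and its REST endpoint are down. That is the failure mode the bullet is concerned with.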
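
The error-handling bullets call for retries driven by exception classifiers, with errors that are not retried surfaced via k8s events and/or status. A minimal sketch of classifier-based retry with exponential backoff follows. Treating `ConnectionError` and `TimeoutError` as transient is an assumption for illustration. `call_with_retry` and `is_retryable` are hypothetical names.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

# Hypothetical classification: transient errors are retried, everything else fails fast.
RETRYABLE: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)


def is_retryable(exc: BaseException) -> bool:
    return isinstance(exc, RETRYABLE)


def call_with_retry(
    action: Callable[[], T],
    max_attempts: int = 3,
    backoff_s: float = 0.5,
    classify: Callable[[BaseException], bool] = is_retryable,
) -> T:
    """Run `action`, retrying only exceptions that `classify` deems transient."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as exc:
            if not classify(exc) or attempt == max_attempts:
                # Non-retryable or out of attempts: re-raise so the caller can
                # report it, e.g. as a k8s event or in the CR status.
                raise
            time.sleep(backoff_s * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Passing the classifier as a parameter keeps the retry loop generic, so job submission, savepoint triggering and similar calls can each use their own notion of "transient".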
...
In the long run it might make sense to support both deployment modes in the operator; however, initially we should focus the development effort on a single approach. Maybe start with support for [2], since we could reuse the code in a Java-based implementation.
CR Example
The
kind: FlinkDeployment
...