...

We propose to provide a Flink Kubernetes (k8s) operator implementation as part of Flink that is maintained by the community and closely integrated with the Flink ecosystem. This implementation will benefit from the extensive experience of Flink community members with large-scale, mission-critical Flink deployments and from the lessons learned from existing operator implementations. As part of Flink, the operator will be better placed to follow the development of Flink core, influence changes to Flink core, and benefit from the established collaboration processes of the project.

Initial Feature Set

For the initial version of the operator we aim to target core aspects of job lifecycle management.

  • CRD to express a Flink application (for details see CRD section below)
    • External jar artifact fetcher support (s3, https etc.) via init container
    • Supports all Flink configuration properties
    • Docker image
    • Upgrade policy (savepoint, stateless)
    • Restore policy (savepoint, latest externalized checkpoint, stateless)
    • JobManager and TaskManager pod template (unrestricted k8s pod configuration)
    • Support explicit session cluster (no job management) and application mode
      • An explicit session cluster is created empty, with no application/job management
      • The session cluster can be used to control jobs externally (e.g. submission via REST API)
  • Create & deploy new Flink application
    • Empty state
    • From savepoint
  • Upgrade Flink application with or without a savepoint on any CR change, including:
    • Flink configuration change
    • Job jar change
    • Docker image change
  • Pause/Resume Flink application
    • While paused, the job does not continue its data processing
    • The job is not deleted from the cluster
    • The job releases its resources back to the cluster (so they can be used by other jobs)
    • Pausing stops the job with a savepoint and tracks the savepoint/last checkpoint in the CR status for resume
  • Delete Flink application
  • Integrate with Flink Kubernetes HA module [4]
    • When selected, the operator can obtain the latest checkpoint from the config map and does not depend on a potentially unavailable Flink job REST API
    • This should be the default, but not a hard dependency
  • Support Flink UI ingress
  • CI/CD with operator Docker image artifact; publish the image on Docker Hub
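
To make the lifecycle items above more concrete, the following is a minimal sketch of how an operator might turn a CR change into an ordered list of actions. It is not the proposed implementation: the names AppSpec, UpgradePolicy, JobState, plan and the step strings are all hypothetical, and the rule that a resume always restores from the savepoint recorded in the CR status is an assumption made for illustration.

```python
from dataclasses import dataclass, field, replace
from enum import Enum
from typing import List, Optional


class UpgradePolicy(Enum):      # hypothetical names for the two upgrade policies
    SAVEPOINT = "savepoint"
    STATELESS = "stateless"


class JobState(Enum):           # hypothetical desired state used for pause/resume
    RUNNING = "running"
    PAUSED = "paused"


@dataclass(frozen=True)
class AppSpec:
    """Illustrative subset of the CR spec; field names are assumptions."""
    image: str
    jar_uri: str
    flink_conf: dict = field(default_factory=dict)
    upgrade_policy: UpgradePolicy = UpgradePolicy.SAVEPOINT
    state: JobState = JobState.RUNNING
    initial_savepoint: Optional[str] = None


def plan(deployed: Optional[AppSpec], desired: AppSpec) -> List[str]:
    """Return the ordered steps that move the cluster from `deployed` to `desired`."""
    if deployed is None:
        # New application: start from a savepoint if one is given, else from empty state.
        if desired.state is JobState.PAUSED:
            return []
        return ["deploy-from-savepoint" if desired.initial_savepoint
                else "deploy-from-empty-state"]
    if deployed == desired:
        return []  # nothing changed in the CR
    steps: List[str] = []
    if deployed.state is JobState.RUNNING:
        pausing = desired.state is JobState.PAUSED
        if pausing or desired.upgrade_policy is UpgradePolicy.SAVEPOINT:
            # Pause always takes a savepoint; upgrades do so when the policy asks for it.
            steps += ["stop-with-savepoint", "record-savepoint-in-status"]
        else:
            steps.append("cancel")
    if desired.state is JobState.RUNNING:
        stateless = (deployed.state is JobState.RUNNING
                     and desired.upgrade_policy is UpgradePolicy.STATELESS)
        steps.append("deploy-from-empty-state" if stateless
                     else "deploy-from-recorded-savepoint")
    return steps


# Example: a Docker image change under the savepoint upgrade policy.
current = AppSpec(image="flink:1.14", jar_uri="s3://bucket/job.jar")
print(plan(current, replace(current, image="flink:1.15")))
```

In this sketch any difference between the deployed and desired spec (configuration, jar, or image) triggers the same stop-and-redeploy sequence, which mirrors the "upgrade on any CR change" item; a real operator would additionally need to handle failures between steps.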

Compatibility, Deprecation, and Migration Plan

...