Status
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Currently the Flink Kubernetes operator only supports running a Flink deployment in Kubernetes native mode, not in standalone mode. One of the main concerns with native mode is that the JobManager needs access to the Kubernetes API, which can be seen as a security risk in multi-tenant Flink + Kubernetes set-ups. Some users may also want a static allocation of TaskManagers to limit the resources a single Flink cluster can consume.
Supporting standalone mode in the operator also means the operator can support older Flink versions that don’t have Flink Kubernetes native features. Supporting more Flink versions increases the adoption of the operator as a way to manage Flink clusters and provides those users an easier path to upgrade their cluster.
Public Interfaces
The public interface is the FlinkDeployment custom resource definition (CRD), see below.
Proposed Changes
FlinkDeployment CRD
apiVersion: flink.apache.org/v1alpha1
kind: FlinkDeployment
metadata:
  namespace: default
  name: basic-example
spec:
  image: flink:1.14.3
  flinkVersion: v1_14
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  jobManager:
    replicas: 1
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    replicas: 1  # only needed for standalone clusters
    resource:
      memory: "2048m"
      cpu: 1
  mode: standalone  # one of: native (default), standalone
We propose adding a mode field to the spec of the FlinkDeployment CRD so that both standalone and native clusters can be deployed. This allows two new types of Flink cluster to be created: standalone-application and standalone-session. The mode will default to native to maintain compatibility.
A replicas field will also be added to the taskManager spec to specify the number of TaskManager pods to spin up. It will only be used for standalone clusters.
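As a rough illustration of the two new fields together, a standalone cluster with a fixed pool of TaskManagers might be declared as below. This is a sketch that assumes the field names proposed above; the resource name standalone-example and the replica count are illustrative only. With three replicas and two slots per TaskManager, the cluster would offer a static total of six task slots.

```yaml
apiVersion: flink.apache.org/v1alpha1
kind: FlinkDeployment
metadata:
  namespace: default
  name: standalone-example   # hypothetical name, for illustration
spec:
  image: flink:1.14.3
  flinkVersion: v1_14
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  mode: standalone           # proposed field; omitted means native
  jobManager:
    replicas: 1
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    replicas: 3              # proposed field; fixed number of TaskManager pods
    resource:
      memory: "2048m"
      cpu: 1
```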
Standalone mode
All interactions with the Flink cluster are currently done via the FlinkService, which is tied to the Kubernetes native nature of the cluster. This will be split into a FlinkNativeService and a FlinkStandaloneService to enable communication with both cluster types.
Version support
With standalone mode supported, the operator can also deploy Flink clusters older than 1.14 (as far back as 1.2). This increases the potential user base of the operator and provides those users an easier path to upgrade their clusters.
Supported Flink images are available on the Docker repository from version 1.11 [2], so these versions can be supported by the operator in standalone mode. Earlier Flink versions could also be used in standalone mode, but would not be fully supported.
Flink Version | Native Support (no change) | Standalone Support (application) | Standalone Support (session) |
1.16 | ✅ | ✅ | ✅ |
1.15 | ✅ | ✅ | ✅ |
1.14 | ✅ | ✅ | ✅ |
1.13 | (✅) [3] | ✅ | ✅ |
1.12 | ? | ✅ | ✅ |
1.11 | ? | ✅ | ✅ |
1.10 | ? | ? | ? |
1.9 | ? | ? | ? |
1.8 | ? | ? | ? |
1.7 | ? | ? | ? |
1.6 | ? | ? | ? |
1.5 | ? | ? | ? |
1.4 | ? | ? | ? |
1.3 | ? | ? | ? |
1.2 | ? | ? | ? |
1.1 | ? | ? | ? |
1.0 | ? | ? | ? |
✅: Fully supported
?: Compatible but not supported
?: Not supported
Note: 1.13 support for native mode isn't implemented yet but should be possible [3]
ZooKeeper HA
Reactive Mode support
Standalone mode opens the door to supporting reactive mode for Flink clusters deployed by the operator. However, reactive mode is currently an MVP (minimum viable product) feature [1] and is limited to application mode, so this FLIP will not include support for it.
Compatibility, Deprecation, and Migration Plan
The CRD mode field will default to native to maintain compatibility with the released 0.1.0 version.
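In practice this should mean that a resource written against 0.1.0, which has no mode field, stays valid and is handled as a native deployment. A minimal sketch, assuming the defaulting behaves as proposed (the name legacy-example is illustrative):

```yaml
apiVersion: flink.apache.org/v1alpha1
kind: FlinkDeployment
metadata:
  namespace: default
  name: legacy-example       # hypothetical name, for illustration
spec:
  image: flink:1.14.3
  flinkVersion: v1_14
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  # no mode field: expected to be treated as mode: native
  jobManager:
    replicas: 1
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    # no replicas field: only used for standalone clusters
    resource:
      memory: "2048m"
      cpu: 1
```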