Status

Discussion thread: https://lists.apache.org/thread/rv964g6rq5bkc8kwx36y80nwfqcgn2s4
Vote thread: https://lists.apache.org/thread/pfwj04zr0ncljnt3z91kjqopdfk0q0w5
Release: kubernetes-operator-1.2.0

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Currently the Flink Kubernetes operator only supports running a Flink deployment in Kubernetes native mode, not standalone mode. One of the main concerns with native mode is that the JobManager needs access to the Kubernetes API, which can be seen as a security risk in multi-tenant Flink + Kubernetes setups. Some scenarios may also call for a static allocation of TaskManagers to cap the resources of a single Flink cluster.

Supporting standalone mode in the operator also means the operator can support older Flink versions that lack the Kubernetes native features. Supporting more Flink versions can increase adoption of the operator as a way to manage Flink clusters and gives those users an easier path to upgrade their clusters.

Public Interfaces

The public interface is the FlinkDeployment custom resource definition (CRD), see below.

Proposed Changes

FlinkDeployment CRD

CR Example
apiVersion: flink.apache.org/v1alpha1
kind: FlinkDeployment
metadata:
  namespace: default
  name: basic-example
spec:
  image: flink:1.14.3
  flinkVersion: v1_14
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  jobManager:
    replicas: 1
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    replicas: 1 # only needed for standalone clusters
    resource:
      memory: "2048m"
      cpu: 1
  mode: native # one of: native, standalone

We propose adding a mode field to the spec of the FlinkDeployment CRD so that both standalone and native clusters can be deployed. This allows two new types of Flink cluster to be created: standalone-application and standalone-session. The field defaults to native to maintain compatibility.

A replicas field will also be added to the taskManager spec to specify the number of TaskManager pods to start. It is only used for standalone clusters.
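
As a minimal sketch of how the two proposed fields might be used together, the resource below describes a standalone session cluster with two TaskManager pods. It reuses the values from the CR example; the resource name and the replica count are illustrative assumptions, not part of the proposal.

```yaml
# Sketch only: name and replica count are illustrative assumptions.
apiVersion: flink.apache.org/v1alpha1
kind: FlinkDeployment
metadata:
  namespace: default
  name: standalone-session-example
spec:
  image: flink:1.14.3
  flinkVersion: v1_14
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  jobManager:
    replicas: 1
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    replicas: 2        # proposed field; used only in standalone mode
    resource:
      memory: "2048m"
      cpu: 1
  mode: standalone     # proposed field; defaults to native when omitted
```

With two slots per TaskManager and two replicas, a cluster like this would offer a fixed total of four task slots, which is the kind of static allocation described in the motivation.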

Standalone mode

All interactions with the Flink cluster are currently done through the FlinkService, which is tightly coupled to Kubernetes native deployments. It will be split into a FlinkNativeService and a FlinkStandaloneService to enable communication with both cluster types.

Version support

With standalone mode supported, the operator can also deploy Flink clusters older than 1.14 (as far back as 1.2). This increases the potential user base of the operator and provides those users with an easier path to upgrade their clusters.

Supported Flink images are available in the Docker repository from version 1.11 [2], so these versions can be supported by the operator in standalone mode. Earlier Flink versions could also be run in standalone mode, but would not be fully supported.

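Flink Version | Native Support (no change) | Standalone Support (application) | Standalone Support (session)
1.16 | | |
1.15 | | |
1.14 | | |
1.13 | (✅) [3] | |
1.12 | ? | |
1.11 | ? | |
1.10 | ? | ? | ?
1.9 | ? | ? | ?
1.8 | ? | ? | ?
1.7 | ? | ? | ?
1.6 | ? | ? | ?
1.5 | ? | ? | ?
1.4 | ? | ? | ?
1.3 | ? | ? | ?
1.2 | ? | ? | ?
1.1 | ? | ? | ?
1 | ? | ? | ?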

✅: Fully supported
?: Compatible but not supported
?: Not supported

Note: 1.13 support for native mode isn't implemented yet but should be possible [3]

Reactive Mode support

Standalone mode opens the door to supporting reactive mode for Flink clusters deployed by the operator. However, reactive mode is currently an MVP (minimum viable product) feature [1] and is limited to application mode, so this FLIP does not include support for it.

Compatibility, Deprecation, and Migration Plan

The mode field of the CRD will default to native to maintain compatibility with the released 0.1.0 version.
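
To illustrate the compatibility claim, the abbreviated resource below omits the new field entirely. Under this proposal it would be expected to behave exactly as it does today, as a native deployment; the resource name is an illustrative assumption.

```yaml
# Sketch only: an existing-style resource with no mode field.
apiVersion: flink.apache.org/v1alpha1
kind: FlinkDeployment
metadata:
  namespace: default
  name: existing-native-example   # illustrative name
spec:
  image: flink:1.14.3
  flinkVersion: v1_14
  serviceAccount: flink
  jobManager:
    replicas: 1
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  # mode is not set, so it is treated as native
```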

References