Status

Current state: Under Discussion

...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Currently the Flink Kubernetes operator only supports running a Flink deployment in Kubernetes native mode, not standalone mode. One of the main concerns with native mode is that the JobManager needs access to the Kubernetes API, which can be seen as a security risk in multi-tenant Flink + Kubernetes setups. Some users may also want a static allocation of TaskManagers to cap the resources a single Flink cluster can consume.

Supporting standalone mode in the operator also means the operator can support older Flink versions that lack the Kubernetes native features. Supporting more Flink versions broadens adoption of the operator as a way to manage Flink clusters and gives those users an easier path to upgrade their clusters.

Public Interfaces

The public interface is the FlinkDeployment custom resource definition (CRD); see below.

Proposed Changes

FlinkDeployment CRD

CR Example (YAML):
apiVersion: flink.apache.org/v1alpha1
kind: FlinkDeployment
metadata:
  namespace: default
  name: basic-example
spec:
  image: flink:1.14.3
  flinkVersion: v1_14
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  jobManager:
    replicas: 1
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    replicas: 1 # only needed for standalone clusters
    resource:
      memory: "2048m"
      cpu: 1
  mode: native # one of: native, standalone

...

We propose adding a mode field to the spec of the FlinkDeployment CRD so that both standalone and native clusters can be deployed. This enables two new types of Flink cluster: standalone-application and standalone-session. The field defaults to native to maintain compatibility.

A replicas field will also be added to the taskManager spec to specify the number of TaskManager pods to spin up; it will only be used for standalone session clusters. For application clusters, the number of replicas can be calculated from the job parallelism and the taskmanager.numberOfTaskSlots configuration.
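
The replica calculation for application clusters can be sketched as a ceiling division of the job parallelism by the slots per TaskManager. This is only an illustration, and it assumes every slot is available to the job. The class and method names are hypothetical and not taken from the operator code.

```java
/** Hypothetical helper for illustration; not part of the operator codebase. */
public final class TaskManagerReplicas {

    private TaskManagerReplicas() {}

    /**
     * Returns the smallest number of TaskManager pods whose combined slots
     * cover the job parallelism: ceil(parallelism / slotsPerTaskManager).
     */
    public static int forApplication(int parallelism, int slotsPerTaskManager) {
        if (parallelism <= 0 || slotsPerTaskManager <= 0) {
            throw new IllegalArgumentException("parallelism and slots must be positive");
        }
        // Integer ceiling division, avoiding floating point.
        return (parallelism + slotsPerTaskManager - 1) / slotsPerTaskManager;
    }

    public static void main(String[] args) {
        // taskmanager.numberOfTaskSlots is "2" in the CR example; parallelism 5 is assumed.
        System.out.println(forApplication(5, 2)); // prints 3
    }
}
```

With two slots per TaskManager, a job of parallelism 5 would need three replicas, one of which runs with a spare slot.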

Standalone mode

All interactions with the Flink cluster are currently done via the FlinkService, which is tied to the Kubernetes native nature of the cluster. This will be split into a FlinkNativeService and a FlinkStandaloneService to enable communication with both cluster types.
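
One way to picture the split is a common interface with one implementation per mode, chosen from the deployment's mode. The sketch below is an assumption about the shape only: the three type names come from this proposal, but the describeDeployment method, the Mode enum and the forMode factory are hypothetical and do not reflect the operator's actual signatures.

```java
import java.util.Objects;

/** Hypothetical sketch; method signatures are assumed, not the operator's API. */
public final class FlinkServiceSketch {

    private FlinkServiceSketch() {}

    public enum Mode { NATIVE, STANDALONE }

    /** Illustrative stand-in for the operations the operator needs from a cluster. */
    public interface FlinkService {
        String describeDeployment(String clusterName);
    }

    public static final class FlinkNativeService implements FlinkService {
        @Override
        public String describeDeployment(String clusterName) {
            // Native mode: the JobManager requests TaskManager pods via the Kubernetes API.
            return clusterName + ": JobManager, TaskManagers requested on demand";
        }
    }

    public static final class FlinkStandaloneService implements FlinkService {
        private final int taskManagerReplicas;

        public FlinkStandaloneService(int taskManagerReplicas) {
            this.taskManagerReplicas = taskManagerReplicas;
        }

        @Override
        public String describeDeployment(String clusterName) {
            // Standalone mode: the operator creates a fixed number of TaskManager pods.
            return clusterName + ": JobManager, " + taskManagerReplicas + " TaskManager replica(s)";
        }
    }

    /** Picks the service implementation for the given mode. */
    public static FlinkService forMode(Mode mode, int taskManagerReplicas) {
        Objects.requireNonNull(mode, "mode");
        switch (mode) {
            case STANDALONE:
                return new FlinkStandaloneService(taskManagerReplicas);
            case NATIVE:
            default:
                return new FlinkNativeService();
        }
    }

    public static void main(String[] args) {
        System.out.println(forMode(Mode.STANDALONE, 1).describeDeployment("basic-example"));
    }
}
```

Keeping the mode check in a single factory means the rest of the reconciliation logic depends only on the interface.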

Version support

With standalone mode supported, the operator can also deploy Flink clusters older than 1.14 (as far back as 1.2). This increases the potential user base of the operator and gives those users an easier path to upgrade their clusters.

Supported Flink images are available in the Docker repository from version 1.11 onwards [2], so these versions can be supported by the operator in standalone mode. Earlier Flink versions could also be run in standalone mode, but would not be fully supported.

...

✅: Fully supported
⚠️: Compatible but not supported
❌: Not supported


Reactive Mode support

Standalone mode opens the door to supporting reactive mode for Flink clusters deployed by the operator. However, reactive mode is currently an MVP (minimum viable product) feature [1] and is limited to application mode, so this FLIP will not include support for it.

Compatibility, Deprecation, and Migration Plan

The CRD mode will default to native to maintain compatibility with the released 0.1.0 version.
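
The defaulting rule can be sketched as follows: a spec written for 0.1.0 has no mode, and it resolves to native. This is an illustration under assumptions. The class, enum and method names are hypothetical, and representing the mode as a nullable string is a simplification.

```java
import java.util.Locale;

/** Hypothetical sketch of mode defaulting; not the operator's actual code. */
public final class DeploymentModeDefault {

    private DeploymentModeDefault() {}

    public enum Mode { NATIVE, STANDALONE }

    /** Resolves the mode from the spec, falling back to NATIVE when it is absent. */
    public static Mode resolve(String specMode) {
        if (specMode == null || specMode.trim().isEmpty()) {
            // Specs created before the field existed carry no mode.
            return Mode.NATIVE;
        }
        // Throws IllegalArgumentException for values other than native/standalone.
        return Mode.valueOf(specMode.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));         // prints NATIVE
        System.out.println(resolve("standalone")); // prints STANDALONE
    }
}
```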

References