Status

Current state: "Under Discussion"

Discussion thread:

JIRA:

Released: 

...

Motivation

This FLIP proposes aligning the memory model and configuration of the Job Manager (JM) with the memory model of the Task Manager (TM) recently introduced in FLIP-49.

The memory model of the JM does not need to be as extensive as the TM one, and many of the motivation points in FLIP-49 do not apply here. Nonetheless, apart from aligning the two memory models, there are a couple of explicit issues with the current JM memory settings:

  • The `jobmanager.heap.size` option is misleading in containerised environments (Yarn/Mesos/Kubernetes) because it does not represent the JM's JVM heap size but the total memory consumed by the Flink JVM process, including the container cut-off. It is used to set the size of the JM container requested in a containerised environment.
  • The purpose of the container cut-off can also be confusing on its own. Its main use cases are:
    • Direct memory usage by Flink or user code dependencies (there are certain cases where user code is run during the job startup)
    • JVM Metaspace
    • Other JVM overhead
  • There is no way to reasonably limit direct memory allocation, so it is not controlled by the JVM. This makes it hard to debug direct memory leaks and containers being killed for exceeding their memory limit (OOM).
  • The same applies to the JVM Metaspace size: without a limit, possible class loading leaks are hard to expose.

...

Introduced configuration options

| Memory component     | Option                                  | Default value                                |
|----------------------|-----------------------------------------|----------------------------------------------|
| Total Process Memory | jobmanager.memory.process.size          | None (“1472m” in default flink-conf.yaml)    |
| Total Flink Memory   | jobmanager.memory.flink.size            | None                                         |
| JVM Heap             | jobmanager.memory.heap.size             | None                                         |
| Direct memory        | jobmanager.memory.direct.size           | “128m”                                       |
| JVM Metaspace        | jobmanager.memory.jvm-metaspace.size    | “128m”                                       |
| JVM Overhead         | jobmanager.memory.jvm-overhead.min      | “192m”                                       |
|                      | jobmanager.memory.jvm-overhead.max      | “1g”                                         |
|                      | jobmanager.memory.jvm-overhead.fraction | 0.1                                          |
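To illustrate how the options listed above could fit together, here is a minimal Java sketch of the memory calculation utility, assuming the JM derives its components the way FLIP-49 does for the TM: JVM Overhead is the configured fraction of Total Process Memory clamped to [min, max], Total Flink Memory is what remains after JVM Metaspace and JVM Overhead, and the JVM Heap is Total Flink Memory minus Direct memory. Only the case where jobmanager.memory.process.size is set is covered. The class, record and method names (JobManagerMemorySketch, JobManagerMemorySpec, fromProcessSize) are hypothetical and not part of Flink's API.

```java
// Hypothetical sketch: derives JM memory components from the configured
// Total Process Memory. The derivation rules are an assumption modeled on
// the FLIP-49 Task Manager calculation, not Flink's actual implementation.
public final class JobManagerMemorySketch {

    static final long MIB = 1L << 20;

    /** Hypothetical holder for the derived JM memory sizes, all in bytes. */
    record JobManagerMemorySpec(long processSize, long flinkSize, long heapSize,
                                long directSize, long metaspaceSize, long overheadSize) {}

    /** JVM Overhead = fraction of Total Process Memory, clamped into [min, max]. */
    static long jvmOverhead(long processSize, double fraction, long min, long max) {
        long relative = (long) (processSize * fraction);
        return Math.max(min, Math.min(max, relative));
    }

    /** Derives all components when only Total Process Memory is configured. */
    static JobManagerMemorySpec fromProcessSize(long processSize, long directSize, long metaspaceSize,
                                                double overheadFraction, long overheadMin, long overheadMax) {
        long overhead = jvmOverhead(processSize, overheadFraction, overheadMin, overheadMax);
        // Total Flink Memory is what is left after the JVM-specific components.
        long flinkSize = processSize - metaspaceSize - overhead;
        // Assumed: Total Flink Memory = JVM Heap + Direct memory.
        long heapSize = flinkSize - directSize;
        if (heapSize <= 0) {
            throw new IllegalArgumentException(
                    "Total Process Memory is too small to leave room for the JVM Heap");
        }
        return new JobManagerMemorySpec(processSize, flinkSize, heapSize, directSize, metaspaceSize, overhead);
    }

    public static void main(String[] args) {
        // Defaults from the table: 1472m process, 128m direct, 128m metaspace,
        // overhead fraction 0.1 clamped to [192m, 1g].
        JobManagerMemorySpec spec = fromProcessSize(
                1472 * MIB, 128 * MIB, 128 * MIB, 0.1, 192 * MIB, 1024 * MIB);
        // prints: flink=1152m heap=1024m overhead=192m
        System.out.println("flink=" + spec.flinkSize() / MIB + "m heap=" + spec.heapSize() / MIB
                + "m overhead=" + spec.overheadSize() / MIB + "m");
    }
}
```

Under these assumptions, the default 1472m of Total Process Memory yields a 1024m JVM Heap: 10% of 1472m (about 147m) is below the 192m JVM Overhead minimum, so the overhead is raised to 192m, leaving 1152m of Total Flink Memory, of which 128m is Direct memory.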

Implementation Steps

  1. Introduce new options
  2. Introduce data structures and utilities
    1. Data structure to store memory sizes of JM
    2. Utility for calculating memory sizes from configuration
  3. Extend the calculation utility and BashJavaUtils with generating JVM arguments to start JM process
  4. Call BashJavaUtils in the standalone startup scripts and use the returned JVM arguments to start the JM JVM process (ClusterEntryPoint) instead of the current bash code
  5. Use the new memory calculation utility to get the Total Process Memory size and the JVM arguments to start the JM container (ClusterEntryPoint) in containerised environments
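Steps 3 to 5 turn the calculated sizes into JVM arguments for the JM process. A minimal sketch, assuming the standard HotSpot flags are used to cap each component: -Xmx/-Xms for the JVM Heap, -XX:MaxDirectMemorySize for Direct memory and -XX:MaxMetaspaceSize for JVM Metaspace. With such limits, direct memory or Metaspace leaks would surface as JVM OutOfMemoryErrors instead of container kills. The class and method names are hypothetical, and the exact format in which BashJavaUtils would return the arguments is not specified here.

```java
// Hypothetical sketch: builds JM JVM arguments from already calculated
// memory sizes (in bytes). Flag names are standard HotSpot options; how
// Flink actually assembles them is an assumption.
public final class JobManagerJvmArgsSketch {

    static String jvmArgs(long heapBytes, long directBytes, long metaspaceBytes) {
        return String.join(" ",
                "-Xmx" + heapBytes,                          // cap the JVM Heap
                "-Xms" + heapBytes,                          // start with the full heap
                "-XX:MaxDirectMemorySize=" + directBytes,    // cap Direct memory
                "-XX:MaxMetaspaceSize=" + metaspaceBytes);   // cap JVM Metaspace
    }

    public static void main(String[] args) {
        long mib = 1L << 20;
        // prints: -Xmx1073741824 -Xms1073741824 -XX:MaxDirectMemorySize=134217728 -XX:MaxMetaspaceSize=134217728
        System.out.println(jvmArgs(1024 * mib, 128 * mib, 128 * mib));
    }
}
```

In a containerised environment, per step 5, the container would be requested with the Total Process Memory size while the same arguments start the JM JVM inside it, so the JVM Overhead is the part of the container not covered by any explicit JVM limit.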

...