Status

...

Discussion thread

...

...

...

JIRA: FLINK-19661

Release: 1.12


Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

The goal is to align the JobManager's overview page with Flink's new memory model (FLIP-116). Additionally, we want to align the JobManager's details page, which contains memory-related information, with the solution proposed in FLIP-102.

Proposed Changes

The proposal makes the JobManager's memory-related metrics available in the UI. Additionally, the effective configuration parameters should be exposed, similar to the TaskManager's overview (see FLIP-102: Add More Metrics to TaskManager).

JVM Metrics

These JVM metrics are exposed and can be used through the JobManager's metrics REST API.

JVM | Metric | Used key | Total key
Heap | Status.JVM.Memory.Heap | Used | Max
Direct | Status.JVM.Memory.Direct | MemoryUsed | TotalCapacity
Mapped | Status.JVM.Memory.Mapped | MemoryUsed | TotalCapacity
NonHeap | Status.JVM.Memory.NonHeap | Used | Max
Metaspace | Status.JVM.Memory.Metaspace (FLINK-19617) | Used | Max
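As a small illustration of how the table reads, the full metric identifiers can be derived by joining each metric prefix with its used and total key. The sketch below is in Python; the `JVM_METRICS` mapping and the `metric_ids` helper are hypothetical names introduced here and are not part of Flink.

```python
# Metric prefix -> (used key, total key), transcribed from the table above.
JVM_METRICS = {
    "Status.JVM.Memory.Heap": ("Used", "Max"),
    "Status.JVM.Memory.Direct": ("MemoryUsed", "TotalCapacity"),
    "Status.JVM.Memory.Mapped": ("MemoryUsed", "TotalCapacity"),
    "Status.JVM.Memory.NonHeap": ("Used", "Max"),
    "Status.JVM.Memory.Metaspace": ("Used", "Max"),
}


def metric_ids(prefix):
    """Return the (used, total) metric identifiers for one metric prefix."""
    used_key, total_key = JVM_METRICS[prefix]
    return f"{prefix}.{used_key}", f"{prefix}.{total_key}"
```

For example, `metric_ids("Status.JVM.Memory.Direct")` yields the two identifiers a client would request to show used versus total direct memory.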

Memory Configuration

Flink's memory model (as described in org.apache.flink.runtime.jobmanager.JobManagerProcessSpec) can be mapped to the following Flink configuration parameters. A few of them have a corresponding Flink metric.

Flink Memory Model | Flink configuration (1) | Effective Configuration REST API (2) | Metric (3) | Used key | Total key
Heap | jobmanager.memory.heap.size | jobmanager.memory.heap.size | Status.JVM.Memory.Heap | Used | Max
Off-Heap | jobmanager.memory.off-heap.size | jobmanager.memory.off-heap.size | - | - | -
JVM Metaspace | jobmanager.memory.jvm-metaspace.size | jobmanager.memory.jvm-metaspace.size | Status.JVM.Memory.Metaspace (FLINK-19617) | Used | Max
JVM Overhead | jobmanager.memory.jvm-overhead.min, jobmanager.memory.jvm-overhead.max | jobmanager.memory.jvm-overhead.min / jobmanager.memory.jvm-overhead.max (4) | - | - | -

...

1 These are the configuration parameters used in the Flink configuration.
2 These are the config parameters exposed through the cluster config REST API (FLINK-19662). Their names match the actual Flink configuration because the effective configuration is generated from the passed configuration.
3 These are the metrics exposed through the JobManager's metrics REST API.
4 min and max have the same value.
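To make footnotes 2 and 4 more concrete, the sketch below looks up the memory settings in a cluster config response. It assumes the response is a JSON list of `{"key": ..., "value": ...}` entries; that shape, the sample values, and the helper names `index_config` and `jvm_overhead` are assumptions for illustration only.

```python
import json

MEMORY_KEYS = [
    "jobmanager.memory.heap.size",
    "jobmanager.memory.off-heap.size",
    "jobmanager.memory.jvm-metaspace.size",
    "jobmanager.memory.jvm-overhead.min",
    "jobmanager.memory.jvm-overhead.max",
]


def index_config(response_text):
    """Index {"key", "value"} entries by key, keeping only the memory keys."""
    entries = json.loads(response_text)
    by_key = {entry["key"]: entry["value"] for entry in entries}
    return {key: by_key[key] for key in MEMORY_KEYS if key in by_key}


def jvm_overhead(config):
    """Return the effective JVM overhead; min and max are expected to be equal."""
    low = config["jobmanager.memory.jvm-overhead.min"]
    high = config["jobmanager.memory.jvm-overhead.max"]
    if low != high:
        raise ValueError(f"expected equal overhead bounds, got {low} and {high}")
    return low


# Hypothetical sample response; the values are made up.
SAMPLE = json.dumps([
    {"key": "jobmanager.memory.heap.size", "value": "1073741824b"},
    {"key": "jobmanager.memory.jvm-overhead.min", "value": "201326592b"},
    {"key": "jobmanager.memory.jvm-overhead.max", "value": "201326592b"},
    {"key": "rest.port", "value": "8081"},
])
```

Because the effective configuration reuses the Flink config names, a UI can read the same keys a user would set, without a separate naming scheme.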

Frontend Design

Display metrics inside JVM, outside JVM and GC

Add new metrics page


REST API proposal:

Metrics

The API provided now can only return the metric keys, without values.

  • add JobManager metrics to the MetricStore
  • show JobManager memory and garbage collectors
  • once FLINK-9741 is finished, show it according to FLIP-6
  • URL: /jobmanager/metrics/info
  • response:

{
  "type" : "object",
  "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:cluster:JobManagerMetricsInfo",
  "properties" : {
    "heapUsed" : { "type" : "integer" },
    "heapCommitted" : { "type" : "integer" },
    "heapMax" : { "type" : "integer" },
    "nonHeapUsed" : { "type" : "integer" },
    "nonHeapCommitted" : { "type" : "integer" },
    "nonHeapMax" : { "type" : "integer" },
    "directCount" : { "type" : "integer" },
    "directUsed" : { "type" : "integer" },
    "directMax" : { "type" : "integer" },
    "mappedCount" : { "type" : "integer" },
    "mappedUsed" : { "type" : "integer" },
    "mappedMax" : { "type" : "integer" },
    "garbageCollectors" : {
      "type" : "array",
      "items" : {
        "type" : "object",
        "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:cluster:JobManagerMetricsInfo:GarbageCollectorInfo",
        "properties" : {
          "name" : { "type" : "string" },
          "count" : { "type" : "integer" },
          "time" : { "type" : "integer" }
        }
      }
    }
  }
}
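For readers who prefer code to JSON Schema, a client could check a response against the schema above roughly as follows. This is a Python sketch; `validate_metrics_info` and any payload passed to it are hypothetical, and only the field names and types come from the schema. Since the schema declares no required fields, missing fields are not reported.

```python
INTEGER_FIELDS = [
    "heapUsed", "heapCommitted", "heapMax",
    "nonHeapUsed", "nonHeapCommitted", "nonHeapMax",
    "directCount", "directUsed", "directMax",
    "mappedCount", "mappedUsed", "mappedMax",
]
GC_FIELDS = {"name": str, "count": int, "time": int}


def _is_int(value):
    # bool is a subclass of int in Python, but it is not a JSON integer.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_metrics_info(payload):
    """Return a list of problems; an empty list means the payload fits the schema."""
    problems = []
    for field in INTEGER_FIELDS:
        if field in payload and not _is_int(payload[field]):
            problems.append(f"{field} is not an integer")
    for i, gc in enumerate(payload.get("garbageCollectors", [])):
        for field, expected in GC_FIELDS.items():
            if field not in gc:
                continue
            value = gc[field]
            ok = _is_int(value) if expected is int else isinstance(value, expected)
            if not ok:
                problems.append(f"garbageCollectors[{i}].{field} has the wrong type")
    return problems
```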

The following request can be used to retrieve the metrics for the JobManager: http://localhost:8081/jobmanager/metrics?get=Status.JVM.Memory.Heap.Max,Status.JVM.Memory.Heap.Used,Status.JVM.Memory.NonHeap.Max,Status.JVM.Memory.NonHeap.Used,Status.JVM.Memory.Metaspace.Max,Status.JVM.Memory.Metaspace.Used
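A client could assemble that request and read the result along these lines. This is a Python sketch: `build_metrics_url` and `parse_metrics` are hypothetical helpers, the host and port are taken from the example URL, and the response shape (a JSON list of `{"id": ..., "value": ...}` objects with string values) is an assumption that should be checked against the running Flink version.

```python
import json

METRICS = [
    "Status.JVM.Memory.Heap.Max",
    "Status.JVM.Memory.Heap.Used",
    "Status.JVM.Memory.NonHeap.Max",
    "Status.JVM.Memory.NonHeap.Used",
    "Status.JVM.Memory.Metaspace.Max",
    "Status.JVM.Memory.Metaspace.Used",
]


def build_metrics_url(base_url, metrics):
    """Join the metric names into the comma-separated `get` query parameter."""
    return f"{base_url}/jobmanager/metrics?get={','.join(metrics)}"


def parse_metrics(response_text):
    """Map metric id to an integer (assumes integer-valued string values)."""
    return {item["id"]: int(item["value"]) for item in json.loads(response_text)}


url = build_metrics_url("http://localhost:8081", METRICS)

# Hypothetical response body; the number is made up.
sample = '[{"id": "Status.JVM.Memory.Heap.Used", "value": "123456"}]'
heap_used = parse_metrics(sample)["Status.JVM.Memory.Heap.Used"]
```

Against a running cluster, the body could be fetched with `urllib.request.urlopen(url)` and passed to `parse_metrics`. The sketch keeps network access out so that it stays runnable offline.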

The Metaspace metrics still need to be implemented. This is going to be handled by FLINK-19617.

Memory Configuration

JIRA: FLINK-19662

We want to expose the effective configuration through a new REST endpoint. We have to consider that the memory configuration depends on the type of cluster (legacy standalone vs. containerized memory configuration).

...

Test Plan

Covered by unit tests.