Status

...


Discussion

...

thread

FLIP-75 discussion about the initial design

FLIP-102 discussion after splitting up FLIP-75 into sub-flips

...

...

Jira: FLINK-14431

...

Release: 1.12


Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

1 These are the configuration parameters used in the Flink configuration.
2 These are the JSON paths that address the properties in the HTTP REST API response. Additionally, memoryConfiguration.totalFlinkMemory and memoryConfiguration.totalProcessMemory are exposed through the REST API.
3 The metrics which are exposed through the TaskManager's metrics REST API.

Frontend Design

...

The TaskManager metrics page will be redesigned so that users can more clearly understand the relationship between these metrics.

The previous metrics are moved into an 'Advanced' section, since some users may still need them.

[Image: redesigned TaskManager metrics page]

Detail view

[Image: detail view]
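To illustrate the kind of relationship the redesigned page is meant to surface, here is a minimal Python sketch that turns used/total pairs into usage fractions. The field names are taken from the metrics sub-record of the response schema in the REST API Design section. The pairing of fields, the helper names and the sample values are assumptions made for illustration only, not the actual frontend logic.

```python
# Assumed pairing of "used" and "total" fields; names come from the proposed
# response schema, the pairing itself is an illustrative assumption.
USAGE_PAIRS = {
    "managed": ("managedMemoryUsed", "managedMemoryTotal"),
    "network": ("networkMemoryUsed", "networkMemoryTotal"),
    "heap": ("heapUsed", "heapMax"),
}


def usage_ratio(used, total):
    """Return used/total as a fraction, or None when total is not positive."""
    if total <= 0:
        return None
    return used / total


def memory_usage(metrics):
    """Compute one usage fraction per memory pool from a metrics sub-record."""
    return {
        pool: usage_ratio(metrics[used_key], metrics[total_key])
        for pool, (used_key, total_key) in USAGE_PAIRS.items()
    }


# Hypothetical values in bytes; heapMax = -1 stands for an undefined maximum.
sample_metrics = {
    "managedMemoryUsed": 256,
    "managedMemoryTotal": 512,
    "networkMemoryUsed": 0,
    "networkMemoryTotal": 128,
    "heapUsed": 100,
    "heapMax": -1,
}
usage = memory_usage(sample_metrics)
```

A pool whose total is not positive yields None rather than a misleading percentage, so a UI could render it as "n/a".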

REST API Design

...

Memory Configuration

Jira: FLINK-14435

The TaskManager's memory configuration will be exposed through /taskmanagers/:taskmanagerid. A proposed REST response is shown in the code block below:

JSON Schema of response:
{
  "type" : "object",
  "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:taskmanager:TaskManagerDetailsInfo",
  "properties" : {
    "id" : {
      "type" : "any"
    },
    "path" : {
      "type" : "string"
    },
    "dataPort" : {
      "type" : "integer"
    },
    "timeSinceLastHeartbeat" : {
      "type" : "integer"
    },
    "slotsNumber" : {
      "type" : "integer"
    },
    "freeSlots" : {
      "type" : "integer"
    },
    "hardware" : {
      "type" : "object",
      "id" : "urn:jsonschema:org:apache:flink:runtime:instance:HardwareDescription",
      "properties" : {
        "cpuCores" : {
          "type" : "integer"
        },
        "physicalMemory" : {
          "type" : "integer"
        },
        "freeMemory" : {
          "type" : "integer"
        },
        "managedMemory" : {
          "type" : "integer"
        }
      }
    },
    "memoryConfiguration" : {
      "type" : "object",
      "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:taskmanager:TaskExecutorMemoryConfiguration",
      "properties" : {
        "frameworkHeap" : {
          "type" : "long"
        },
        "frameworkOffHeap" : {
          "type" : "long"
        },
        "taskHeap" : {
          "type" : "long"
        },
        "taskOffHeap" : {
          "type" : "long"
        },
        "networkMemory" : {
          "type" : "long"
        },
        "managedMemory" : {
          "type" : "long"
        },
        "jvmMetaspace" : {
          "type" : "long"
        },
        "jvmOverhead" : {
          "type" : "long"
        },
        "totalFlinkMemory" : {
          "type" : "long"
        },
        "totalProcessMemory" : {
          "type" : "long"
        }
      }
    },
    "metrics" : {
      "type" : "object",
      "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:taskmanager:TaskManagerMetricsInfo",
      "properties" : {
        "heapUsed" : {
          "type" : "integer"
        },
        "heapCommitted" : {
          "type" : "integer"
        },
        "heapMax" : {
          "type" : "integer"
        },
        "metaspaceUsed" : {
          "type" : "integer"
        },
        "metaspaceCommitted" : {
          "type" : "integer"
        },
        "metaspaceMax" : {
          "type" : "integer"
        },
        "nonHeapUsed" : {
          "type" : "integer"
        },
        "nonHeapCommitted" : {
          "type" : "integer"
        },
        "nonHeapMax" : {
          "type" : "integer"
        },
        "directCount" : {
          "type" : "integer"
        },
        "directUsed" : {
          "type" : "integer"
        },
        "directMax" : {
          "type" : "integer"
        },
        "mappedCount" : {
          "type" : "integer"
        },
        "mappedUsed" : {
          "type" : "integer"
        },
        "mappedMax" : {
          "type" : "integer"
        },
        "memorySegmentsAvailable" : {
          "type" : "integer"
        },
        "memorySegmentsTotal" : {
          "type" : "integer"
        },
        "managedMemoryUsed" : {
          "type" : "long"
        },
        "managedMemoryTotal" : {
          "type" : "long"
        },
        "networkMemoryUsed" : {
          "type" : "long"
        },
        "networkMemoryTotal" : {
          "type" : "long"
        },
        "garbageCollectors" : {
          "type" : "array",
          "items" : {
            "type" : "object",
            "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:taskmanager:TaskManagerMetricsInfo:GarbageCollectorInfo",
            "properties" : {
              "name" : {
                "type" : "string"
              },
              "count" : {
                "type" : "integer"
              },
              "time" : {
                "type" : "integer"
              }
            }
          }
        }
      }
    }
  }
}
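
As a rough illustration of how a client might consume the memoryConfiguration sub-record, the Python sketch below checks that the two totals agree with the individual components. It assumes that totalFlinkMemory is the sum of the framework, task, network and managed components, and that totalProcessMemory additionally includes jvmMetaspace and jvmOverhead. This follows Flink's documented memory model but is not stated in the schema itself. The sample values and the TaskManager id are hypothetical.

```python
MIB = 1024 * 1024

# Assumption: these components add up to totalFlinkMemory.
FLINK_COMPONENTS = (
    "frameworkHeap",
    "frameworkOffHeap",
    "taskHeap",
    "taskOffHeap",
    "networkMemory",
    "managedMemory",
)
# Assumption: these are added on top to obtain totalProcessMemory.
JVM_COMPONENTS = ("jvmMetaspace", "jvmOverhead")


def check_memory_configuration(config):
    """Report whether each total matches the sum of its components."""
    flink_sum = sum(config[key] for key in FLINK_COMPONENTS)
    process_sum = flink_sum + sum(config[key] for key in JVM_COMPONENTS)
    return {
        "totalFlinkMemory": flink_sum == config["totalFlinkMemory"],
        "totalProcessMemory": process_sum == config["totalProcessMemory"],
    }


# Hypothetical response fragment; all sizes are in bytes.
sample_response = {
    "id": "tm-1",
    "memoryConfiguration": {
        "frameworkHeap": 128 * MIB,
        "frameworkOffHeap": 128 * MIB,
        "taskHeap": 384 * MIB,
        "taskOffHeap": 0,
        "networkMemory": 128 * MIB,
        "managedMemory": 512 * MIB,
        "jvmMetaspace": 256 * MIB,
        "jvmOverhead": 192 * MIB,
        "totalFlinkMemory": 1280 * MIB,
        "totalProcessMemory": 1728 * MIB,
    },
}
result = check_memory_configuration(sample_response["memoryConfiguration"])
```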

Metrics exposure

The newly introduced metrics can be accessed through the metrics REST endpoint.
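
A minimal Python sketch of how such a query could be built and its response parsed is given below. It assumes the existing per-TaskManager metrics endpoint /taskmanagers/:taskmanagerid/metrics with a comma-separated get query parameter, and a response consisting of a JSON array of id/value entries. The host, port, TaskManager id and sample payload are hypothetical, and the metric names shown are existing JVM heap metrics used as stand-ins, since the names of the new metrics are not listed in this section.

```python
import json
from urllib.parse import urlencode


def metrics_url(base_url, taskmanager_id, metric_names):
    """Build the metrics query URL for one TaskManager (assumed endpoint shape)."""
    # safe="," keeps the separator between metric names readable in the URL.
    query = urlencode({"get": ",".join(metric_names)}, safe=",")
    return f"{base_url}/taskmanagers/{taskmanager_id}/metrics?{query}"


def parse_metrics(body):
    """Turn a JSON array of {"id": ..., "value": ...} entries into a dict."""
    return {entry["id"]: entry["value"] for entry in json.loads(body)}


# Hypothetical address and TaskManager id.
url = metrics_url(
    "http://localhost:8081",
    "tm-1",
    ["Status.JVM.Memory.Heap.Used", "Status.JVM.Memory.Heap.Max"],
)

# Hypothetical payload in the assumed response shape.
sample_body = '[{"id": "Status.JVM.Memory.Heap.Used", "value": "123456"}]'
metrics = parse_metrics(sample_body)
```

The sketch performs no network call, so the URL can be passed to any HTTP client of choice.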

Implementation Proposal

Step 1: Expose effective configuration parameters of TaskExecutor

...

Step 5: Update TaskManager's details page 

Jira: FLINK-19764 (previously FLINK-15328)

The web UI has to be updated as proposed above.

...

  • Create a separate independent endpoint for the effective memory configuration.
  • Deprecate the metrics sub-record returned by /taskmanagers/:taskmanagerid. The metrics endpoint can be used instead. This would simplify the TaskManagerDetailsHandler.

Test Plan

Existing tests will be updated to verify the feature.