Status

Current state: "Under Discussion"

Discussion thread: -

...

JIRA: FLINK-4389
Release: 1.2

...



Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

The MetricRegistry will contain a MetricQueryService, which acts like an unscheduled reporter.
The service is a separate actor that creates and returns a Key-Value representation of the entire metric space when queried.

The keys represent the name of the metric, formatted according to the following scope format strings:

metrics.scope.jm            0:<user_scope>.<name>
metrics.scope.tm            1:<tm_id>:<user_scope>.<name>
metrics.scope.jm.job        2:<job_id>:<user_scope>.<name>
metrics.scope.tm.job        2:<job_id>:<user_scope>.<name>
metrics.scope.tm.task       3:<job_id>:<task_id>:<subtask_index>:<user_scope>.<name>
metrics.scope.tm.operator   4:<job_id>:<task_id>:<subtask_index>:<operator_name>:<user_scope>.<name>

The initial number serves as a category for the WebInterface, and allows for faster handling as we don't have to parse the entire string before deciding what category it belongs to.
  0 = JobManager
  1 = TaskManager
  2 = Job
  3 = Task
  4 = Operator
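
As an illustration of why the numeric prefix helps, the following is a minimal sketch of a category lookup that inspects only the first character of a key. The class name MetricKeys, the Category enum and the method categoryOf are hypothetical names chosen for this example and are not part of the proposal; the sketch also assumes the prefix is always a single digit followed by ':'.

```java
final class MetricKeys {
    /** Categories in the order of their numeric prefix (0 to 4). Hypothetical names. */
    enum Category { JOB_MANAGER, TASK_MANAGER, JOB, TASK, OPERATOR }

    private MetricKeys() {}

    /**
     * Decides the category from the first character only; the rest of the key
     * is not parsed. Assumes a single-digit prefix followed by ':'.
     */
    static Category categoryOf(String key) {
        if (key == null || key.length() < 2 || key.charAt(1) != ':') {
            throw new IllegalArgumentException("Malformed metric key: " + key);
        }
        int index = key.charAt(0) - '0';
        Category[] categories = Category.values();
        if (index < 0 || index >= categories.length) {
            throw new IllegalArgumentException("Unknown category in key: " + key);
        }
        return categories[index];
    }
}
```

A handler could call MetricKeys.categoryOf(key) first and only split the remaining identifiers once it knows how many of them to expect.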

...

As there are no details yet on how the separation will work, specifically whether a TaskManager -> WebInterface heartbeat will exist, I will assume that there is no message that we can piggyback on.

As such, the WebRuntimeMonitor will contain a MetricFetcher, which queries the JobManager for all available TaskManagers and then queries each of them for a metric dump.

This will be done in a separate Thread inside the WebRuntimeMonitor, which also has the responsibility to merge the returned dumps.

Metrics are only fetched if they are actually accessed via REST calls, with a minimum time period (10 seconds) between updates.

The merged dump is kept in a central location inside the MetricFetcher, available to different handlers.
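
The on-demand fetching with a minimum period between updates could be sketched roughly as follows. FetchThrottle and shouldFetch are hypothetical names for this example only; the clock is injected so the behaviour can be checked without waiting, and the actual queries to the JobManager and TaskManagers are left out.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: allows a new fetch only if enough time has passed since the last one. */
final class FetchThrottle {
    private final long minIntervalMillis;
    private final LongSupplier clock;   // returns the current time in milliseconds
    private long lastFetchMillis;
    private boolean fetchedOnce;

    FetchThrottle(long minIntervalMillis, LongSupplier clock) {
        this.minIntervalMillis = minIntervalMillis;
        this.clock = clock;
    }

    /**
     * Intended to be called by a REST handler on each metric access.
     * Returns true if the caller should trigger a new fetch, false if the
     * previously merged dump is still recent enough.
     */
    synchronized boolean shouldFetch() {
        long now = clock.getAsLong();
        if (fetchedOnce && now - lastFetchMillis < minIntervalMillis) {
            return false;
        }
        fetchedOnce = true;
        lastFetchMillis = now;
        return true;
    }
}
```

With the 10 second period mentioned above, this would be created as new FetchThrottle(10_000, System::currentTimeMillis). If no REST call arrives, shouldFetch is never invoked and no fetch takes place.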

...

MetricStore {
	void addMetric(String name, Object value);

	JobManagerMetricStore jobManager;

	class JobManagerMetricStore {
		Map<String, Object> metrics;
	}

	Map<String, TaskManagerMetricStore> taskmanagers;

	class TaskManagerMetricStore {
		Map<String, Object> metrics;
	}

	Map<String, JobMetricStore> jobs;

	class JobMetricStore {
		Map<String, Object> metrics;
		Map<String, TaskMetricStore> tasks;
	}

	class TaskMetricStore {
		Map<String, Object> metrics;
		Map<String, SubtaskMetricStore> subtasks;
	}

	class SubtaskMetricStore {
		Map<String, Object> metrics;
	}
}
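
To make the structure above more concrete, here is a rough sketch of how addMetric might route a key into the nested stores based on its category prefix. This is an assumption about one possible implementation, not the proposed one: the class name MetricStoreSketch, the use of plain maps for the JobManager, TaskManager and subtask levels, and the choice to store operator metrics in the subtask map under "<operator_name>.<user_scope>.<name>" are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class MetricStoreSketch {
    final Map<String, Object> jobManager = new HashMap<>();
    final Map<String, Map<String, Object>> taskManagers = new HashMap<>();
    final Map<String, JobStore> jobs = new HashMap<>();

    static final class JobStore {
        final Map<String, Object> metrics = new HashMap<>();
        final Map<String, TaskStore> tasks = new HashMap<>();
    }

    static final class TaskStore {
        // Task-level metrics; never filled here, since category 3 keys carry a subtask index.
        final Map<String, Object> metrics = new HashMap<>();
        final Map<String, Map<String, Object>> subtasks = new HashMap<>();
    }

    /** Routes a key of the form "<category>:<ids...>:<user_scope>.<name>" to its store. */
    void addMetric(String key, Object value) {
        char category = key.isEmpty() ? '?' : key.charAt(0);
        switch (category) {
            case '0': { // JobManager
                String[] p = split(key, 2);
                jobManager.put(p[1], value);
                break;
            }
            case '1': { // TaskManager
                String[] p = split(key, 3);
                taskManagers.computeIfAbsent(p[1], k -> new HashMap<>()).put(p[2], value);
                break;
            }
            case '2': { // Job
                String[] p = split(key, 3);
                jobs.computeIfAbsent(p[1], k -> new JobStore()).metrics.put(p[2], value);
                break;
            }
            case '3': { // Task (per subtask)
                String[] p = split(key, 5);
                subtask(p[1], p[2], p[3]).put(p[4], value);
                break;
            }
            case '4': { // Operator; assumption: kept in the subtask map, prefixed by operator name
                String[] p = split(key, 6);
                subtask(p[1], p[2], p[3]).put(p[4] + "." + p[5], value);
                break;
            }
            default:
                throw new IllegalArgumentException("Unknown category in key: " + key);
        }
    }

    private Map<String, Object> subtask(String jobId, String taskId, String subtaskIndex) {
        return jobs.computeIfAbsent(jobId, k -> new JobStore())
                .tasks.computeIfAbsent(taskId, k -> new TaskStore())
                .subtasks.computeIfAbsent(subtaskIndex, k -> new HashMap<>());
    }

    /** Splits on ':' into exactly the expected number of parts; the last part keeps any further colons. */
    private static String[] split(String key, int parts) {
        String[] p = key.split(":", parts);
        if (p.length != parts) {
            throw new IllegalArgumentException("Malformed metric key: " + key);
        }
        return p;
    }
}
```

Because the split limit depends on the category, only as many identifiers as that category defines are separated, and the remainder of the key is kept intact as the metric name.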

...


Note that at any given time only one of these objects will exist.

...

Everything can be tested with unit tests.

Rejected Alternatives

-