Status
Current state: "Under Discussion"
Discussion thread: -
...
...
...
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
...
The MetricRegistry will contain a MetricQueryService, which acts like an unscheduled reporter.
The service is a separate actor that creates and returns a Key-Value representation of the entire metric space when queried.
The keys represent the metric names, formatted according to the following scope format strings:
| Scope option | Key format |
|---|---|
| metrics.scope.jm | 0:<user_scope>.<name> |
| metrics.scope.tm | 1:<tm_id>:<user_scope>.<name> |
| metrics.scope.jm.job | 2:<job_id>:<user_scope>.<name> |
| metrics.scope.tm.job | 2:<job_id>:<user_scope>.<name> |
| metrics.scope.tm.task | 3:<job_id>:<task_id>:<subtask_index>:<user_scope>.<name> |
| metrics.scope.tm.operator | 4:<job_id>:<task_id>:<subtask_index>:<operator_name>:<user_scope>.<name> |
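As an illustration, the following Java sketch assembles dump keys according to the formats in the table. The class and method names (`MetricKeys`, `operator`, ...) are hypothetical and not part of Flink. It also assumes that an empty user scope yields only the metric name without a leading dot, which the formats above leave unspecified.

```java
/**
 * Hypothetical helper that builds metric-dump keys following the scope formats above.
 * Not a Flink API; a sketch only.
 */
final class MetricKeys {
    private MetricKeys() {}

    // Assumption: an empty user scope yields only the metric name, without a leading '.'.
    static String scoped(String userScope, String name) {
        return userScope.isEmpty() ? name : userScope + "." + name;
    }

    // 0:<user_scope>.<name>
    static String jobManager(String userScope, String name) {
        return "0:" + scoped(userScope, name);
    }

    // 1:<tm_id>:<user_scope>.<name>
    static String taskManager(String tmId, String userScope, String name) {
        return "1:" + tmId + ":" + scoped(userScope, name);
    }

    // 2:<job_id>:<user_scope>.<name> (same format for JobManager- and TaskManager-side job metrics)
    static String job(String jobId, String userScope, String name) {
        return "2:" + jobId + ":" + scoped(userScope, name);
    }

    // 3:<job_id>:<task_id>:<subtask_index>:<user_scope>.<name>
    static String task(String jobId, String taskId, int subtaskIndex, String userScope, String name) {
        return "3:" + jobId + ":" + taskId + ":" + subtaskIndex + ":" + scoped(userScope, name);
    }

    // 4:<job_id>:<task_id>:<subtask_index>:<operator_name>:<user_scope>.<name>
    static String operator(String jobId, String taskId, int subtaskIndex, String operatorName,
                           String userScope, String name) {
        return "4:" + jobId + ":" + taskId + ":" + subtaskIndex + ":" + operatorName + ":"
                + scoped(userScope, name);
    }
}
```

For example, `MetricKeys.operator("job1", "task7", 0, "Map", "myScope", "numRecordsIn")` yields `4:job1:task7:0:Map:myScope.numRecordsIn`.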
The leading number serves as a category identifier for the WebInterface. It allows for faster handling, since the category of a metric can be determined without parsing the entire string.
0 = JobManager
1 = TaskManager
2 = Job
3 = Task
4 = Operator
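A minimal sketch of how the WebInterface could classify a key by looking only at its first character, and then split off the id fields that precede `<user_scope>.<name>`. `MetricCategory` and its methods are hypothetical names; the sketch assumes that ids and operator names contain no `:` (a `:` inside the final metric name is preserved by the split limit).

```java
/**
 * Hypothetical classifier for metric-dump keys; not a Flink API.
 */
final class MetricCategory {
    static final int JOB_MANAGER = 0;
    static final int TASK_MANAGER = 1;
    static final int JOB = 2;
    static final int TASK = 3;
    static final int OPERATOR = 4;

    // Number of ':'-separated id fields between the category and "<user_scope>.<name>",
    // indexed by category (see the scope format table).
    private static final int[] ID_FIELDS = {0, 1, 1, 3, 4};

    private MetricCategory() {}

    /** Returns the category encoded in the first character, without parsing the rest. */
    static int of(String key) {
        if (key.length() < 2 || key.charAt(1) != ':') {
            throw new IllegalArgumentException("Malformed metric key: " + key);
        }
        int category = key.charAt(0) - '0';
        if (category < JOB_MANAGER || category > OPERATOR) {
            throw new IllegalArgumentException("Unknown category in key: " + key);
        }
        return category;
    }

    /** Returns the id fields followed by "<user_scope>.<name>" as the last element. */
    static String[] fields(String key) {
        int category = of(key);
        // limit = idFields + 1, so everything after the last id stays in one piece.
        return key.substring(2).split(":", ID_FIELDS[category] + 1);
    }
}
```

For an operator key such as `4:job:task:0:Map:scope.name`, `fields` returns `job`, `task`, `0`, `Map` and `scope.name`.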
...
As there are no details yet on how the separation will work, specifically whether a TaskManager -> WebInterface heartbeat will exist, I will assume that there is no message that we can piggyback on.
As such, the WebRuntimeMonitor will contain a MetricFetcher, which queries the JobManager for all available TaskManagers and then queries each of them for a metric dump.
This will be done in a separate thread inside the WebRuntimeMonitor, which is also responsible for merging the returned dumps.
Metrics are only fetched if they are actually accessed via REST calls, with a minimum period of 10 seconds between updates.
The fetched metrics are merged, and the merged dump is kept in a central location inside the MetricFetcher, where it is available to the different handlers.
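The fetching behaviour described above (fetching only on REST access, a 10-second minimum interval, a separate thread, and merging of the dumps) could look roughly like the following sketch. It is not the actual implementation: the actor messaging is replaced by two hypothetical functional parameters, `taskManagerLookup` for the JobManager query and `dumpQuery` for the per-TaskManager dump request, and the clock is injected so that the interval can be tested.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical on-demand, rate-limited metric fetcher; a sketch, not Flink code. */
class MetricFetcher {
    static final long MIN_UPDATE_INTERVAL_MS = 10_000; // minimum time between two fetches

    private final Supplier<List<String>> taskManagerLookup;          // stands in for the JobManager query
    private final Function<String, Map<String, Object>> dumpQuery;   // stands in for a TaskManager dump request
    private final LongSupplier clock;                                // current time in milliseconds
    private final ExecutorService fetchThread = Executors.newSingleThreadExecutor();
    private final Map<String, Object> mergedDump = new ConcurrentHashMap<>();

    private boolean fetchedBefore = false;
    private long lastFetchMs;

    MetricFetcher(Supplier<List<String>> taskManagerLookup,
                  Function<String, Map<String, Object>> dumpQuery,
                  LongSupplier clock) {
        this.taskManagerLookup = taskManagerLookup;
        this.dumpQuery = dumpQuery;
        this.clock = clock;
    }

    /** Called by REST handlers; schedules a fetch only if the last one is old enough. */
    synchronized void update() {
        long now = clock.getAsLong();
        if (fetchedBefore && now - lastFetchMs < MIN_UPDATE_INTERVAL_MS) {
            return; // recent enough: handlers keep using the current merged dump
        }
        fetchedBefore = true;
        lastFetchMs = now;
        fetchThread.execute(this::fetchAndMerge);
    }

    /** Runs on the fetch thread: query every TaskManager and merge the dumps. */
    private void fetchAndMerge() {
        for (String taskManagerId : taskManagerLookup.get()) {
            // Keys already encode their scope (tm_id, job_id, ...), so merging is a putAll.
            mergedDump.putAll(dumpQuery.apply(taskManagerId));
        }
    }

    /** Read access for handlers. */
    Object getMetric(String key) {
        return mergedDump.get(key);
    }

    /** Stops the fetch thread, waiting briefly for a running fetch to finish. */
    void shutdown() {
        fetchThread.shutdown();
        try {
            fetchThread.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because every key already carries its scope, merging dumps from different TaskManagers reduces to inserting all entries into one map; a `ConcurrentHashMap` lets handlers read while the fetch thread writes. Note that `ConcurrentHashMap` rejects null values, so the sketch assumes dumps never contain them.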
...
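import java.util.HashMap;
import java.util.Map;

/** Merged metric dump, organized by the key categories 0-4 described above. */
class MetricStore {
    final JobManagerMetricStore jobManager = new JobManagerMetricStore();
    final Map<String, TaskManagerMetricStore> taskManagers = new HashMap<>(); // keyed by <tm_id>
    final Map<String, JobMetricStore> jobs = new HashMap<>();                 // keyed by <job_id>

    /** Adds one entry of a metric dump; the key's category prefix selects the sub-store. */
    void addMetric(String name, Object value) {
        String[] p;
        switch (name.charAt(0)) {
            case '0': // 0:<user_scope>.<name>
                jobManager.metrics.put(name.substring(2), value);
                break;
            case '1': // 1:<tm_id>:<user_scope>.<name>
                p = name.split(":", 3);
                taskManagers.computeIfAbsent(p[1], id -> new TaskManagerMetricStore()).metrics.put(p[2], value);
                break;
            case '2': // 2:<job_id>:<user_scope>.<name>
                p = name.split(":", 3);
                jobs.computeIfAbsent(p[1], id -> new JobMetricStore()).metrics.put(p[2], value);
                break;
            case '3': // 3:<job_id>:<task_id>:<subtask_index>:<user_scope>.<name>
                // Task metrics carry a subtask index, so they are stored on the subtask.
                p = name.split(":", 5);
                subtask(p[1], p[2], p[3]).metrics.put(p[4], value);
                break;
            case '4': // 4:<job_id>:<task_id>:<subtask_index>:<operator_name>:<user_scope>.<name>
                // Assumption: there is no operator store, so operator metrics are kept
                // on their subtask, prefixed with the operator name.
                p = name.split(":", 6);
                subtask(p[1], p[2], p[3]).metrics.put(p[4] + "." + p[5], value);
                break;
            default:
                throw new IllegalArgumentException("Unknown metric category: " + name);
        }
    }

    private SubtaskMetricStore subtask(String jobId, String taskId, String subtaskIndex) {
        return jobs.computeIfAbsent(jobId, id -> new JobMetricStore())
            .tasks.computeIfAbsent(taskId, id -> new TaskMetricStore())
            .subtasks.computeIfAbsent(subtaskIndex, i -> new SubtaskMetricStore());
    }

    static class JobManagerMetricStore {
        final Map<String, Object> metrics = new HashMap<>();
    }
    static class TaskManagerMetricStore {
        final Map<String, Object> metrics = new HashMap<>();
    }
    static class JobMetricStore {
        final Map<String, Object> metrics = new HashMap<>();
        final Map<String, TaskMetricStore> tasks = new HashMap<>(); // keyed by <task_id>
    }
    static class TaskMetricStore {
        final Map<String, Object> metrics = new HashMap<>();
        final Map<String, SubtaskMetricStore> subtasks = new HashMap<>(); // keyed by <subtask_index>
    }
    static class SubtaskMetricStore {
        final Map<String, Object> metrics = new HashMap<>();
    }
}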
...
Note that at any given time only one of these objects will exist.
...
Everything can be tested with unit tests.
Rejected Alternatives
-