Status

Current state: "Accepted"

Discussion thread: here

Voting thread: here

JIRA: KAFKA-8528

...

Currently, a user must query Trogdor’s REST API to get any information about a Trogdor cluster. This places a significant burden on the user and limits how much information about the cluster's health is readily available. Adding metrics would make it much easier to monitor agents and tasks in Trogdor clusters.

Public Interfaces

We define a new trogdor-metrics group that captures the metrics defined below.

Metric/Attribute Name    Description
active-agents-count      The total number of active agents in the Trogdor cluster
created-task-count       The total number of created tasks in the Trogdor cluster
running-task-count       The total number of running tasks in the Trogdor cluster
done-task-count          The total number of done tasks in the Trogdor cluster

All metrics listed above are cumulative counts of the tasks/agents that have entered each respective state. Because these are cumulative counters, we expect that once a Trogdor cluster has finished all tasks, created-task-count = running-task-count = done-task-count.
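As a rough illustration of the cumulative semantics, the sketch below keeps three counters that only ever increase. The class name CumulativeTaskCounters and its methods are hypothetical and use only the JDK; this is not the TrogdorMetrics implementation, and it assumes each task is counted exactly once as it enters each state.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of cumulative task counters.
// Not the actual TrogdorMetrics class; names are assumptions.
public class CumulativeTaskCounters {
    private final AtomicLong created = new AtomicLong();
    private final AtomicLong running = new AtomicLong();
    private final AtomicLong done = new AtomicLong();

    // Each method is assumed to be called once when a task enters that state.
    // Counters are never decremented, so they are cumulative.
    public void taskCreated() { created.incrementAndGet(); }
    public void taskRunning() { running.incrementAndGet(); }
    public void taskDone()    { done.incrementAndGet(); }

    public long createdTaskCount() { return created.get(); }
    public long runningTaskCount() { return running.get(); }
    public long doneTaskCount()    { return done.get(); }

    public static void main(String[] args) {
        CumulativeTaskCounters c = new CumulativeTaskCounters();
        // Three tasks pass through every state.
        for (int i = 0; i < 3; i++) c.taskCreated();
        for (int i = 0; i < 3; i++) c.taskRunning();
        for (int i = 0; i < 3; i++) c.taskDone();
        // Once all tasks have finished, the three counters agree.
        System.out.println(c.createdTaskCount() + " "
                + c.runningTaskCount() + " " + c.doneTaskCount());
    }
}
```

While tasks are still in flight the counters differ (for example, two created but only one started), and they converge only when every created task has finished.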

Proposed Changes

We propose adding a TrogdorMetrics class to Trogdor that exposes the aforementioned metrics. Since Trogdor agents and tasks share a common Platform class, a TrogdorContainer class will be created inside the Platform class so that the Agent and Coordinator classes can share a single TrogdorMetrics instance.

...

However, by way of simple mathematics, we are able to deduce the number of pending tasks by subtracting running-task-count from created-task-count. Similarly, we are able to deduce the number of currently running tasks by subtracting done-task-count from running-task-count. The number of done tasks is done-task-count itself, with no mathematics necessary. This allows for the tracking of fewer metrics. The STOPPING state is more of a transient state and thus doesn’t add much significance to the metrics, so it was deemed sufficient to only have metrics tracking PENDING, RUNNING, and DONE tasks.
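The arithmetic can be sketched as follows, assuming the cumulative-counter semantics described under Public Interfaces (every task that has started is included in running-task-count, and every task that has finished is included in done-task-count). The helper class TaskStateArithmetic and its method names are hypothetical and exist only for illustration.

```java
// Hypothetical helper showing how current per-state counts could be
// derived from the cumulative counters. Names are assumptions.
public final class TaskStateArithmetic {
    private TaskStateArithmetic() {}

    // Tasks created but not yet started.
    public static long pendingNow(long createdTaskCount, long runningTaskCount) {
        return createdTaskCount - runningTaskCount;
    }

    // Tasks started but not yet finished.
    public static long runningNow(long runningTaskCount, long doneTaskCount) {
        return runningTaskCount - doneTaskCount;
    }

    public static void main(String[] args) {
        // Example values: 10 tasks created, 7 have started, 4 have finished.
        long created = 10, running = 7, done = 4;
        System.out.println("pending=" + pendingNow(created, running)); // pending=3
        System.out.println("running=" + runningNow(running, done));    // running=3
        System.out.println("done=" + done);                            // done=4
    }
}
```

Under this assumption, tasks in the transient STOPPING state are counted among those started but not yet finished.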