
Status

Current state: Under Discussion

Discussion thread: https://mail-archives.apache.org/mod_mbox/flink-dev/201902.mbox/browser


Released: <Flink Version>

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Flink defines a few standard metrics for jobs, tasks and operators, and it also supports custom metrics in various scenarios. However, there is so far no standard or conventional metric definition for connectors: each connector currently defines its own metrics, which complicates operation and monitoring. Admittedly, different connectors may have different metrics, but some commonly used metrics can be standardized. This FLIP proposes a set of standard connector metrics that each connector should emit where applicable. The metrics proposed in this FLIP will serve as a convention for connector implementations.

Public Interfaces

We propose to introduce a set of conventional / standard metrics for the connectors.

Source Metrics

Name | Type | Unit | Description
numBytesIn | Counter | Bytes | The total number of input bytes since the source started.
numBytesInPerSec | Meter | Bytes/Sec | The number of input bytes per second.
numRecordsIn | Counter | Records | The total number of input records since the source started.
numRecordsInPerSec | Meter | Records/Sec | The number of input records per second.
numRecordsInErrors | Counter | Records | The total number of records that failed to be consumed.
recordSize | Histogram | Bytes | The size of a record.
fetchLatency | Histogram | ms | The latency incurred before Flink fetched the record: fetchLatency = FetchTime - EventTime.
latency | Histogram | ms | The latency incurred before the record is emitted by the source connector: latency = EmitTime - EventTime.
idleTime | Gauge | ms | The time in milliseconds during which the source has not processed any record: idleTime = CurrentTime - LastRecordProcessTime.
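The three timing metrics above are plain differences between timestamps. The Java sketch below shows one way a source connector might compute them, assuming all timestamps are epoch milliseconds taken from the same clock. SourceTimingMetrics and its methods are hypothetical names, not part of Flink's API; in a real connector the two latency values would be fed into the fetchLatency and latency histograms, and idleTimeMs() would back the idleTime gauge.

```java
import java.util.function.LongSupplier;

// Hypothetical helper illustrating the source timing formulas in this FLIP.
// Assumption: all timestamps are epoch milliseconds from the same clock.
final class SourceTimingMetrics {

    private final LongSupplier clock;   // injectable clock, e.g. System::currentTimeMillis
    private long lastRecordProcessTime; // updated whenever a record is processed
    private long numRecordsIn;          // backing value for the numRecordsIn counter

    SourceTimingMetrics(LongSupplier clock) {
        this.clock = clock;
        this.lastRecordProcessTime = clock.getAsLong();
    }

    // fetchLatency = FetchTime - EventTime
    static long fetchLatencyMs(long eventTime, long fetchTime) {
        return fetchTime - eventTime;
    }

    // latency = EmitTime - EventTime
    static long latencyMs(long eventTime, long emitTime) {
        return emitTime - eventTime;
    }

    // Called by the (hypothetical) source loop each time a record is processed.
    void onRecordProcessed() {
        numRecordsIn++;
        lastRecordProcessTime = clock.getAsLong();
    }

    // idleTime = CurrentTime - LastRecordProcessTime, suitable as a gauge value.
    long idleTimeMs() {
        return clock.getAsLong() - lastRecordProcessTime;
    }

    long numRecordsIn() {
        return numRecordsIn;
    }
}
```

Because EventTime is usually assigned outside Flink, fetchLatency and latency can be skewed by clock differences between the producing system and the Flink TaskManager.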

Sink Metrics

Name | Type | Unit | Description
numBytesOut | Counter | Bytes | The total number of output bytes since the sink started.
numBytesOutPerSec | Meter | Bytes/Sec | The number of output bytes per second.
numRecordsOut | Counter | Records | The total number of output records since the sink started.
numRecordsOutPerSec | Meter | Records/Sec | The number of output records per second.
numRecordsOutErrors | Counter | Records | The total number of records that failed to be sent.
recordSize | Histogram | Bytes | The size of a record.
sendTime | Histogram | ms | The time it takes to send a record.
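Each PerSec meter in the tables is a rate over the corresponding counter. The Java sketch below shows one way to derive it on the sink side: it keeps a ring buffer of counter snapshots and reports the average rate across a sliding window, assuming something calls update() at a fixed interval. Flink's MeterView is built around a broadly similar idea, but SimpleCounter, RateMeter and InstrumentedSink here are hypothetical standalone classes, and the send callback stands in for whatever client the connector actually uses.

```java
import java.util.function.Predicate;

// Minimal monotonically increasing counter (hypothetical, not Flink's Counter).
final class SimpleCounter {
    private long count;

    void inc(long n) { count += n; }

    long getCount() { return count; }
}

// Sliding-window rate over a counter.
// Assumptions: update() is called every updateIntervalSeconds by some scheduler,
// and windowSeconds is a multiple of updateIntervalSeconds.
final class RateMeter {
    private final SimpleCounter counter;
    private final long[] snapshots; // ring buffer of counter values, one per update
    private final int windowSeconds;
    private int newest;             // index of the most recent snapshot
    private double rate;

    RateMeter(SimpleCounter counter, int windowSeconds, int updateIntervalSeconds) {
        this.counter = counter;
        this.windowSeconds = windowSeconds;
        // One slot per interval in the window, plus one for the window's start.
        this.snapshots = new long[windowSeconds / updateIntervalSeconds + 1];
    }

    void update() {
        newest = (newest + 1) % snapshots.length;
        snapshots[newest] = counter.getCount();
        // The slot after the newest one holds the snapshot taken one window ago.
        long oldest = snapshots[(newest + 1) % snapshots.length];
        rate = (double) (snapshots[newest] - oldest) / windowSeconds;
    }

    double getRate() { return rate; }
}

// Hypothetical sink wrapper updating the standard sink counters.
final class InstrumentedSink {
    final SimpleCounter numRecordsOut = new SimpleCounter();
    final SimpleCounter numBytesOut = new SimpleCounter();
    final SimpleCounter numRecordsOutErrors = new SimpleCounter();
    final RateMeter numRecordsOutPerSec = new RateMeter(numRecordsOut, 60, 5);

    // `send` stands in for the external client; it returns true on success.
    void write(byte[] record, Predicate<byte[]> send) {
        if (send.test(record)) {
            numRecordsOut.inc(1);
            numBytesOut.inc(record.length);
        } else {
            numRecordsOutErrors.inc(1);
        }
    }
}
```

Deriving the meter from the counter keeps the two metrics consistent by construction: the PerSec value is always the counter's growth over the window divided by the window length.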

Note:

  • A connector implementation does not have to report all of the metrics above. However, connectors that do report these metrics should conform to this convention.
  • Histogram metrics are usually expensive to maintain, so connectors are strongly recommended not to report them by default, but to give users the option to enable them on demand.
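One way to follow the histogram recommendation is to hand the connector a no-op histogram unless the user opts in. The Java sketch below assumes a hypothetical boolean connector option, represented by the histogramsEnabled parameter; LatencyHistogram and its implementations are illustrative names, not Flink's Histogram interface.

```java
// Hypothetical minimal histogram abstraction (not Flink's Histogram interface).
interface LatencyHistogram {
    void update(long value);
    long getCount();
}

// Used when histograms are disabled: updates cost almost nothing.
final class NoOpHistogram implements LatencyHistogram {
    @Override public void update(long value) { }
    @Override public long getCount() { return 0; }
}

// Keeps the most recent `capacity` samples; this per-record bookkeeping
// is the kind of cost the convention recommends avoiding by default.
final class RecentValuesHistogram implements LatencyHistogram {
    private final long[] samples;
    private long count;

    RecentValuesHistogram(int capacity) { this.samples = new long[capacity]; }

    @Override
    public void update(long value) {
        samples[(int) (count % samples.length)] = value;
        count++;
    }

    @Override public long getCount() { return count; }
}

final class ConnectorHistograms {
    // `histogramsEnabled` stands for a hypothetical user-facing connector option;
    // the connector would pass false unless the user explicitly enables it.
    static LatencyHistogram create(boolean histogramsEnabled) {
        return histogramsEnabled ? new RecentValuesHistogram(1024) : new NoOpHistogram();
    }
}
```

With this shape, the record-processing code calls update() unconditionally, and the on/off decision is made once when the metrics are set up.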

Scope

The metric group for each source and sink should be the same as the ordinary operator scope, i.e. by default <host>.taskmanager.<tm_id>.<job_name>.<operator_name>.<subtask_index>.

Additional connector-specific metrics should also use the same scope.
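To illustrate the default scope, the Java sketch below fills in the operator scope template quoted above (which Flink exposes as the metrics.scope.operator setting) and appends a metric name. The host, TaskManager ID, job and operator values are made up, the class is hypothetical, and real reporters may further sanitize or reformat identifiers; the sketch only shows how the components line up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of how the default operator scope expands for a connector metric.
final class ConnectorMetricScope {
    static final String DEFAULT_OPERATOR_SCOPE =
        "<host>.taskmanager.<tm_id>.<job_name>.<operator_name>.<subtask_index>";

    // Collects the scope variables in template order.
    static Map<String, String> variables(
            String host, String tmId, String jobName, String operatorName, int subtaskIndex) {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("host", host);
        vars.put("tm_id", tmId);
        vars.put("job_name", jobName);
        vars.put("operator_name", operatorName);
        vars.put("subtask_index", Integer.toString(subtaskIndex));
        return vars;
    }

    // Substitutes each <variable> in the template and appends the metric name.
    static String identifier(Map<String, String> vars, String metricName) {
        String scope = DEFAULT_OPERATOR_SCOPE;
        for (Map.Entry<String, String> e : vars.entrySet()) {
            scope = scope.replace("<" + e.getKey() + ">", e.getValue());
        }
        return scope + "." + metricName;
    }
}
```

For example, numRecordsIn on subtask 0 of a hypothetical operator kafka-source would be identified as worker-1.taskmanager.tm-42.orders-etl.kafka-source.0.numRecordsIn.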

Native Connector Metrics

If a connector already has its own native metrics, their original names should be kept, even if some of those metrics are also exposed under the standard metric names.
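Keeping native names while also exposing the standard ones amounts to registering the same underlying metric under two names. The Java sketch below assumes a simple name-to-counter registry; MetricRegistrySketch is hypothetical, the native name records-consumed-total is only an example, and a real connector would register through its operator's metric group instead.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical registry mapping metric names to shared counters.
final class MetricRegistrySketch {
    private final Map<String, AtomicLong> counters = new TreeMap<>();

    // Registers one counter under its standard name and, if given, its native name.
    AtomicLong registerCounter(String standardName, String nativeName) {
        AtomicLong counter = new AtomicLong();
        counters.put(standardName, counter);
        if (nativeName != null) {
            counters.put(nativeName, counter); // same object: both names always agree
        }
        return counter;
    }

    long read(String name) {
        AtomicLong c = counters.get(name);
        if (c == null) {
            throw new IllegalArgumentException("unknown metric: " + name);
        }
        return c.get();
    }

    Set<String> names() {
        return counters.keySet();
    }
}
```

Sharing one object rather than updating two counters separately avoids the two names drifting apart.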

Proposed Changes

  1. Add the proposed metrics to the existing connectors.
  2. Mark the old metrics as deprecated if necessary.
  3. Correct the scope and metric names of the connectors if needed.

Compatibility, Deprecation, and Migration Plan

This FLIP proposes adding new metrics to the connectors and marking the old duplicate metrics as deprecated if necessary.

After some releases, we would like to remove the old metrics that duplicate the new standard metrics, but there is no strict timeline for that.

Test Plan

Standard connector metric test suites will be created to ensure that the connector metrics are implemented correctly.
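A standard metric test suite could, for example, compare the metrics a connector registers against the names and types in the source metrics table. The Java sketch below assumes the registered metrics can be collected into a name-to-type map; MetricType and StandardSourceMetricsCheck are hypothetical names. Missing metrics are deliberately not flagged, since connectors only report the metrics that apply to them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Metric kinds used in the FLIP tables.
enum MetricType { COUNTER, METER, GAUGE, HISTOGRAM }

// Hypothetical conformance check for the standard source metrics.
final class StandardSourceMetricsCheck {
    static final Map<String, MetricType> EXPECTED = new LinkedHashMap<>();

    static {
        EXPECTED.put("numBytesIn", MetricType.COUNTER);
        EXPECTED.put("numBytesInPerSec", MetricType.METER);
        EXPECTED.put("numRecordsIn", MetricType.COUNTER);
        EXPECTED.put("numRecordsInPerSec", MetricType.METER);
        EXPECTED.put("numRecordsInErrors", MetricType.COUNTER);
        EXPECTED.put("recordSize", MetricType.HISTOGRAM);
        EXPECTED.put("fetchLatency", MetricType.HISTOGRAM);
        EXPECTED.put("latency", MetricType.HISTOGRAM);
        EXPECTED.put("idleTime", MetricType.GAUGE);
    }

    // Reports standard names registered with the wrong type.
    // Connector-specific names and absent standard metrics are ignored.
    static List<String> violations(Map<String, MetricType> registered) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, MetricType> e : registered.entrySet()) {
            MetricType expected = EXPECTED.get(e.getKey());
            if (expected != null && expected != e.getValue()) {
                problems.add(e.getKey() + ": expected " + expected + " but was " + e.getValue());
            }
        }
        return problems;
    }
}
```

A sink-side check would follow the same pattern with the sink metrics table.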

Rejected Alternatives

None
