Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Page properties


Discussion threadhere (<- link to https://lists.apache.org/list.html?dev@flink.apache.org)/thread/zt4ykyhv6cco83j9hjngn52b1oprj1tv
Vote threadhere (<- link to https://lists.apache.org/list.html?dev@flink.apache.org)/thread/w215fp5htsn3k7dyl28ydlzpg9s6q8lo
JIRA

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyFLINK-33697

JIRAhere (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)

Release<Flink Version>


Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Table of Contents

Motivation

FLIP-386 is building on top of FLIP-384. The intention here is to add a capability for state backends to attach custom attributes during recovery to recovery spans. For example RocksDBIncrementalRestoreOperation could report both remote download time and time to actually clip/ingest the RocksDB instances after rescaling.

Proposed Changes

Recovery metrics are first collected for each subtask and then sent back to the JobManager for the aggregation to a single number per Job (from all of the subtasks), as described in FLIP-384. From all of those aggregated metrics, a single recovery span trace is created on the JobManager. Thus this FLIP must define not only API to report a custom attribute, but also how to aggregate it later.Custom metrics will be aggregated the same way how the built-in metrics are aggregated in the FLIP-384 (using max and sum). 

Public Interfaces

It will be possible to set the custom attributes via the following interfaces:

Code Block
@Experimental
public interface CustomMetric {
    enum AggregationMethod {
        MIN,
        MAX,
        SUM,
        AVG
    }

    CustomMetric addAggregationMethod(AggregationMethod aggregationMethod);
}

@Experimental
public interface CustomInitializationMetrics {
    CustomMetricvoid addMetric(String name, long value);
    CustomMetricvoid addMetric(String name, double value);
}

// intendend usage:
customInitializationMetrics.addMetric("DownloadTime", value)
    .addAggregationMethod(AggregationMethod.MAX)
    .addAggregationMethod(AggregationMethod.SUM)

The remaining issue is to pass CustomInitializationMetrics instance to for example RocksDBIncrementalRestoreOperation . In this FLIP I’m proposing to do it via adding the following interfaces to the org.apache.flink.runtime.state.StateBackend interface:

...

With all of those changes state backends, like incremental RocksDB, will be able to measure custom metrics during the restore operation and report them to be included in recovery span.

Compatibility, Deprecation, and Migration Plan

This FLIP is proposing to clean up the StateBackend by deprecating all of the other previously used versions of the StateBackend#createKeyedStateBackend and StateBackend#createOperatorStateBackend methods. The newly proposed versions from this FLIP will be more future proof by making adding new parameters much easier while at the same time not braking compatibility for the users.

Test Plan

Change will be covered by automatic tests and tested manually inside the Confluent.

Rejected Alternatives

None