...

Metric tags are:

  • type=stream-state-updater-metrics
  • client-id=[clientId]
  • thread-id=[threadId]

Recording level is: INFO
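
As a rough illustration of how these tags could be used to locate the metrics, the sketch below builds a JMX name pattern from the `type` tag and queries the platform MBean server. It assumes the default JMX reporter is enabled and that the metrics are registered under the `kafka.streams` domain with exactly the tags proposed here; the class and method names (`StateUpdaterMBeanLookup`, `pattern`, `matches`, `find`) are hypothetical helpers, not part of Kafka.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper, not part of Kafka: locates state updater MBeans by their tags.
final class StateUpdaterMBeanLookup {

    // Assumption: Streams metrics are exposed in the "kafka.streams" JMX domain and
    // the group tag is "type=stream-state-updater-metrics" as proposed above.
    static ObjectName pattern() {
        try {
            // The trailing ",*" matches any client-id / thread-id combination.
            return new ObjectName("kafka.streams:type=stream-state-updater-metrics,*");
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the given JMX name would be selected by the pattern.
    static boolean matches(String candidateName) {
        try {
            return pattern().apply(new ObjectName(candidateName));
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Returns the names of all registered MBeans that carry the state updater type tag.
    static Set<ObjectName> find(MBeanServer server) {
        return server.queryNames(pattern(), null);
    }

    public static void main(String[] args) {
        // In a JVM without a running Streams client this simply prints nothing.
        for (ObjectName name : find(ManagementFactory.getPlatformMBeanServer())) {
            System.out.println(name.getKeyProperty("client-id") + " / " + name.getKeyProperty("thread-id"));
        }
    }
}
```

Matching on the `type` tag alone avoids having to know the exact client and thread ids up front, which are read back from the matched names instead.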

...

The POC implementation of the proposed metrics can be found here: https://github.com/apache/kafka/pull/12391


| Metric Name | Type | Description | Notes |
| active-restoring-tasks | count | The number of active tasks currently undergoing restoration | |
| standby-updating-tasks | count | The number of standby tasks currently undergoing updating | |
| active-paused-tasks | count | The number of active tasks whose restoration is paused | |
| standby-paused-tasks | count | The number of standby tasks whose updating is paused | |
| idle-ratio | gauge (percentage) | The fraction of time the thread spent idle | idle-ratio + restore-ratio + checkpoint-ratio should be 1 |
| restore-ratio | gauge (percentage) | The fraction of time the thread spent restoring active or standby tasks | idle-ratio + restore-ratio + checkpoint-ratio should be 1 |
| checkpoint-ratio | gauge (percentage) | The fraction of time the thread spent checkpointing restored progress | idle-ratio + restore-ratio + checkpoint-ratio should be 1 |
| active-records-restored-total | count | The total number of records restored for active tasks | Covers the lifetime of the Streams app, hence ever increasing |
| standby-records-updated-total | count | The total number of records updated for standby tasks | Covers the lifetime of the Streams app, hence ever increasing |
| active-records-remaining | count | The number of records remaining to be restored | Should usually decline; during a rebalance it may jump up or down |
| standby-records-remaining | count | The number of records remaining to be updated | May increase or decline; during a rebalance it may jump up or down |
| records-restored-rate | rate | The average per-second number of records restored for active tasks or updated for standby tasks | Counts both active and standby tasks |
| restore-call-rate | rate | The average per-second number of restore calls triggered | |
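
The note that idle-ratio + restore-ratio + checkpoint-ratio should be 1 can be illustrated with a small worked example. This is only a sketch under the assumption that the thread's time in a measurement window is split into exactly three buckets (idle, restoring, checkpointing); it is not the actual Kafka implementation, and the class `StateUpdaterRatioSketch` and its members are hypothetical names.

```java
// Hypothetical illustration of the ratio invariant; not the Kafka implementation.
final class StateUpdaterRatioSketch {
    final double idleRatio;
    final double restoreRatio;
    final double checkpointRatio;

    private StateUpdaterRatioSketch(double idleRatio, double restoreRatio, double checkpointRatio) {
        this.idleRatio = idleRatio;
        this.restoreRatio = restoreRatio;
        this.checkpointRatio = checkpointRatio;
    }

    // Derives the three ratios from the time (in nanoseconds) spent in each bucket
    // during one measurement window.
    static StateUpdaterRatioSketch fromNanos(long idleNs, long restoreNs, long checkpointNs) {
        if (idleNs < 0 || restoreNs < 0 || checkpointNs < 0) {
            throw new IllegalArgumentException("durations must be non-negative");
        }
        long totalNs = idleNs + restoreNs + checkpointNs;
        if (totalNs == 0) {
            throw new IllegalArgumentException("window must have non-zero length");
        }
        return new StateUpdaterRatioSketch(
            (double) idleNs / totalNs,
            (double) restoreNs / totalNs,
            (double) checkpointNs / totalNs);
    }

    // Sum of the three ratios; 1 up to floating-point rounding, by construction.
    double sum() {
        return idleRatio + restoreRatio + checkpointRatio;
    }

    public static void main(String[] args) {
        StateUpdaterRatioSketch r = fromNanos(500, 300, 200);
        System.out.println("idle=" + r.idleRatio + " restore=" + r.restoreRatio
            + " checkpoint=" + r.checkpointRatio + " sum=" + r.sum());
    }
}
```

Because each ratio shares the same denominator, a sum that deviates noticeably from 1 would indicate time that is not attributed to any of the three buckets.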


Along with these new metrics, we would also deprecate the metrics below:

...

public interface StateRestoreListener {

    void onRestoreStart(final TopicPartition topicPartition,
                        final String storeName,
                        final long startingOffset,
                        final long endingOffset);

    void onRestoreEnd(final TopicPartition topicPartition,
                      final String storeName,
                      final long totalRestored);

    ...

    /**
     * NEW FUNC. Method called when restoring the {@link StateStore} is suspended due to the task being suspended from the host.
     *           If the task is resumed after suspension and restoration continues, {@link #onRestoreStart} will be called again.
     */
    default void onRestoreSuspended(final TopicPartition topicPartition,
                                    final String storeName,
                                    final long totalRestored) {
        // do nothing
    }
}
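
To show how an application might use the new callback, the following sketch tracks a per-store restoration status and distinguishes a suspended restoration from a completed one. So that it compiles on its own, it declares local stand-ins for `TopicPartition` and `StateRestoreListener` that mirror only the callbacks shown above; a real listener would implement the Kafka interface instead. `RestoreTrackingSketch`, `TrackingListener`, `Status`, and `statusOf` are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class RestoreTrackingSketch {

    // Local stand-in for org.apache.kafka.common.TopicPartition (assumption, for self-containment).
    static final class TopicPartition {
        final String topic;
        final int partition;
        TopicPartition(String topic, int partition) { this.topic = topic; this.partition = partition; }
        @Override public String toString() { return topic + "-" + partition; }
    }

    // Local stand-in mirroring only the callbacks shown in the interface above.
    interface StateRestoreListener {
        void onRestoreStart(TopicPartition topicPartition, String storeName, long startingOffset, long endingOffset);
        void onRestoreEnd(TopicPartition topicPartition, String storeName, long totalRestored);
        default void onRestoreSuspended(TopicPartition topicPartition, String storeName, long totalRestored) { }
    }

    enum Status { RESTORING, SUSPENDED, COMPLETED }

    // Records the latest status per (store, changelog partition) pair.
    static final class TrackingListener implements StateRestoreListener {
        private final Map<String, Status> statusByStore = new ConcurrentHashMap<>();

        private static String key(TopicPartition tp, String storeName) { return storeName + "@" + tp; }

        @Override
        public void onRestoreStart(TopicPartition tp, String storeName, long startingOffset, long endingOffset) {
            // Also invoked when a previously suspended restoration continues.
            statusByStore.put(key(tp, storeName), Status.RESTORING);
        }

        @Override
        public void onRestoreEnd(TopicPartition tp, String storeName, long totalRestored) {
            statusByStore.put(key(tp, storeName), Status.COMPLETED);
        }

        @Override
        public void onRestoreSuspended(TopicPartition tp, String storeName, long totalRestored) {
            // Restoration stopped before reaching the end offset; it is not complete.
            statusByStore.put(key(tp, storeName), Status.SUSPENDED);
        }

        Status statusOf(TopicPartition tp, String storeName) { return statusByStore.get(key(tp, storeName)); }
    }

    public static void main(String[] args) {
        TopicPartition tp = new TopicPartition("app-store-changelog", 0);
        TrackingListener listener = new TrackingListener();
        listener.onRestoreStart(tp, "store", 0L, 100L);
        listener.onRestoreSuspended(tp, "store", 40L);
        System.out.println(listener.statusOf(tp, "store"));
    }
}
```

Without the new callback, a listener that only sees `onRestoreStart` and `onRestoreEnd` could not tell a suspended restoration apart from one that is still in progress.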

...