Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. Unify where the pendingRecords metric is reported (enumerator vs reader).

    For unbounded splits like Kafka/Pulsar partitions, it would be difficult to calculate these outside of the readers--thus requiring additional RPCs to communicate this to the enumerator if we wanted to unify. We would also need to introduce configuration to emit these events periodically, complicating the design.

    For bounded splits like File/Iceberg splits, the individual readers have no context about what remains.

    Thus, I propose to report this metric from enumerator and reader in the way that is easiest for the specific connector and handle both metricGroups in the autoscaling implementation.

  2. Tracking assigned splits in the enumerator
    We need to track this in the SourceReaders since splits can be "completed" and this is only tracked by readers. There are some poll based sources that only take 1 split and poll for another when completed, but we cannot make that assumption in general (i.e. request split when a split is completed). So, this needs to be tracked in the reader.