Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Motivation

Background

Flink restores opeartor operator state from snapshots based on matching the operatorIDs. Since Flink 1.2,   StreamGraphHasherV2 is used for operator ID generation when no user-set uid exists. The generated Operator ID is deterministic with respect to:

...

It is unclear at this point on why chained output nodes are involved in the algorithm, but the following history background might be related: prior to Flink 1.3, Flink runtime takes the snapshots by the operator ID of the first vertex in a chain, so it somewhat makes sense to include chained output nodes into the algorithm as chain-breaking/building is expected to break state-compatibility anyway.

...

Although Flink 1.3 brings us operator-level state recovery within a chain, the chaining behavior will still affect state compatibility, as the generation of the Operator ID is dependent on its chained output nodes. For example, a simple source->sink DAG with source and sink chained together is state incompatible with an otherwise identical DAG with source and sink unchained (either because the parallelisms of the two ops are changed to be unequal or chaining is disabled). This greatly limits the flexibility to perform chain-breaking/building for performance tuning, especially for SQL jobs where user-set UIDs are currently not supported.

Therefore, we propose to introduce StreamGraphHasherV3 that introducing StreamGraphHasherV3, which is agnostic of the chaining behavior of operators, so that users are free to tune the parallelism of individual operators without worrying about state incompatibility.

...