Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. Sort sources by node ID, which keeps deterministic traversal order of the sources across job submissions
  2. Traverse the stream graph in the breadth-first manner, for each of the visited nodenodes:
    1. If the hashes of all upstream nodes have been generated
      1. Generate a node local hash using the murmur3_128 hash function: input = traverse order ID of the node, for each of its chaining output nodes, put the same traverse order ID again into the input (see HashingExplained · google/guava Wiki (github.com) for details on the Hashing algorithm API)
      2. For each of the upstream nodes, compute hash = hash * 37 XOR hash[upstream node] to incorporate upstream topology info into the node hash
    2. Else push back to the queue for visiting at a later time

Proposed changes

From the schetch sketch of the current algortihm algorithm above, we can see that the dependency on chaining output nodes only resides in the generation of the node local hash values (in bolded letters), and it suffices to just removes remove the relevant codes (L227-235 ofStreamGraphHasherV2) without changing anything else. The generated hash value for each node is still deterministic as we do not change the traverse order of the graph, and the generated hash value is also still statistically unqiue unique as we still provide different input values to the hash function. 

...

The change will break cross-version state compatibility, the . The migration plan is as follows:

...