...

The algorithm to identify whether two streams (or tables) represent the same entity performs two checks. First, it checks whether the StreamSourceNode nodes that are the join arguments read from the same topic(s). Second, it traverses the graph and checks that the path from each StreamSourceNode node to the root contains no nodes that can change the semantics of the stream. Initially, most nodes will be disallowed on the path; only peek and print will be allowed. Later, we can look into extending the algorithm to identify whether a mapper actually changes the data in the stream. If both conditions hold, the join can be implemented as a self join. Because operators are disallowed by default, new DSL operators can be added without affecting the behavior of the self-join.
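The two checks described above could be sketched roughly as follows. This is only an illustration under simplifying assumptions: the Node class, its kind strings, and the single-parent links are hypothetical stand-ins rather than the actual Kafka Streams graph node classes, and the walk goes upstream from each join argument to its source.

```java
import java.util.Set;

public class SelfJoinCheck {
    // Hypothetical, simplified graph node; NOT the Kafka Streams internal class.
    public static final class Node {
        final String kind;        // e.g. "source", "peek", "print", "map"
        final Set<String> topics; // only meaningful for source nodes
        final Node parent;        // upstream node, null for a source

        public Node(String kind, Set<String> topics, Node parent) {
            this.kind = kind;
            this.topics = topics;
            this.parent = parent;
        }
    }

    // Allow-list: operators assumed not to change the semantics of the stream.
    static final Set<String> ALLOWED = Set.of("peek", "print");

    // Walks upstream from a join argument. Returns the source node if only
    // allowed operators lie on the path, otherwise null.
    static Node sourceIfUnchanged(Node node) {
        Node current = node;
        while (current != null && !current.kind.equals("source")) {
            if (!ALLOWED.contains(current.kind)) {
                return null; // disallowed operator on the path
            }
            current = current.parent;
        }
        return current;
    }

    // True only if both conditions hold: same topic(s) and unchanged paths.
    public static boolean isSelfJoin(Node left, Node right) {
        Node leftSource = sourceIfUnchanged(left);
        Node rightSource = sourceIfUnchanged(right);
        return leftSource != null
            && rightSource != null
            && leftSource.topics.equals(rightSource.topics);
    }
}
```

Any operator kind that is not on the allow-list fails the check, so in this sketch a newly introduced operator would simply prevent the self-join rewrite until it is explicitly allowed.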

-------------- In progress ------------

We will add a boolean flag isSelfJoin to StreamStreamJoinNode that will be used when building the topology. If it is true, then instead of adding the 5 processors that are currently added for an inner join, we will add 2 processors TODO

...

  • Add a rule to the optimizer that will rewrite an inner join to a self join TODO

...




Compatibility, Deprecation, and Migration Plan

...