...

The algorithm to identify whether two streams (or tables) represent the same entity will first check if the StreamSourceNode nodes that are the join arguments are reading from the same topic(s) and then, will do a traversal of the graph and will check if the path from the StreamSourceNode node to the root does not have nodes that can change the semantics of the stream. Initially, most nodes will be disallowed to be on the path, only peek and print should be allowed. Later, we can look into adding some smarts into the algorithm to identify whether a mapper actually changes the data in the stream or not. If both conditions hold, then the join can be implemented as a self join. This algorithm is future-proof to support the addition of new DSL operators without affecting the behavior of the self-join.

-------------- In progress ------------

We will add the flag boolean isSelfJoin to StreamStreamJoinNode that will be used when building the topology. If ir is true, then instead of adding the 5 processors that are currently added for an inner join, we will add 2 processors TODO

Implement the self-join operator (add record to state store, then do lookup)operator

The self-join operator will maintain one state store. For every new record, the record will first get added to the state store and then the operator will perform a lookup into the state store to do the actual join.

Add a rule to the optimizer that will rewrite an inner join to a self join TODO-join. The graph (logical plan) created from the DSL excerpts above are the same (after the optimization mergeDuplicateSourceNodes) and look as follows:

Image Added

The graph gets translated into the following topology (physical plan)

Image Added

The self-join rewriting will take the above graph, and will translate it into the following topology instead:

Image Added

Compatibility, Deprecation, and Migration Plan

...

Space shortcuts

Child pages

Versions Compared

Old Version 5

New Version 6

Key

Compatibility, Deprecation, and Migration Plan

Space shortcuts

Child pages

Page History

Versions Compared

Old Version 5

New Version 6

Key

Compatibility, Deprecation, and Migration Plan