...

Currently, the map-side join uses a hash map: a join is performed when the incoming key matches a key in the hash map. To support range joins, this logic will be abstracted into a pluggable interface in which the plugin decides how two keys are joined. The equality join implementation will continue to use a hash map, while a range join can use a data structure such as an interval tree. Other optimizations are possible; for example, the not-equals join condition <> can use a view on top of a map.
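The pluggable design described above might look roughly like the following sketch. It is illustrative only: the names JoinKeyMatcher, EqualityMatcher, LessThanMatcher, and map_join are hypothetical and not part of any existing API, and a sorted list with binary search stands in for the interval tree to keep the example short.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left
from collections import defaultdict


class JoinKeyMatcher(ABC):
    """Hypothetical plugin interface: decides which stored rows join a probe key."""

    @abstractmethod
    def put(self, key, row):
        """Store a row of the small (in-memory) table under its join key."""

    @abstractmethod
    def matches(self, probe_key):
        """Return the stored rows that join with probe_key."""


class EqualityMatcher(JoinKeyMatcher):
    """Equality join: keeps the existing hash map behaviour."""

    def __init__(self):
        self._table = defaultdict(list)

    def put(self, key, row):
        self._table[key].append(row)

    def matches(self, probe_key):
        return list(self._table.get(probe_key, []))


class LessThanMatcher(JoinKeyMatcher):
    """Range join 'stored_key < probe_key'.

    Assumption: a sorted list plus binary search is used here as a simple
    stand-in for an interval tree.
    """

    def __init__(self):
        self._entries = []
        self._keys = None  # sorted key list, rebuilt lazily after inserts

    def put(self, key, row):
        self._entries.append((key, row))
        self._keys = None

    def matches(self, probe_key):
        if self._keys is None:
            self._entries.sort(key=lambda entry: entry[0])
            self._keys = [key for key, _ in self._entries]
        end = bisect_left(self._keys, probe_key)  # first key >= probe_key
        return [row for _, row in self._entries[:end]]


def map_join(small_rows, big_rows, matcher, key=lambda row: row[0]):
    """Load the small table into the matcher, then stream the big table past it."""
    for row in small_rows:
        matcher.put(key(row), row)
    return [(s, b) for b in big_rows for s in matcher.matches(key(b))]
```

With this shape, the join operator itself does not change when a new condition is supported; only the matcher passed to map_join differs.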

Reduce-side

Reduce-side joins will be implemented via 1-Bucket-Theta as described in [3]. This algorithm requires the cardinality of the two relations, so statistics must be turned on in order to perform a reduce-side theta join. Initially, if the required statistics do not exist, an exception will be thrown indicating the problem. After the initial implementation, we can add a method to estimate the cardinality.
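To illustrate why the cardinalities are needed, here is a simplified, single-process sketch of the 1-Bucket-Theta idea. It makes several assumptions that are not taken from [3] or from this design: the join matrix is cut into a plain rows x cols grid rather than the optimized region shapes, the function names grid_shape and theta_join are hypothetical, and the "reducers" are simulated with an in-memory dictionary.

```python
import random
from collections import defaultdict


def grid_shape(card_s, card_t, reducers):
    """Pick a rows x cols grid with rows * cols <= reducers.

    The grid minimizes the approximate input per region,
    card_s / rows + card_t / cols. Missing cardinalities are an error.
    """
    if card_s is None or card_t is None:
        raise ValueError("reduce-side theta join requires table statistics")
    best, best_cost = None, float("inf")
    for rows in range(1, reducers + 1):
        cols = reducers // rows
        cost = card_s / rows + card_t / cols
        if cost < best_cost:
            best, best_cost = (rows, cols), cost
    return best


def theta_join(s_rows, t_rows, theta, reducers, card_s=None, card_t=None, seed=0):
    """Simulate the map and reduce phases of a grid-partitioned theta join."""
    rows, cols = grid_shape(card_s, card_t, reducers)
    rng = random.Random(seed)
    regions = defaultdict(lambda: ([], []))

    # Map phase: each S tuple picks a random matrix row and is sent to every
    # region in the corresponding row band.
    for s in s_rows:
        band = rng.randrange(card_s) * rows // card_s
        for c in range(cols):
            regions[(band, c)][0].append(s)

    # Each T tuple picks a random matrix column and is sent to every region
    # in the corresponding column band.
    for t in t_rows:
        band = rng.randrange(card_t) * cols // card_t
        for r in range(rows):
            regions[(r, band)][1].append(t)

    # Reduce phase: each region evaluates the join condition locally. Any
    # (s, t) pair meets in exactly one region, so no result is duplicated.
    out = []
    for s_part, t_part in regions.values():
        out.extend((s, t) for s in s_part for t in t_part if theta(s, t))
    return out
```

Because rows and columns are chosen at random, the cardinalities are only used to size the grid and to map a random position to a band; this is why an estimate could later replace exact statistics.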

...