Versions Compared

Comment: Migrated to Confluence 4.0

...

  1. The Union is followed by a select * and then a file sink.
  2. All parents of Union are file sinks.

The Union may have more than two parents.
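The two conditions listed above can be sketched as a small check over a toy operator tree. This is only an illustration and not Hive's actual optimizer code: the `Op` class, the `kind` strings, and `can_remove_union` are hypothetical names introduced here, and conditions in the elided part of this page are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Op:
    """Hypothetical stand-in for a plan operator (not a Hive class)."""
    kind: str                                   # e.g. "UNION", "SELECT", "FILESINK"
    parents: List["Op"] = field(default_factory=list)
    child: Optional["Op"] = None
    select_star: bool = False                   # only meaningful for SELECT


def can_remove_union(union: Op) -> bool:
    """Check the two conditions visible in the text for a Union operator."""
    if union.kind != "UNION":
        return False
    # Condition 1: the Union is followed by a select * and then a file sink.
    sel = union.child
    if sel is None or sel.kind != "SELECT" or not sel.select_star:
        return False
    if sel.child is None or sel.child.kind != "FILESINK":
        return False
    # Condition 2: all parents of the Union are file sinks.
    # There may be more than two parents; at least two is assumed here.
    return len(union.parents) >= 2 and all(
        p.kind == "FILESINK" for p in union.parents
    )
```

The check is deliberately independent of the number of parents, so a three-way union is handled the same way as a two-way one.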

Suppose the output directory of the final file sink is dir_final. We replace the output directories of the file sinks of subq1 and subq2 with dir_final/subquery_1 and dir_final/subquery_2, respectively. All other properties of the final file sink, such as gatherStats, are also copied to them. After this, we remove the union and everything below it.
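The directory rewrite described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the Hive implementation: `FileSink`, `gather_stats`, and `redirect_parent_sinks` are hypothetical names, and numbering the subdirectories `subquery_1`, `subquery_2`, ... for more than two parents is an extrapolation from the two-parent case in the text.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class FileSink:
    """Hypothetical file sink descriptor (not a Hive class)."""
    dir_name: str
    gather_stats: bool = False


def redirect_parent_sinks(final_sink: FileSink,
                          parent_sinks: List[FileSink]) -> List[FileSink]:
    """Return the sinks that replace the Union's parent file sinks.

    Each new sink copies every property of the final sink and overrides
    only the output directory with a per-subquery subdirectory.
    """
    return [
        replace(final_sink, dir_name=f"{final_sink.dir_name}/subquery_{i}")
        for i, _ in enumerate(parent_sinks, start=1)
    ]
```

Once the parent sinks are replaced this way, the union, the select *, and the final file sink are no longer needed, because each subquery already writes into its own subdirectory of dir_final.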

The optimization is important for https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization , but should also be useful in other cases.