Summary

The Pig/Tez integration will build on Achal Soni’s work, with one difference: the physical plan will be translated directly into a Tez plan, rather than first into an MR plan that is then converted into a Tez plan. Fully decoupling the Tez and MR plans gives a cleaner implementation and more flexibility for future improvements.
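To illustrate what translating the physical plan directly into a Tez plan can look like, here is a minimal sketch under simplifying assumptions: the physical plan is treated as a linear chain of operators, and every operator that needs a shuffle starts a new vertex in a single DAG. PhysicalOp, DirectTezTranslator and toVertices are hypothetical names, not Pig or Tez classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model: the physical plan as a linear chain of operators.
// Real Pig physical plans are DAGs of physical operators; this only illustrates the idea.
record PhysicalOp(String name, boolean needsShuffle) {}

class DirectTezTranslator {
    // Cut the operator chain into vertices: every operator that needs a shuffle
    // (GROUP, DISTINCT, JOIN, ...) starts a new vertex, and all vertices belong to
    // one DAG instead of being spread across a sequence of MapReduce jobs.
    static List<List<String>> toVertices(List<PhysicalOp> plan) {
        List<List<String>> vertices = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (PhysicalOp op : plan) {
            if (op.needsShuffle() && !current.isEmpty()) {
                vertices.add(current); // close the upstream vertex at the shuffle edge
                current = new ArrayList<>();
            }
            current.add(op.name());
        }
        if (!current.isEmpty()) {
            vertices.add(current);
        }
        return vertices;
    }

    public static void main(String[] args) {
        List<PhysicalOp> plan = List.of(
            new PhysicalOp("LOAD", false),
            new PhysicalOp("FILTER", false),
            new PhysicalOp("GROUP", true),
            new PhysicalOp("FOREACH", false),
            new PhysicalOp("DISTINCT", true),
            new PhysicalOp("STORE", false));
        System.out.println(toVertices(plan)); // [[LOAD, FILTER], [GROUP, FOREACH], [DISTINCT, STORE]]
    }
}
```

In this toy model the plan becomes one DAG with a map-like vertex followed by two reduce-like vertices, without an intermediate MR plan in between.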

In addition to the frontend changes, a Tez processor (PigProcessor) will be implemented in the backend. This allows Pig queries to be translated into more efficient Tez DAGs.

Design

Frontend

Backend

PigProcessor.java:

// Tez processor that executes Pig physical operators inside a DAG vertex (body elided).
public class PigProcessor implements org.apache.tez.runtime.api.LogicalIOProcessor { ... }
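The processor's role in the backend can be sketched as follows. In Tez, a logical-IO processor receives its named inputs and outputs in its run method; this sketch replaces them with plain iterators and lists so that it runs without Tez on the classpath. ToyPigProcessor and its pipeline of functions are hypothetical stand-ins for PigProcessor and Pig's physical operators, not the planned implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a Tez logical-IO processor: inputs and outputs are
// looked up by name, as in Tez, but modeled as iterators and lists.
class ToyPigProcessor {
    private final List<Function<String, String>> pipeline; // operators assigned to this vertex
    private final String inputName;
    private final String outputName;

    ToyPigProcessor(List<Function<String, String>> pipeline, String inputName, String outputName) {
        this.pipeline = pipeline;
        this.inputName = inputName;
        this.outputName = outputName;
    }

    // Pull every record from the input, push it through the operator pipeline,
    // and write the result to the output. An operator returning null drops the record.
    void run(Map<String, Iterator<String>> inputs, Map<String, List<String>> outputs) {
        Iterator<String> in = inputs.get(inputName);
        List<String> out = outputs.get(outputName);
        while (in.hasNext()) {
            String record = in.next();
            for (Function<String, String> op : pipeline) {
                record = op.apply(record);
                if (record == null) {
                    break;
                }
            }
            if (record != null) {
                out.add(record);
            }
        }
    }

    public static void main(String[] args) {
        List<Function<String, String>> pipeline = List.of(
            String::trim,                // FOREACH-like projection
            s -> s.isEmpty() ? null : s, // FILTER-like: drop empty records
            String::toUpperCase);        // another FOREACH-like step
        List<String> stored = new ArrayList<>();
        new ToyPigProcessor(pipeline, "load", "store")
            .run(Map.of("load", List.of(" a ", "", "b").iterator()), Map.of("store", stored));
        System.out.println(stored); // [A, B]
    }
}
```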

Scope of phase 1

In the first phase, Pig will take the same approach as Hive on Tez. The specific goals are:

  • Make the core Pig operators (join, group-by, etc.) work on Tez.
  • Implement the MRR optimization (jobs with multiple reduce stages).
  • Implement the MPJ optimization (multi-parent shuffle joins).
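The two optimizations can be pictured on a toy DAG. The sketch below assumes that every edge is a shuffle edge and models the DAG as a map from each vertex to its parents; ToyDag and its methods are hypothetical and are not the Tez DAG API. A join vertex with two parents is the MPJ shape, and a chain of load, join and group vertices runs two shuffle stages back to back inside one DAG, which is the MRR shape.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical DAG model (not the Tez DAG API): each vertex name maps to its parents.
// Assumption for this sketch: every edge is a shuffle edge.
class ToyDag {
    private final Map<String, List<String>> parents = new LinkedHashMap<>();

    ToyDag addVertex(String name, String... upstream) {
        parents.put(name, List.of(upstream));
        return this;
    }

    List<String> parentsOf(String name) {
        return parents.getOrDefault(name, List.of());
    }

    // A join vertex fed by shuffle edges from several parents is the MPJ shape.
    boolean hasMultipleParents(String name) {
        return parentsOf(name).size() > 1;
    }

    // Number of vertices on the longest chain ending at `name`. A depth of 3 or more
    // means at least two shuffle stages run back to back in one DAG (the MRR shape).
    int depth(String name) {
        int max = 0;
        for (String p : parentsOf(name)) {
            max = Math.max(max, depth(p));
        }
        return max + 1;
    }

    public static void main(String[] args) {
        // Illustrative Pig Latin: J = JOIN A BY id, B BY id; G = GROUP J BY key;
        ToyDag dag = new ToyDag()
            .addVertex("loadA")
            .addVertex("loadB")
            .addVertex("join", "loadA", "loadB")
            .addVertex("group", "join");
        System.out.println(dag.hasMultipleParents("join")); // true
        System.out.println(dag.depth("group"));             // 3
    }
}
```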

Functional requirements of phase 1

Functional requirements are almost identical to those of Hive on Tez, which can be viewed here.