...

In the initial implementation, the in-map combiner will be supported only when all projections are either expressions on the group column or expressions on algebraic UDFs. The reason is that column pruning does not currently discard unused columns within a grouped bag, so in any other case the in-map combiner would not reduce the data size.


Map Plan
g: Local Rearrange[tuple]{bytearray}(false) - scope-73
|   |
|   Project[bytearray][0] - scope-74
|
|---f: New For Each(false,false)[bag] - scope-61
    |   |
    |   Project[bytearray][0] - scope-62
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - scope-63
    |   |
    |   |---Project[bag][1] - scope-64
    |       |
    |       |---Project[bag][1] - scope-65
    |
    |---Pre Combiner Local Rearrange[tuple]{Unknown} - scope-75
        |
        |---l: New For Each(false,false,false)[bag] - scope-47

This will change to:


Map Plan
g: Local Rearrange[tuple]{bytearray}(false) - scope-73
|   |
|   Project[bytearray][0] - scope-74
|
|---f: HashAgg 
    |   |
    |   Project[bytearray][0] - scope-62
    |   |
    |   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - scope-63
    |   |
    |   |---Project[bag][1] - scope-64
    |
    |---f: New For Each(false,false)[bag] - scope-61
        |   |
        |   Project[bytearray][0] - scope-62
        |   |
        |   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - scope-63
        |   |
        |   |---Project[bag][1] - scope-64
        |       |
        |       |---Project[bag][1] - scope-65
        |
        |---Pre Combiner Local Rearrange[tuple]{Unknown} - scope-75
            |
            |---l: New For Each(false,false,false)[bag] - scope-47
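
To make the role of the HashAgg operator concrete, here is a minimal Java sketch of in-map hash aggregation for COUNT. It is an illustration only, not Pig's implementation: the class name InMapCountSketch, the maxKeys limit, and the flush policy are all assumptions made for this example. Each input record contributes a partial count of 1 (the Initial step), partial counts for the same group key are merged in a hash map (the Intermediate step), and the map is flushed downstream when it grows past the limit.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of in-map hash aggregation for COUNT; not Pig code.
class InMapCountSketch {
    // Assumed cap on distinct keys held in memory before a flush.
    private final int maxKeys;
    private final Map<String, Long> partial = new LinkedHashMap<>();
    private final List<Map.Entry<String, Long>> emitted = new ArrayList<>();

    InMapCountSketch(int maxKeys) {
        this.maxKeys = maxKeys;
    }

    // Initial + Intermediate: each record counts as 1, merged by group key.
    void accept(String groupKey) {
        partial.merge(groupKey, 1L, Long::sum);
        if (partial.size() > maxKeys) {
            flush();
        }
    }

    // Emit all partial counts downstream and release the memory.
    void flush() {
        for (Map.Entry<String, Long> e : partial.entrySet()) {
            emitted.add(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue()));
        }
        partial.clear();
    }

    List<Map.Entry<String, Long>> emitted() {
        return emitted;
    }

    // Final step (reduce side): sum the partial counts for each key.
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> partials) {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, Long> e : partials) {
            totals.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        InMapCountSketch agg = new InMapCountSketch(2);
        for (String key : new String[] {"a", "b", "a", "c", "a"}) {
            agg.accept(key);
        }
        agg.flush(); // end of map input
        System.out.println(reduce(agg.emitted()));
    }
}
```

Because a flush can happen before all records for a key have been seen, the same key may be emitted more than once from a single map task. The reduce side (or an MR combiner, if one is also used) still has to merge those partial results.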

The MR combiner will continue to be supported, and by default the in-map combiner will not be used. A property will need to be set to enable the in-map combiner. A second property will control whether the MR combiner is used along with the in-map combiner. After sufficient testing is done, we can change the default execution mode and properties.
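
The interaction of the two properties can be sketched as follows. This is a hedged illustration, not the actual configuration: the text above does not name the properties, so the keys example.inmap.combiner.enabled and example.inmap.combiner.keep.mr are placeholders invented for this sketch. The default of false for the second property is also an assumption.

```java
import java.util.Properties;

// Hypothetical sketch of choosing the combiner mode from two properties.
class CombinerModeSketch {
    enum Mode { MR_COMBINER_ONLY, IN_MAP_ONLY, IN_MAP_AND_MR }

    // Placeholder property names; the real names are not given in this document.
    static final String IN_MAP_ENABLED = "example.inmap.combiner.enabled";
    static final String MR_WITH_IN_MAP = "example.inmap.combiner.keep.mr";

    static Mode resolve(Properties props) {
        // The in-map combiner is off by default, so the MR combiner is used alone.
        boolean inMap = Boolean.parseBoolean(props.getProperty(IN_MAP_ENABLED, "false"));
        if (!inMap) {
            return Mode.MR_COMBINER_ONLY;
        }
        // Assumed default: the MR combiner is dropped unless explicitly kept.
        boolean keepMr = Boolean.parseBoolean(props.getProperty(MR_WITH_IN_MAP, "false"));
        return keepMr ? Mode.IN_MAP_AND_MR : Mode.IN_MAP_ONLY;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(IN_MAP_ENABLED, "true");
        System.out.println(resolve(props));
    }
}
```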

...