Most of the Pig 0.9 incompatibilities are in the area of syntactic and semantic cleanup. We expect these incompatibilities to have minimal impact on the users. This document contains the details of those changes.
Logical Plan
With Pig 0.9, we have completely transitioned to the new logical plan. As a result, it is no longer possible to fall back to the old logical plan, which served as a workaround for problems found in the new one. This means that, starting with Pig 0.9, the pig.usenewlogicalplan property has no effect on execution. You can still disable individual optimization rules.
Parser Changes
Change | Old Way | New Way |
---|---|---|
Only single quotes are supported for the join modifier. The same applies to cogroup. | "skewed" or 'skewed', "merge" or 'merge' | only 'skewed', 'merge' |
org.apache.pig.impl.logicalLayer.parser.ParseException has been removed. This affects any UDF that uses the Utils.getSchemaFromString function. | catch ParseException | catch Exception |
Remove meaningless syntax | B = (A); - legal | B = (A); - illegal |
Remove meaningless support for the as clause in filter | C = ( filter B by $0 > 0 ) as (a:bytearray, b:long); - legal | C = ( filter B by $0 > 0 ) as (a:bytearray, b:long); - illegal |
Remove meaningless support for the as clause in group | D = group A by $0 as b:LONG; - legal | D = group A by $0 as b:LONG; - illegal |
Deprecate PARALLEL on operators that do not start a reducer | C = filter B by $0 > 0 PARALLEL 10; - legal | C = filter B by $0 > 0 PARALLEL 10; - legal but generates a warning; will be removed in the next release |
Streaming command options | Each option can be specified multiple times. | Each option can be specified at most once. Violating this rule results in a validation error. |
Utils.getSchemaFromString | Throws org.apache.pig.impl.logicalLayer.parser.ParseException | Throws org.apache.pig.parser.ParserException |
Negative numeric constants within parentheses are recognized as tuple constants | (-1) was treated as an integer with value -1, but (1) was treated as a tuple containing a numeric value | Both (1) and (-1) are treated as tuples containing a single numeric column (with value 1 and -1, respectively) |
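
As a rough illustration of the quoting and PARALLEL rows above, the following sketch shows how a script might look once adjusted for 0.9. The file names and field names are hypothetical and exist only for this example.

```pig
-- Hypothetical inputs: tab-separated files with a string key and an int value.
A = load 'a.txt' as (k:chararray, v:int);
B = load 'b.txt' as (k:chararray, w:int);

-- The join modifier must be single-quoted; "skewed" in double quotes is no longer accepted.
C = join A by k, B by k using 'skewed';

-- PARALLEL stays on operators that start a reduce phase, such as group.
D = group A by k parallel 10;

-- On filter, PARALLEL now only produces a warning, so it is left off here.
E = filter A by v > 0;
```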
Semantic Changes
Change | Old Way | New Way | JIRA |
---|---|---|---|
Flattening a bag with an unknown schema produces a record with an unknown schema | Schema will contain bytearray | null schema | |
Schema and load related changes | Pig could produce a gap between schema and data, which sometimes resulted in runtime exceptions | If the load statement specifies a schema, Pig truncates extra fields or pads with nulls so that the loaded data has exactly the number of fields specified in the load statement. | |
BinStorage does not cast bytes by default | BinStorage cast bytes, but incorrectly | By default, casting bytes loaded by BinStorage results in an error. Users need to pass a caster explicitly to BinStorage if they want to cast bytes. | |
When the input relation's schema is present, the way the matching UDF implementation class is found for a UDF that takes * as its argument | The * argument was not expanded, even though expanded arguments were passed at runtime. | The expanded list of arguments is used to find the matching UDF class. SIZE(*) and COUNT(*) earlier did the equivalent of SIZE($0) and COUNT($0); now SIZE(*) and COUNT(*) fail at typechecking. | |
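
The sketch below shows how these semantic changes might surface in a script. The file names and schemas are hypothetical. Passing 'Utf8StorageConverter' as the caster is an assumption that the stored bytes were originally UTF-8 text; use whichever caster matches how the data was produced.

```pig
-- Hypothetical input: lines in data.txt have varying numbers of tab-separated fields.
A = load 'data.txt' as (a:chararray, b:int, c:int);
-- Every record of A now has exactly three fields: extra fields are dropped
-- and missing ones are filled with null.
B = filter A by c is not null;

-- Casting bytearrays loaded through BinStorage needs an explicit caster
-- (assumed here to be Utf8StorageConverter).
X = load 'data.bin' using BinStorage('Utf8StorageConverter') as (id:int, name:chararray);

-- COUNT(*) no longer typechecks when the schema is known; pass the bag instead.
G = group A all;
N = foreach G generate COUNT(A);
```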
Interface Changes
Change | Old Way | New Way | JIRA |
---|---|---|---|
LoadCaster | | Add "bytesToMap(byte[] b, ResourceFieldSchema fieldSchema)"; mark "bytesToMap(byte[] b)" as deprecated | |
Other changes
Change | Impact | JIRA |
---|---|---|
The combiner is used for query execution in more cases than before | Many queries might run faster, but some queries might require more memory, especially ones where the algebraic function produces large bags. A distinct in a foreach statement that gets its input from a group-by statement is one such example. You can reduce the memory footprint by disabling the combiner in such cases (-Dpig.exec.nocombiner=true). | |
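
For reference, a query of the shape described above (a distinct inside a foreach over grouped input) might look like the following sketch. The input file and field names are hypothetical.

```pig
-- Hypothetical input: one (user, url) pair per line.
A = load 'visits.txt' as (user:chararray, url:chararray);
B = group A by user;

-- Nested distinct over grouped input: the kind of query that can now use the combiner.
C = foreach B {
    urls = distinct A.url;
    generate group, COUNT(urls);
};

-- If memory becomes a problem, the script can be run with the combiner disabled:
--   pig -Dpig.exec.nocombiner=true script.pig
```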