...

If the operator of an ExpressionTree node is LEAF, the node's leaf field is populated with an index value. This index points into the list of PredicateLeaf objects defined in the SearchArgument. Each PredicateLeaf holds information about one query predicate, such as the operator, the column name, and the literal being compared:

    private final org.apache.hadoop.hive.ql.io.sarg.PredicateLeaf.Operator operator;
    private final Type type;
    private String columnName;
    private final Object literal;
    private final List<Object> literalList;

Operators that Hive supports in PredicateLeaf:

    public static enum Operator {
        EQUALS,
        NULL_SAFE_EQUALS,
        LESS_THAN,
        LESS_THAN_EQUALS,
        IN,
        BETWEEN,
        IS_NULL
    }

Example query

select event_name from storage_handler_table where event_id = '1'

produces the following leaf:

Filter passed to Input Format: leaf-0 = (EQUALS event_id 1), expr = leaf-0
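
To make the relationship between the expression and its leaves concrete, here is a minimal, self-contained sketch of resolving LEAF nodes to their predicates. SketchLeaf, SketchNode, and LeafResolver are hypothetical stand-ins written for illustration; they only mirror the shape described here and are not the Hive classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a predicate leaf: operator, column name, literal.
final class SketchLeaf {
    final String operator;
    final String columnName;
    final Object literal;

    SketchLeaf(String operator, String columnName, Object literal) {
        this.operator = operator;
        this.columnName = columnName;
        this.literal = literal;
    }
}

// Hypothetical stand-in for an expression tree node.
final class SketchNode {
    enum Op { AND, OR, NOT, LEAF }

    final Op op;
    final int leaf;                  // index into the leaf list; used only when op == LEAF
    final List<SketchNode> children; // empty for LEAF nodes

    private SketchNode(Op op, int leaf, List<SketchNode> children) {
        this.op = op;
        this.leaf = leaf;
        this.children = children;
    }

    static SketchNode leaf(int index) {
        return new SketchNode(Op.LEAF, index, new ArrayList<SketchNode>());
    }

    static SketchNode of(Op op, List<SketchNode> children) {
        return new SketchNode(op, -1, children);
    }
}

final class LeafResolver {
    // Walks the tree depth-first and returns the leaves it references, in visit order.
    static List<SketchLeaf> referencedLeaves(SketchNode root, List<SketchLeaf> leaves) {
        List<SketchLeaf> out = new ArrayList<SketchLeaf>();
        collect(root, leaves, out);
        return out;
    }

    private static void collect(SketchNode node, List<SketchLeaf> leaves, List<SketchLeaf> out) {
        if (node.op == SketchNode.Op.LEAF) {
            out.add(leaves.get(node.leaf)); // the index selects the predicate
            return;
        }
        for (SketchNode child : node.children) {
            collect(child, leaves, out);
        }
    }
}
```

For the filter shown here, the leaf list holds a single entry (EQUALS, event_id, 1) and the expression is SketchNode.leaf(0), so referencedLeaves returns that one predicate.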


We can use this information, together with the SearchArgument, to generate our HudiExpression. Then, in HoodieParquetInputFormat.listStatus(), after fetching files from the FileSystemView, we can apply the data filter to the remaining file groups using column metadata.
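
As a rough illustration of how a single leaf could be checked against per-file column metadata, the sketch below prunes files whose min/max range cannot satisfy the predicate. It assumes the metadata exposes a minimum, a maximum, and a null count per column per file. FilePruningSketch, ColumnRange, mightMatch, and prune are hypothetical names, not Hudi or Hive APIs, and the real HudiExpression evaluation may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class FilePruningSketch {
    // Same operator names as the PredicateLeaf operators listed in this section.
    enum Operator { EQUALS, NULL_SAFE_EQUALS, LESS_THAN, LESS_THAN_EQUALS, IN, BETWEEN, IS_NULL }

    // Hypothetical per-file statistics for one column.
    static final class ColumnRange<T extends Comparable<T>> {
        final T min;
        final T max;
        final long nullCount;

        ColumnRange(T min, T max, long nullCount) {
            this.min = min;
            this.max = max;
            this.nullCount = nullCount;
        }

        boolean contains(T v) {
            return min.compareTo(v) <= 0 && max.compareTo(v) >= 0;
        }
    }

    // Returns false only when the statistics prove that no row can match.
    static <T extends Comparable<T>> boolean mightMatch(
            Operator op, List<T> literals, ColumnRange<T> r) {
        if (op == Operator.IS_NULL) {
            return r.nullCount > 0;
        }
        if (r.min == null || r.max == null) {
            return true; // no usable range: keep the file to stay safe
        }
        switch (op) {
            case NULL_SAFE_EQUALS:
                if (literals.get(0) == null) {
                    return r.nullCount > 0; // null-safe equality with null matches nulls
                }
                return r.contains(literals.get(0));
            case EQUALS:
                return r.contains(literals.get(0));
            case LESS_THAN:
                return r.min.compareTo(literals.get(0)) < 0;
            case LESS_THAN_EQUALS:
                return r.min.compareTo(literals.get(0)) <= 0;
            case IN:
                for (T v : literals) {
                    if (r.contains(v)) {
                        return true;
                    }
                }
                return false;
            case BETWEEN: // literals hold the lower bound, then the upper bound
                return r.max.compareTo(literals.get(0)) >= 0
                        && r.min.compareTo(literals.get(1)) <= 0;
            default:
                return true;
        }
    }

    // Keeps only the files whose statistics might satisfy the leaf.
    static <T extends Comparable<T>> List<String> prune(
            Map<String, ColumnRange<T>> statsByFile, Operator op, List<T> literals) {
        List<String> kept = new ArrayList<String>();
        for (Map.Entry<String, ColumnRange<T>> e : statsByFile.entrySet()) {
            if (mightMatch(op, literals, e.getValue())) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }
}
```

The check is deliberately conservative: a file is dropped only when its range rules the predicate out, so pruning never changes query results, only the number of files read. Values are compared through Comparable, so the caller is assumed to supply literals in the column's own type.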

...