...

Query predicates are normally constructed in a tree-like structure, so this design follows the same pattern. The proposal is to create a mapping utility from engine-specific query predicates to a HudiExpression. This way the filtering logic is engine-agnostic.

AND and OR operators can each be translated to a tree node with a left and a right expression. An example of what the structure would look like is shown below.

...

This way we can call evaluate on the root HudiExpression tree and it will determine whether the entire expression is satisfied for the file group.
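A minimal sketch of such a tree is given here to make the evaluation step concrete. Only the name HudiExpression and the evaluate call come from this proposal; the ColumnRange type, the factory methods, and the choice of min/max statistics as input are assumptions made for illustration. Each node answers "could any row in this file group match?", so a missing statistic conservatively keeps the file group.

```java
import java.util.HashMap;
import java.util.Map;

final class HudiExpressionSketch {

    /** Hypothetical min/max statistics of one column within a file group. */
    static final class ColumnRange {
        final long min;
        final long max;

        ColumnRange(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    /** Tree node; evaluate returns false only when no row can possibly match. */
    interface HudiExpression {
        boolean evaluate(Map<String, ColumnRange> stats);
    }

    /** AND node with left and right child expressions. */
    static HudiExpression and(HudiExpression left, HudiExpression right) {
        return stats -> left.evaluate(stats) && right.evaluate(stats);
    }

    /** OR node with left and right child expressions. */
    static HudiExpression or(HudiExpression left, HudiExpression right) {
        return stats -> left.evaluate(stats) || right.evaluate(stats);
    }

    /** Leaf: "column > value" can match only if the column's max exceeds value. */
    static HudiExpression greaterThan(String column, long value) {
        return stats -> {
            ColumnRange range = stats.get(column);
            return range == null || range.max > value; // no stats: keep the file group
        };
    }

    /** Leaf: "column = value" can match only if value lies within [min, max]. */
    static HudiExpression equalTo(String column, long value) {
        return stats -> {
            ColumnRange range = stats.get(column);
            return range == null || (range.min <= value && value <= range.max);
        };
    }

    public static void main(String[] args) {
        // (price > 100 AND qty = 5) OR id = 42
        HudiExpression root =
            or(and(greaterThan("price", 100), equalTo("qty", 5)), equalTo("id", 42));

        Map<String, ColumnRange> stats = new HashMap<>();
        stats.put("price", new ColumnRange(10, 90));
        stats.put("qty", new ColumnRange(1, 9));
        stats.put("id", new ColumnRange(1, 40));

        // price never exceeds 100 and id never reaches 42, so the group can be skipped.
        System.out.println(root.evaluate(stats)); // prints "false"
    }
}
```

Because the AND and OR nodes only combine the results of their children, a new comparison operator can be supported by adding a leaf without touching the tree logic.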

...

To implement predicate pushdown in Hive, we need access to the query predicate, which is not passed to the Hive InputFormat by default. The HiveStoragePredicateHandler interface must be implemented in order to provide the query predicate to the InputFormat, and this requires a custom HiveStorageHandler. We will therefore create a new storage handler, HudiStorageHandler.

...

We can use this information and the SearchArgument to generate our HudiExpression. Then, in HoodieParquetInputFormat.listStatus(), after fetching files from the FileSystemView, we can apply the data filter to the remaining file groups by evaluating the HudiExpression against their column metadata.
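The filtering step described above can be sketched as follows. This is not Hudi or Hive code: FileGroup, prune, and the representation of column metadata as a map from column name to a {min, max} pair are hypothetical stand-ins, and the expression is modelled as a plain java.util.function.Predicate over that metadata rather than the real HudiExpression type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class FileGroupPruningSketch {

    /** Hypothetical file group carrying per-column {min, max} metadata. */
    static final class FileGroup {
        final String id;
        final Map<String, long[]> columnRanges;

        FileGroup(String id, Map<String, long[]> columnRanges) {
            this.id = id;
            this.columnRanges = columnRanges;
        }
    }

    /** Keeps only the file groups whose column metadata may satisfy the expression. */
    static List<FileGroup> prune(List<FileGroup> candidates,
                                 Predicate<Map<String, long[]>> expression) {
        List<FileGroup> kept = new ArrayList<>();
        for (FileGroup group : candidates) {
            if (expression.test(group.columnRanges)) {
                kept.add(group);
            }
        }
        return kept;
    }

    /** Convenience builder for a single-column metadata map. */
    static Map<String, long[]> ranges(String column, long min, long max) {
        Map<String, long[]> map = new HashMap<>();
        map.put(column, new long[] {min, max});
        return map;
    }

    public static void main(String[] args) {
        List<FileGroup> candidates = Arrays.asList(
            new FileGroup("fg-1", ranges("price", 10, 90)),
            new FileGroup("fg-2", ranges("price", 80, 250)));

        // Stand-in for an expression built from "price > 100".
        Predicate<Map<String, long[]>> priceAbove100 = stats -> {
            long[] range = stats.get("price");
            return range == null || range[1] > 100; // no metadata: keep the group
        };

        for (FileGroup group : prune(candidates, priceAbove100)) {
            System.out.println(group.id); // prints "fg-2"
        }
    }
}
```

Pruning on min/max metadata is conservative: a file group that is kept may still contain no matching rows, so the engine must still apply the original predicate to the rows it reads.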

Spark


Presto

Rollout/Adoption Plan

...