Status

Current state: Under Discussion

...

  • The engine automatically prunes partitions based on the filters and the partition columns. The source does not need to do anything.
  • The table source needs to get all partition values.
  • The problem is that every partition pruning pass needs to get all partition values. When there are thousands of partitions, this puts heavy pressure on the catalog (for example, one backed by MySQL storage).
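To make the cost of this approach concrete, here is a minimal sketch of engine-side pruning. It assumes each partition is described by a Map<String, String> of partition column to value and that the filter is an ordinary predicate. The class EnginePruningSketch and the helpers prune and spec are hypothetical names used only for illustration and are not part of any proposed interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class EnginePruningSketch {

    // Hypothetical helper: the engine receives ALL partition specs and
    // keeps only those that satisfy the filter on the partition columns.
    public static List<Map<String, String>> prune(
            List<Map<String, String>> allPartitions,
            Predicate<Map<String, String>> partitionFilter) {
        List<Map<String, String>> remaining = new ArrayList<>();
        for (Map<String, String> partition : allPartitions) {
            if (partitionFilter.test(partition)) {
                remaining.add(partition);
            }
        }
        return remaining;
    }

    // Hypothetical helper: builds one partition spec with two assumed columns.
    public static Map<String, String> spec(String dt, String hour) {
        Map<String, String> partition = new LinkedHashMap<>();
        partition.put("dt", dt);
        partition.put("hour", hour);
        return partition;
    }

    public static void main(String[] args) {
        // In a real setup this full list would have to be fetched from the
        // catalog on every pruning pass, which is the scalability concern.
        List<Map<String, String>> all = Arrays.asList(
                spec("2019-01-01", "00"),
                spec("2019-01-02", "00"),
                spec("2019-01-02", "01"));

        List<Map<String, String>> kept =
                prune(all, p -> "2019-01-02".equals(p.get("dt")));
        System.out.println(kept);
    }
}
```

The loop visits every partition spec, so the work and the amount of catalog data transferred grow linearly with the number of partitions, regardless of how selective the filter is.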

How to do partition pruning depends entirely on TableSource's own implementation:

  • The table source can use the catalog to do partition pruning. For example, the Hive table source can get access to its catalog when it is created by HiveTableFactory.
  • Without a catalog, the table source lists the subdirectories and filters them by name.
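The catalog-free variant can be sketched roughly as follows. This is only an illustration: it assumes partition directories sit directly under the table root and follow a Hive-style "column=value" naming scheme, which the text above does not specify. The class DirectoryPruningSketch and the method prunePartitionDirs are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class DirectoryPruningSketch {

    // Hypothetical helper: lists the direct subdirectories of tableRoot and
    // keeps those named "<column>=<value>" whose value passes the filter.
    public static List<Path> prunePartitionDirs(
            Path tableRoot, String column, Predicate<String> valueFilter)
            throws IOException {
        String prefix = column + "=";
        List<Path> matches = new ArrayList<>();
        // Only directories are considered; plain files are skipped.
        try (DirectoryStream<Path> dirs =
                Files.newDirectoryStream(tableRoot, Files::isDirectory)) {
            for (Path dir : dirs) {
                String name = dir.getFileName().toString();
                if (name.startsWith(prefix)
                        && valueFilter.test(name.substring(prefix.length()))) {
                    matches.add(dir);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway table layout to demonstrate the filter.
        Path root = Files.createTempDirectory("table");
        Files.createDirectory(root.resolve("dt=2019-01-01"));
        Files.createDirectory(root.resolve("dt=2019-01-02"));

        // Keep partitions with dt >= 2019-01-02 (string comparison is
        // sufficient for this ISO date format).
        List<Path> kept =
                prunePartitionDirs(root, "dt", v -> v.compareTo("2019-01-02") >= 0);
        System.out.println(kept);
    }
}
```

In this sketch the pruning decision is made from directory names alone, so no catalog round trip is needed, at the cost of one directory listing per partition level.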

How to do partition pruning depends on the table:

...

  void setStaticPartition(Map<String, String> partitions);

  // Get the dynamic partition column names.

  List<String> getDynamicPartitionFieldNames();

  // If this returns true, the sink can trust that all records will be grouped by the partition fields before they are consumed by the sink, so the sink can use a “grouped multi-partition writer”. If it returns false, there is no need to do partition grouping.

...