Status
Current state: Under Discussion
...
- The engine automatically prunes partitions based on the filters and the partition columns. The source does not need to do anything.
- The table source needs to provide all partition values.
- The problem is that every partition pruning pass needs to fetch all partition values. When there are thousands of partitions, this puts a lot of pressure on the catalog (for example, when it is backed by MySQL storage).
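To make the cost of this approach concrete, here is a minimal sketch of engine-side pruning. The class and method names (`EnginePruningSketch`, `prune`, `spec`) and the `dt`/`hour` partition columns are hypothetical and are not part of any existing API. The sketch assumes a partition is represented as a map from partition column name to value. Note that the full partition list has to be materialized before the filter can be applied, which is the source of the catalog pressure described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only; not an existing engine class.
final class EnginePruningSketch {

    /** Keeps only the partition specs that satisfy the partition filter. */
    static List<Map<String, String>> prune(
            List<Map<String, String>> allPartitions,
            Predicate<Map<String, String>> partitionFilter) {
        List<Map<String, String>> remaining = new ArrayList<>();
        // Every spec is visited, so the caller must have fetched all of them.
        for (Map<String, String> partitionSpec : allPartitions) {
            if (partitionFilter.test(partitionSpec)) {
                remaining.add(partitionSpec);
            }
        }
        return remaining;
    }

    /** Builds a two-column partition spec (assumed columns: dt, hour). */
    static Map<String, String> spec(String dt, String hour) {
        Map<String, String> partitionSpec = new LinkedHashMap<>();
        partitionSpec.put("dt", dt);
        partitionSpec.put("hour", hour);
        return partitionSpec;
    }
}
```

For example, calling `prune(all, p -> "2019-08-02".equals(p.get("dt")))` keeps only the partitions of that day, but `all` still has to contain every partition of the table.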
How to do partition pruning depends entirely on TableSource's own implementation:
- The table source can use the catalog to do partition pruning. For example, the Hive table source can access its catalog, which it obtains when HiveTableFactory is created.
- Without a catalog, the table source lists the subdirectories and filters them by name.
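The catalog-free variant can be sketched as follows. This is an illustration under stated assumptions, not the proposed implementation: it assumes a Hive-style directory layout in which each path segment has the form `column=value`, it works on relative path strings instead of a real file system, and the names `DirectoryPruningSketch`, `parsePartitionPath` and `pruneByName` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not an existing table source class.
final class DirectoryPruningSketch {

    /** Parses a path such as "dt=2019-08-01/hour=10" into a partition spec. */
    static Map<String, String> parsePartitionPath(String relativePath) {
        Map<String, String> partitionSpec = new LinkedHashMap<>();
        for (String segment : relativePath.split("/")) {
            int separator = segment.indexOf('=');
            if (separator <= 0) {
                continue; // not a "column=value" segment, ignore it
            }
            partitionSpec.put(
                    segment.substring(0, separator),
                    segment.substring(separator + 1));
        }
        return partitionSpec;
    }

    /** Keeps the directories whose names match all required column values. */
    static List<String> pruneByName(
            List<String> partitionDirs, Map<String, String> requiredValues) {
        List<String> kept = new ArrayList<>();
        for (String dir : partitionDirs) {
            Map<String, String> partitionSpec = parsePartitionPath(dir);
            if (partitionSpec.entrySet().containsAll(requiredValues.entrySet())) {
                kept.add(dir);
            }
        }
        return kept;
    }
}
```

This sketch only handles equality on partition values. Range or expression filters would need the parsed values to be converted to their column types first.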
How to do partition pruning depends on table:
...
void setStaticPartition(Map<String, String> partitions);
// Get the dynamic partition column names.
List<String> getDynamicPartitionFieldNames();
// If this returns true, the sink can trust that all records will be grouped by the partition fields before the sink consumes them, so the sink can use a "grouped multi-partition writer". If it returns false, there is no need to do partition grouping.
...