Status

Current state: Under Discussion

...

The engine automatically prunes partitions based on the filters and the partition columns. The source does not need to do anything.
The table source needs to get all partition values.
The problem is that every partition pruning pass needs to get all partition values. When there are thousands of partitions, this puts a lot of pressure on the catalog (for example, one backed by MySQL storage).
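
To make the cost concrete, here is a minimal sketch of engine-side pruning. It is not the proposed API: the class name PartitionPruningSketch, the prune method, and the representation of a partition as a Map from column name to value are all assumptions made for illustration. The point it shows is that the filter can only be applied after every partition has been fetched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch only; none of these names come from the proposal.
final class PartitionPruningSketch {

    // Keeps the partitions whose column values satisfy the filter.
    // The caller has to pass in ALL partitions, which is what puts
    // pressure on the catalog when there are thousands of them.
    static List<Map<String, String>> prune(
            List<Map<String, String>> allPartitions,
            Predicate<Map<String, String>> partitionFilter) {
        List<Map<String, String>> remaining = new ArrayList<>();
        for (Map<String, String> partition : allPartitions) {
            if (partitionFilter.test(partition)) {
                remaining.add(partition);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        List<Map<String, String>> all = new ArrayList<>();
        all.add(Collections.singletonMap("dt", "2019-01-01"));
        all.add(Collections.singletonMap("dt", "2019-01-02"));
        // Equivalent to a filter such as: WHERE dt = '2019-01-02'
        System.out.println(prune(all, p -> "2019-01-02".equals(p.get("dt"))));
    }
}
```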

How to do partition pruning depends entirely on TableSource's own implementation:

The table source can use the catalog to do partition pruning. For example, the Hive table source can access its catalog, which is available from the creation of HiveTableFactory.
Without a catalog, the table source lists the subdirectories and filters them by name.
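
The catalog-free case can be sketched as follows. This is an illustration under stated assumptions, not the table source's actual implementation: it assumes Hive-style directory names of the form column=value directly under the table directory, and DirectoryPruningSketch and prune are hypothetical names. Only standard java.nio.file calls are used.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch only; assumes "<column>=<value>" directory names.
final class DirectoryPruningSketch {

    // Lists the immediate subdirectories of tableDir and keeps those named
    // "<column>=<value>" whose value passes the filter. No catalog is used.
    static List<Path> prune(Path tableDir, String column, Predicate<String> valueFilter)
            throws IOException {
        String prefix = column + "=";
        List<Path> kept = new ArrayList<>();
        try (DirectoryStream<Path> dirs =
                Files.newDirectoryStream(tableDir, path -> Files.isDirectory(path))) {
            for (Path dir : dirs) {
                String name = dir.getFileName().toString();
                if (name.startsWith(prefix)
                        && valueFilter.test(name.substring(prefix.length()))) {
                    kept.add(dir);
                }
            }
        }
        // Directory listing order is unspecified, so sort for a stable result.
        Collections.sort(kept);
        return kept;
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("table");
        Files.createDirectory(table.resolve("dt=2019-01-01"));
        Files.createDirectory(table.resolve("dt=2019-01-02"));
        System.out.println(prune(table, "dt", v -> v.compareTo("2019-01-02") >= 0));
    }
}
```

A multi-level layout (for example dt=.../hour=...) would repeat the same step one directory level at a time.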

How to do partition pruning depends on the table:

...

void setStaticPartition(Map<String, String> partitions);

// get dynamic partition column names.

List<String> getDynamicPartitionFieldNames();

// If this returns true, the sink can trust that all records will be grouped by the partition fields before they are consumed by the sink, so the sink can use a “grouped multi-partition writer”. If it returns false, there is no need to do partition grouping.

...
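
As an illustration of how the two methods shown above might relate, the sketch below treats every partition column that has no static value as dynamic. That relationship is an assumption and is not stated in this section. The class PartitionedSinkSketch and its constructor are hypothetical, and the class does not implement any real interface. Only the two method signatures are taken from the fragment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; the class and constructor are not from the proposal.
final class PartitionedSinkSketch {

    private final List<String> partitionColumns;
    private Map<String, String> staticPartitions = new LinkedHashMap<>();

    PartitionedSinkSketch(List<String> partitionColumns) {
        this.partitionColumns = new ArrayList<>(partitionColumns);
    }

    // Records the partition values fixed by the statement,
    // for example PARTITION (dt='2019-01-01').
    public void setStaticPartition(Map<String, String> partitions) {
        this.staticPartitions = new LinkedHashMap<>(partitions);
    }

    // Assumption: a partition column is dynamic when no static value was set for it.
    public List<String> getDynamicPartitionFieldNames() {
        List<String> dynamic = new ArrayList<>();
        for (String column : partitionColumns) {
            if (!staticPartitions.containsKey(column)) {
                dynamic.add(column);
            }
        }
        return dynamic;
    }

    public static void main(String[] args) {
        PartitionedSinkSketch sink = new PartitionedSinkSketch(Arrays.asList("dt", "hour"));
        sink.setStaticPartition(Collections.singletonMap("dt", "2019-01-01"));
        System.out.println(sink.getDynamicPartitionFieldNames());
    }
}
```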
