Status
...
...
...
...
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
...
Currently, the statistics used by the optimizer include row count, NDV (number of distinct values), null count, max length, min length, max value and min value.[4] The file count and file size (which can easily be obtained from the file system) are not used by the planner yet; this can be improved later.
import java.util.List;

import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.core.fs.Path;
import org.apache.flink.table.plan.stats.TableStats;
import org.apache.flink.table.types.DataType;

/**
 * Extension of input format which is able to report estimated statistics for FileSystem
 * connector.
 */
@PublicEvolving
public interface FileBasedStatisticsReportableInputFormat {

    /**
     * Returns the estimated statistics of this input format.
     *
     * @param files The files to be estimated.
     * @param producedDataType the final output type of the format.
     */
    TableStats reportStatistics(List<Path> files, DataType producedDataType);
}
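Because reportStatistics receives a list of files, an implementation may need to fold per-file metadata into a single estimate. The following is a minimal sketch of one way to do that for a single numeric column, using only the JDK. FileColumnStats and StatsMergeSketch are hypothetical names introduced for illustration; they are not Flink classes, and a real format would return a TableStats instead.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical per-file statistics for one numeric column (not a Flink class). */
final class FileColumnStats {
    final long rowCount;
    final long nullCount;
    final long ndv; // number of distinct values
    final long min;
    final long max;

    FileColumnStats(long rowCount, long nullCount, long ndv, long min, long max) {
        this.rowCount = rowCount;
        this.nullCount = nullCount;
        this.ndv = ndv;
        this.min = min;
        this.max = max;
    }
}

/** Hypothetical helper that combines per-file statistics into one estimate. */
final class StatsMergeSketch {

    /** Returns the combined statistics, or null when there are no files to inspect. */
    static FileColumnStats merge(List<FileColumnStats> files) {
        if (files.isEmpty()) {
            return null;
        }
        long rows = 0;
        long nulls = 0;
        long ndvSum = 0;
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (FileColumnStats f : files) {
            rows += f.rowCount;   // row counts are additive across files
            nulls += f.nullCount; // null counts are additive across files
            ndvSum += f.ndv;
            min = Math.min(min, f.min);
            max = Math.max(max, f.max);
        }
        // NDV is not additive: the same value may appear in several files.
        // The sum is only an upper bound, capped by the number of non-null rows.
        long ndv = Math.min(ndvSum, rows - nulls);
        return new FileColumnStats(rows, nulls, ndv, min, max);
    }

    public static void main(String[] args) {
        FileColumnStats merged =
                merge(Arrays.asList(
                        new FileColumnStats(10, 1, 5, 3, 40),
                        new FileColumnStats(20, 2, 30, -7, 12)));
        System.out.println("rows=" + merged.rowCount + ", ndv<=" + merged.ndv);
    }
}
```

Row count, null count, min and max combine exactly, while the merged NDV is only an upper bound; a planner consuming such a value should treat it as an estimate.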
...
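private LogicalTableScan recomputeStatistics(LogicalTableScan scan) {
    final RelOptTable scanTable = scan.getTable();
    // Only table-source scans can report statistics; leave other scans untouched.
    if (!(scanTable instanceof TableSourceTable)) {
        return scan;
    }
    final TableSourceTable table = (TableSourceTable) scanTable;
    // Recompute only when the option is enabled and the source implements SupportsStatisticReport.
    boolean reportStatEnabled =
            ShortcutUtils.unwrapContext(scan)
                            .getTableConfig()
                            .get(TABLE_OPTIMIZER_SOURCE_REPORT_STATISTICS_ENABLED)
                    && table.tableSource() instanceof SupportsStatisticReport;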
...