...
Currently, the statistical dimensions used by the optimizer include row count, NDV (number of distinct values), null count, max length, min length, max value and min value.[4] File count and file size, which can easily be obtained from the file system, are not used by the planner yet; this can be improved later.
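To make these dimensions concrete, the following Java sketch computes them for one in-memory string column. The names `ColumnStatsSketch` and `Stats` are hypothetical and are not Flink's own statistics classes. Two choices here are assumptions made for illustration: NDV counts only distinct non-null values, and max/min value use plain lexicographic `String` ordering.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not Flink code): computes the listed dimensions for one string column. */
final class ColumnStatsSketch {

    /** Plain holder for the statistics of one column. */
    static final class Stats {
        final long rowCount;  // total number of entries, nulls included
        final long ndv;       // number of distinct non-null values (assumption)
        final long nullCount; // number of null entries
        final int maxLen;     // length of the longest non-null value
        final int minLen;     // length of the shortest non-null value (0 if none)
        final String max;     // lexicographically largest non-null value, or null
        final String min;     // lexicographically smallest non-null value, or null

        Stats(long rowCount, long ndv, long nullCount, int maxLen, int minLen, String max, String min) {
            this.rowCount = rowCount;
            this.ndv = ndv;
            this.nullCount = nullCount;
            this.maxLen = maxLen;
            this.minLen = minLen;
            this.max = max;
            this.min = min;
        }
    }

    static Stats compute(List<String> values) {
        Set<String> distinct = new HashSet<>();
        long nulls = 0;
        int maxLen = 0;
        int minLen = Integer.MAX_VALUE;
        String max = null;
        String min = null;
        for (String v : values) {
            if (v == null) {
                nulls++;
                continue;
            }
            distinct.add(v);
            maxLen = Math.max(maxLen, v.length());
            minLen = Math.min(minLen, v.length());
            if (max == null || v.compareTo(max) > 0) {
                max = v;
            }
            if (min == null || v.compareTo(min) < 0) {
                min = v;
            }
        }
        if (min == null) {
            minLen = 0; // the column is empty or contains only nulls
        }
        return new Stats(values.size(), distinct.size(), nulls, maxLen, minLen, max, min);
    }

    public static void main(String[] args) {
        Stats s = compute(Arrays.asList("a", "bb", null, "a"));
        System.out.println("rows=" + s.rowCount + " ndv=" + s.ndv + " nulls=" + s.nullCount
                + " maxLen=" + s.maxLen + " minLen=" + s.minLen + " max=" + s.max + " min=" + s.min);
    }
}
```

For the input `["a", "bb", null, "a"]` the sketch reports 4 rows, an NDV of 2, 1 null, lengths 2/1, and max/min values `bb`/`a`.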
import java.util.List;

import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.core.fs.Path;
import org.apache.flink.table.plan.stats.TableStats;
import org.apache.flink.table.types.DataType;

/**
 * Extension of input format which is able to report estimated statistics for the FileSystem
 * connector.
 */
@PublicEvolving
public interface FileBasedStatisticsReportableInputFormat {

    /**
     * Returns the estimated statistics of this input format.
     *
     * @param files The files to be estimated.
     * @param producedDataType the final output type of the format.
     */
    TableStats reportStatistics(List<Path> files, DataType producedDataType);
}
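As an illustration of the file-level information mentioned earlier, the sketch below shows one cheap heuristic a `reportStatistics` implementation could use: sum the sizes of the given files and divide by an assumed average row width. This heuristic, the class `FileSizeRowCountEstimator` and the parameter `assumedAvgRowBytes` are all hypothetical and are not how Flink's formats actually estimate statistics. To stay self-contained, the sketch uses `java.nio.file.Path` instead of Flink's `Path` and returns a plain row count instead of a `TableStats`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch (not Flink code): estimates a row count from file sizes alone. */
final class FileSizeRowCountEstimator {

    /** Sums the on-disk size of every file, read directly from the file system. */
    static long totalBytes(List<Path> files) {
        long total = 0;
        for (Path file : files) {
            try {
                total += Files.size(file);
            } catch (IOException e) {
                throw new UncheckedIOException("Cannot read size of " + file, e);
            }
        }
        return total;
    }

    /** Divides total bytes by the assumed row width, rounding up so a partial row still counts. */
    static long estimateRowCount(long totalBytes, long assumedAvgRowBytes) {
        if (assumedAvgRowBytes <= 0) {
            throw new IllegalArgumentException("assumedAvgRowBytes must be positive");
        }
        return (totalBytes + assumedAvgRowBytes - 1) / assumedAvgRowBytes;
    }

    public static void main(String[] args) throws IOException {
        // Two temporary files of 100 and 300 bytes stand in for the files of a table.
        Path a = Files.createTempFile("part-0", ".csv");
        Path b = Files.createTempFile("part-1", ".csv");
        Files.write(a, new byte[100]);
        Files.write(b, new byte[300]);
        long bytes = totalBytes(Arrays.asList(a, b));
        System.out.println(bytes + " bytes -> ~" + estimateRowCount(bytes, 50) + " rows");
        Files.delete(a);
        Files.delete(b);
    }
}
```

Running `main` prints `400 bytes -> ~8 rows`. Such an estimate is only as good as the assumed row width; columnar formats that store row counts in their file metadata can usually report far more accurate numbers.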
...