Currently, the statistical dimensions used by the optimizer are row count, ndv (number of distinct values), null count, max length, min length, max value and min value.[4] The file count and file size (both easily obtained from the file system) are not yet used by the planner; this can be improved later.
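To make these dimensions concrete, the sketch below computes all of them for a single in-memory string column in one pass. It is plain Java with no Flink dependency. The class `ColumnStatsSketch` and its fields are hypothetical and only mirror the list above; this is not Flink's own `ColumnStats` class.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical holder mirroring the statistical dimensions listed above (not Flink's ColumnStats).
final class ColumnStatsSketch {
    final long rowCount;
    final long ndv;        // number of distinct non-null values
    final long nullCount;
    final Integer maxLen;  // null when the column has no non-null values
    final Integer minLen;
    final String max;      // lexicographic maximum
    final String min;      // lexicographic minimum

    private ColumnStatsSketch(long rowCount, long ndv, long nullCount,
                              Integer maxLen, Integer minLen, String max, String min) {
        this.rowCount = rowCount;
        this.ndv = ndv;
        this.nullCount = nullCount;
        this.maxLen = maxLen;
        this.minLen = minLen;
        this.max = max;
        this.min = min;
    }

    // Computes every dimension in a single pass over the column values.
    static ColumnStatsSketch of(List<String> column) {
        Set<String> distinct = new HashSet<>();
        long nulls = 0;
        Integer maxLen = null;
        Integer minLen = null;
        String max = null;
        String min = null;
        for (String v : column) {
            if (v == null) {
                nulls++;
                continue;
            }
            distinct.add(v);
            int len = v.length();
            maxLen = (maxLen == null) ? len : Math.max(maxLen, len);
            minLen = (minLen == null) ? len : Math.min(minLen, len);
            if (max == null || v.compareTo(max) > 0) {
                max = v;
            }
            if (min == null || v.compareTo(min) < 0) {
                min = v;
            }
        }
        return new ColumnStatsSketch(column.size(), distinct.size(), nulls, maxLen, minLen, max, min);
    }
}
```

For example, `ColumnStatsSketch.of(Arrays.asList("apple", null, "fig", "apple"))` yields a row count of 4, an ndv of 2, a null count of 1, lengths between 3 and 5, a min of `"apple"` and a max of `"fig"`.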

import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.core.fs.Path;
import org.apache.flink.table.plan.stats.TableStats;
import org.apache.flink.table.types.DataType;

import java.util.List;

/**
 * Extension of an input format which is able to report estimated statistics for the FileSystem
 * connector.
 */
@PublicEvolving
public interface FileBasedStatisticsReportableInputFormat {

    /**
     * Returns the estimated statistics of this input format.
     *
     * @param files The files to be estimated.
     * @param producedDataType the final output type of the format.
     */
    TableStats reportStatistics(List<Path> files, DataType producedDataType);
}
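The interface leaves the estimation strategy to each format. As a hedged illustration only, the sketch below shows one possible shape: a hypothetical line-oriented format that treats each line as one row, sums line counts across the given files, and falls back to an "unknown" result instead of failing when a file cannot be read. `RowCountStats`, `StatisticsReportableFormatSketch` and `LineFormatSketch` are invented stand-ins; the sketch uses `java.nio.file.Path` instead of Flink's `org.apache.flink.core.fs.Path` and drops the `producedDataType` parameter so that it compiles without Flink on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical stand-in for Flink's TableStats, reduced to a row count.
final class RowCountStats {
    static final RowCountStats UNKNOWN = new RowCountStats(-1);
    final long rowCount;

    RowCountStats(long rowCount) {
        this.rowCount = rowCount;
    }
}

// Hypothetical stand-in for the interface above, without the producedDataType parameter.
interface StatisticsReportableFormatSketch {
    RowCountStats reportStatistics(List<Path> files);
}

// Hypothetical line-based format: one row per line, so the row count is the total line count.
final class LineFormatSketch implements StatisticsReportableFormatSketch {
    @Override
    public RowCountStats reportStatistics(List<Path> files) {
        long total = 0;
        for (Path file : files) {
            try (Stream<String> lines = Files.lines(file)) {
                total += lines.count();
            } catch (IOException | UncheckedIOException e) {
                // Estimation is best effort: report unknown rather than failing planning.
                return RowCountStats.UNKNOWN;
            }
        }
        return new RowCountStats(total);
    }
}
```

Scanning every line is only reasonable for small files; a columnar format could instead read row counts from file metadata, which avoids touching the data itself.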
