Status

...

Page properties


Discussion thread

...

...

Vote thread: https://lists.apache.org/list.html?dev@flink.apache.org
JIRA: FLINK-27982 (ASF JIRA)

...

Release: 1.16


Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

Currently, the statistics used by the optimizer include row count, NDV (number of distinct values), null count, max length, min length, max value and min value.[4] The file count and file size (which can easily be obtained from the file system) are not used by the planner yet; this can be improved later.

/**
 * Extension of input format which is able to report estimated statistics for FileSystem
 * connector.
 */
@PublicEvolving
public interface FileBasedStatisticsReportableInputFormat {

    /**
     * Returns the estimated statistics of this input format.
     *
     * @param files The files to be estimated.
     * @param producedDataType the final output type of the format.
     */
    TableStats reportStatistics(List<Path> files, DataType producedDataType);
}
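
As a rough illustration of what a format might do inside the reportStatistics method above, the sketch below derives a row-count estimate from the total size of the given files. It is not taken from this proposal or from Flink: FileInfo and SimpleStats are hypothetical stand-ins for Flink's Path and TableStats so that the example runs without Flink on the classpath, and both the assumedBytesPerRow parameter and the use of -1 for "unknown" are assumptions of the sketch.

```java
import java.util.Arrays;
import java.util.List;

public class RowCountEstimateSketch {

    /** Hypothetical stand-in for a file handle (not Flink's Path). */
    static final class FileInfo {
        final String name;
        final long sizeInBytes;

        FileInfo(String name, long sizeInBytes) {
            this.name = name;
            this.sizeInBytes = sizeInBytes;
        }
    }

    /** Hypothetical stand-in for a statistics result (not Flink's TableStats). */
    static final class SimpleStats {
        /** Assumption of this sketch: -1 means the row count is unknown. */
        static final SimpleStats UNKNOWN = new SimpleStats(-1L);

        final long rowCount;

        SimpleStats(long rowCount) {
            this.rowCount = rowCount;
        }
    }

    /**
     * Estimates the row count as total file size divided by an assumed average
     * row size. Returns UNKNOWN when there is nothing to base an estimate on.
     */
    static SimpleStats estimate(List<FileInfo> files, long assumedBytesPerRow) {
        if (files.isEmpty() || assumedBytesPerRow <= 0) {
            return SimpleStats.UNKNOWN;
        }
        long totalBytes = 0L;
        for (FileInfo file : files) {
            totalBytes += file.sizeInBytes;
        }
        // Integer division: the estimate is rounded down.
        return new SimpleStats(totalBytes / assumedBytesPerRow);
    }

    public static void main(String[] args) {
        List<FileInfo> files =
                Arrays.asList(new FileInfo("part-0.csv", 1000L), new FileInfo("part-1.csv", 2000L));
        System.out.println(estimate(files, 10L).rowCount);
    }
}
```

A real implementation would more likely sample rows or read metadata the file format already stores than rely on a fixed bytes-per-row guess; the fixed divisor is used here only to keep the sketch short.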

...