Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.


Status

Current state: Under Discussion

...

Released: <Flink Version>

Motivation

We can introduce richer merge strategies, one of which is already introduced is PartialUpdateMergeFunction, which completes non-NULL fields when merging. We can introduce more powerful merge strategies, such as support for pre-aggregated merges. Currently the pre-aggregation is used by many big data systems, e.g. Apache Doris, Apache Kylin, Druid, to reduce storage cost and accelerate aggregation query. By introducing pre-aggregated merge to Flink table storeTable Store, it can acquire the same benefit.  Aggregate functions which we plan to  implement includes sum, max/min, count, replace_if_not_null, replace, concatenate, or/and.

Public Interfaces

Basic usage of pre-aggregated merge

...


Future Work
An advanced way of introducing pre-aggregated merge into Flink table store Table Store is using materialized view to get pre-aggregated merge result from a source table. Then a stream job is started to synchronize data, consume source data, and write incrementally . This data synchronization job has no state. More information is described in JIRA.

Proposed Changes

An ConfigOption<String> type variable named ‘AGGREGATE_FUNCTION’ is defined in CoreOptions.java to retrieve configuration of 'aggregate-function' in WITH clause.

Adding one more value named 'PRE_AGGREGATE'  to enum MergeEngine type in CoreOptions.java. It acts as one type of the merge-engines supported by Flink table storeTable Store.

In the constructor of ChangelogWithKeyFileStoreTable, using 'PRE_AGGREGATE' as one more case in switch-case to initialize merge-engine.

A subclass of MergeFunction named AggregateMergeFunction is created in AggregateMergeFunction.java to conduct pre-aggregated merge.

Compatibility, Deprecation, and Migration Plan

This is new feature, no compatibility, deprecation, and migration plan.

Test Plan

Each pre-aggregated merge function will be covered with IT tests.

Rejected Alternatives

None.