...
Page properties |
---|
...
|
...
...
...
|
...
|
...
|
Motivation
We can introduce richer merge strategies, one of which is already introduced is PartialUpdateMergeFunction, which completes non-NULL fields when merging. We can introduce more powerful merge strategies, such as support for pre-aggregated merges. Currently the pre-aggregation is used by many big data systems, e.g. Apache Doris, Apache Kylin, Druid, to reduce storage cost and accelerate aggregation query. By introducing pre-aggregated merge to Flink Table Store, it can acquire the same benefit. Aggregate functions which we plan to implement includes sum, max/min, last_non_null_value, last_value, listagg, bool_or/bool_and.
Public Interfaces
Basic usage of pre-aggregated merge
...
Future work
An advanced way of introducing pre-aggregated merge into Flink Table Store is using materialized view to get pre-aggregated merge result from a source table. Then a stream job is started to synchronize data, consume source data, and write incrementally . This data synchronization job has no state. More information is described in JIRA.
Proposed Changes
An ConfigOption<String> type variable named ‘AGGREGATE_FUNCTION’ is defined in CoreOptions.java to retrieve configuration of 'aggregate-function' in WITH clause.
...
A subclass of MergeFunction named AggregateMergeFunction is created in AggregateMergeFunction.java to conduct pre-aggregated merge.
Compatibility, Deprecation, and Migration Plan
This is new feature, no compatibility, deprecation, and migration plan.
Test Plan
Each pre-aggregated merge function will be covered with IT tests.
Rejected Alternatives
None.