Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

We can introduce richer merge strategies, one of which is already introduced is PartialUpdateMergeFunction, which completes non-NULL fields when merging. We can introduce more powerful merge strategies, such as support for pre-aggregated merges. Currently the pre-aggregation is used by many big data systems, e.g. Apache Doris, Apache Kylin, Druid, to reduce storage cost and accelerate aggregation query. By introducing pre-aggregated merge to Flink Table Store, it can acquire the same benefit.  Aggregate functions which we plan to  implement includes sum sum, max/min, count, replacelast_ifnon_notnull_nullvalue, replace, concatenate, last_value,  listagg, bool_or/bool_and..

Public Interfaces

Basic usage of pre-aggregated merge

...

The max/min aggregate function fupports DECIMAL, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, INTERVAL(INTERVAL YEAR TO MONTH, INTERVAL DAY TO SECOND), DATE, TIME, TIMESTAMP, TIMESTAMP_LTZ data types.

The count/ last_non_null_value/last_value aggregate functions support all data types.

...

The bool_and/bool_or aggregate function supports BOOLEAN data type.

Default value of these aggregate functions

sum: default value of corresponding data type

count: 0

max/min/last_non_null_value/last_value: NULL

listagg: ""

bool_and: true

bool_or: false


Changelog support

In most cases, the modification to Table Store is INSERT changes. However, Table Store can also be converted into retract stream which may include retract messages (UPDATE/DELETE changes).

...

Aggregate functions supporting for UPDATE changes: sum, count.

Aggregate functions supporting for DELETE changes: sum, count.

It needs more design to make other aggregate functions support UPDATE/DELETE changes.

...