Motivation

Currently, Paimon has strong support for streaming writes and streaming reads, but its support for traditional batch processing is insufficient. When creating a table, you must explicitly specify the bucket key and bucket number; otherwise, an append-only table or changelog table with a single bucket is created. With only one bucket, concurrent reads, writes, and compaction cannot be performed, resulting in poor batch performance.

...

UPDATE Orders SET order_type = 'banana' WHERE order_type = 'apple' AND `date` > TO_TIMESTAMP('2020-02-02', 'yyyy-MM-dd');


Public Interfaces

The only public-facing change is the new table property 'write-mode' = 'table'.
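
As a rough illustration of how this property might be set, the Flink SQL sketch below creates a table in the proposed mode without declaring a bucket key or bucket number. The property 'write-mode' = 'table' comes from this proposal; the table name Orders and the column names order_type and `date` are borrowed from the UPDATE example in the Motivation section, while order_id and all column types are assumptions made only for illustration.

```sql
-- Hypothetical schema: only 'write-mode' = 'table' is taken from this proposal.
-- No bucket key or bucket number is specified.
CREATE TABLE Orders (
    order_id   BIGINT,        -- assumed column, not from the proposal
    order_type STRING,
    `date`     TIMESTAMP(3)   -- assumed type, matching the TO_TIMESTAMP comparison
) WITH (
    'write-mode' = 'table'
);
```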


Proposed Changes

Batch Mode Table

At the Paimon project level, we need a new kind of table with a new write mode. In this mode:

...

After each DELETE or UPDATE operation, a new snapshot is generated.

Compatibility, Deprecation, and Migration Plan

None

Test Plan

Unit tests: verify that all components work, including the compaction coordinator, the compaction worker, etc.

Integration tests: verify the end-to-end logic, such as whether the DAG is correct and whether compaction works correctly in a Flink job.

Rejected Alternatives

  • Still put compaction in the writers, but allow only one writer to trigger compaction at a time. (Rejected: it would slow down inserts, and the writer running compaction would perform poorly.)
  • Start another compaction process to trigger compaction. (Rejected: it is a waste of resources.)