Motivation
Currently, Paimon has strong support for streaming writes and streaming reads, but its support for traditional batch processing is lacking. When creating a table, you need to explicitly specify the bucket key and the number of buckets; otherwise, an AppendOnly table or a changelog table with a single bucket is created. With only one bucket, reads, writes, and compaction cannot run concurrently, which results in poor batch performance.
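To illustrate why a single bucket limits parallelism, here is a minimal sketch. It assumes rows are routed to buckets by hashing the bucket key modulo the number of buckets. The class BucketAssignmentSketch and its methods are hypothetical and simplified; they are not Paimon's actual bucket-assignment code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of fixed-bucket assignment (not Paimon's real hashing code).
final class BucketAssignmentSketch {

    // Map a bucket-key value to a bucket in [0, numBuckets).
    static int bucketOf(String bucketKey, int numBuckets) {
        // Math.floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(bucketKey.hashCode(), numBuckets);
    }

    // Group keys by bucket; in this model each bucket is written and compacted by one writer task.
    static Map<Integer, List<String>> groupByBucket(List<String> keys, int numBuckets) {
        Map<Integer, List<String>> byBucket = new TreeMap<>();
        for (String key : keys) {
            byBucket.computeIfAbsent(bucketOf(key, numBuckets), b -> new ArrayList<>()).add(key);
        }
        return byBucket;
    }

    public static void main(String[] args) {
        List<String> orderIds = List.of("o-1", "o-2", "o-3", "o-4", "o-5", "o-6");
        // With one bucket every row lands in bucket 0, so only one writer can make progress.
        System.out.println("1 bucket:  " + groupByBucket(orderIds, 1).keySet());
        // With four buckets the rows can spread over up to four writers.
        System.out.println("4 buckets: " + groupByBucket(orderIds, 4).keySet());
    }
}
```

With one bucket, every row maps to bucket 0, so a single writer handles all writes and compaction for the table. With more buckets, rows spread across several writers that can work in parallel.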
...
UPDATE Orders SET order_type = 'banana' WHERE order_type = 'apple' AND `date` > TO_TIMESTAMP('2020-02-02', 'yyyy-MM-dd');
Public Interfaces
Only the table property 'write-mode' = 'table' is exposed publicly.
Proposed Changes
Batch Mode Table
At the Paimon project level, we need a new table type with a new write mode. In this mode:
...
After each DELETE or UPDATE operation, a new snapshot is generated.
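The snapshot-per-operation behavior can be modeled as below. This assumes each committed DELETE or UPDATE atomically records a new immutable snapshot that lists the data files visible after the operation. SnapshotLogSketch, CommitKind, and Snapshot are hypothetical names used only for illustration, not Paimon's API. The sketch uses records, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of "one new snapshot per DELETE/UPDATE"; names are illustrative only.
final class SnapshotLogSketch {

    enum CommitKind { APPEND, DELETE, UPDATE }

    // An immutable snapshot: an id, the operation that produced it, and the data files it references.
    record Snapshot(long id, CommitKind kind, List<String> dataFiles) {}

    private final List<Snapshot> snapshots = new ArrayList<>();

    // Each committed operation produces exactly one new snapshot with the next id.
    Snapshot commit(CommitKind kind, List<String> filesAfterOperation) {
        long nextId = snapshots.isEmpty() ? 1 : latest().id() + 1;
        Snapshot snapshot = new Snapshot(nextId, kind, List.copyOf(filesAfterOperation));
        snapshots.add(snapshot);
        return snapshot;
    }

    // Assumes at least one commit has happened.
    Snapshot latest() {
        return snapshots.get(snapshots.size() - 1);
    }

    List<Snapshot> history() {
        return Collections.unmodifiableList(snapshots);
    }

    public static void main(String[] args) {
        SnapshotLogSketch log = new SnapshotLogSketch();
        log.commit(CommitKind.APPEND, List.of("f1", "f2"));
        // The UPDATE rewrites f2 into f3; snapshot 1 still references f1 and f2.
        log.commit(CommitKind.UPDATE, List.of("f1", "f3"));
        log.commit(CommitKind.DELETE, List.of("f3"));
        for (Snapshot s : log.history()) {
            System.out.println(s);
        }
    }
}
```

Because older snapshots are never modified, a reader pinned to an earlier snapshot keeps a consistent view while later operations commit.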
Compatibility, Deprecation, and Migration Plan
None
Test Plan
Unit tests (UT): verify that all components work, including the compaction coordinator, the compaction worker, etc.
Integration tests (IT): verify the end-to-end logic, for example, whether the DAG is correct and whether compaction works correctly in a Flink job.
Rejected Alternatives
- Still put compaction in the writers, but allow only one writer to trigger compaction at a time. (Rejected: it would slow down inserts, and the writer performing compaction would perform poorly.)
- Start a separate compaction process to trigger compaction. (Rejected: it wastes resources.)
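The test plan mentions a compaction coordinator and compaction workers. Assume the coordinator is a single task that decides which small files to merge, and hands the resulting tasks to parallel workers inside the same job, so writers never compact and no extra process is started. Under that assumption, the split could look roughly like the sketch below. DataFile, CompactionTask, plan, compact, and runWorkers are hypothetical names. The worker's output is simulated by concatenating names and summing file sizes, and filesPerTask is assumed to be at least 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a coordinator/worker split for compaction; not Paimon's actual classes.
final class CompactionSketch {

    record DataFile(String name, long sizeBytes) {}

    record CompactionTask(List<DataFile> inputs) {}

    // Coordinator (single instance): groups small files into tasks of at most filesPerTask files.
    static List<CompactionTask> plan(List<DataFile> files, long smallFileThreshold, int filesPerTask) {
        List<CompactionTask> tasks = new ArrayList<>();
        List<DataFile> batch = new ArrayList<>();
        for (DataFile f : files) {
            if (f.sizeBytes() >= smallFileThreshold) {
                continue; // already large enough, leave it alone
            }
            batch.add(f);
            if (batch.size() == filesPerTask) {
                tasks.add(new CompactionTask(List.copyOf(batch)));
                batch.clear();
            }
        }
        if (batch.size() > 1) { // a leftover single file gains nothing from compaction
            tasks.add(new CompactionTask(List.copyOf(batch)));
        }
        return tasks;
    }

    // Worker: "merges" the inputs of one task into one output file (simulated).
    static DataFile compact(CompactionTask task) {
        long total = 0;
        StringBuilder name = new StringBuilder("compacted");
        for (DataFile f : task.inputs()) {
            total += f.sizeBytes();
            name.append('-').append(f.name());
        }
        return new DataFile(name.toString(), total);
    }

    // Runs the planned tasks on a pool of parallel workers and collects outputs in task order.
    static List<DataFile> runWorkers(List<CompactionTask> tasks, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<DataFile>> futures = new ArrayList<>();
            for (CompactionTask t : tasks) {
                futures.add(pool.submit(() -> compact(t)));
            }
            List<DataFile> outputs = new ArrayList<>();
            for (Future<DataFile> fut : futures) {
                try {
                    outputs.add(fut.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while compacting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("compaction task failed", e.getCause());
                }
            }
            return outputs;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<DataFile> files = List.of(
                new DataFile("a", 10), new DataFile("b", 20), new DataFile("c", 500),
                new DataFile("d", 30), new DataFile("e", 40));
        // Planning happens once, in the coordinator; execution is spread over two workers.
        List<CompactionTask> tasks = plan(files, 100, 2);
        for (DataFile out : runWorkers(tasks, 2)) {
            System.out.println(out);
        }
    }
}
```

In this sketch, planning happens in one place, so only one component decides what to compact and writers are not slowed down. The workers run in the same job rather than in a separately started process.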