...

  1. Create a replica table based on the specified tag/snapshot of upstream and downstream Paimon Tables

  2. Resubmit all streaming jobs, with incremental or full recovery starting from the specified offset

We need to support branching in Paimon. Then, for the data correction process above, we could create replica tables without copying all data from the specified tables and increasing storage usage.
Besides the above, branching in Paimon can also be used to enhance tags. For tag-based simulation of traditional Hive partition tables, branches provide data correction capabilities on top of tags, which can be used to supplement data and achieve precise segmentation.
In summary, the branch we would like to introduce in Paimon has the following abilities:

...

We propose to provide four Flink actions that let users create, delete, and merge branches, and replace the main branch with a specified branch.

| action | argument | note |
| --- | --- | --- |
| create-branch | --name <branch-name>: specify the name of the branch; --tag <tag-name>: specify the name of a tag. | create a branch based on the given tag. |
| delete-branch | --name <branch-name>: specify which branch will be deleted. | delete a branch. |
| merge-branch | --name <branch-name>: merge specified branch to main. | merge specified branch to main. |
| replace-main | --name <branch-name>: replace main branch with specified branch. | replace the main branch with a specified branch. |

Branch System Table

We propose introducing a system table $branches. The schema is:

| Field Name | Field Type | Comment |
| --- | --- | --- |
| name | string | The branch name |
| tag_name | string | The created tag for the branch |
| tagged_snapshot_id | bigint | The snapshot id for the tag. |
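
To make the relationship between the actions and the system table concrete, the following is a minimal in-memory sketch, not Paimon code. It assumes that create-branch resolves the given tag to a snapshot id and records one row with the three fields of the proposed $branches schema, and that delete-branch simply removes that row. The class `BranchCatalogSketch`, the record `BranchRow`, and all method names are hypothetical and exist only for illustration. merge-branch and replace-main are left out because this section does not define their semantics.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model; none of these names are Paimon APIs. */
final class BranchCatalogSketch {

    /** One row of the proposed $branches system table. */
    record BranchRow(String name, String tagName, long taggedSnapshotId) {}

    // Existing tags: tag name -> id of the snapshot the tag points at.
    private final Map<String, Long> tags = new LinkedHashMap<>();
    // Branches keyed by name, kept in creation order.
    private final Map<String, BranchRow> branches = new LinkedHashMap<>();

    /** Registers a tag so that branches can be created from it. */
    void addTag(String tagName, long snapshotId) {
        tags.put(tagName, snapshotId);
    }

    /** Models create-branch with a branch name and a tag name. */
    BranchRow createBranch(String name, String tagName) {
        Long snapshotId = tags.get(tagName);
        if (snapshotId == null) {
            throw new IllegalArgumentException("Unknown tag: " + tagName);
        }
        if (branches.containsKey(name)) {
            throw new IllegalArgumentException("Branch already exists: " + name);
        }
        BranchRow row = new BranchRow(name, tagName, snapshotId);
        branches.put(name, row);
        return row;
    }

    /** Models delete-branch with a branch name. */
    void deleteBranch(String name) {
        if (branches.remove(name) == null) {
            throw new IllegalArgumentException("Unknown branch: " + name);
        }
    }

    /** Models a scan of the $branches system table. */
    List<BranchRow> listBranches() {
        return new ArrayList<>(branches.values());
    }
}
```

In this sketch, rejecting an unknown tag or a duplicate branch name is a design assumption; the proposal itself does not state how such cases are handled.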

Expiring Snapshot

We already have a mechanism to find deletion candidates when expiring snapshots. Once a Paimon table has branches, snapshot expiration needs to traverse all branches to check whether a snapshot can be deleted.
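
The traversal described above could look roughly like the sketch below. It is a simplification under a stated assumption: a snapshot is treated as still needed if any branch was created from a tag that points at it (the tagged_snapshot_id of the branch). The real check may need to consider more than this, for example files shared between snapshots. `SnapshotExpireSketch`, `Branch`, `canExpire`, and `expirable` are hypothetical names, not Paimon APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a branch-aware expiration check. */
final class SnapshotExpireSketch {

    /** Minimal view of a branch: its name and the snapshot its tag pins. */
    record Branch(String name, long taggedSnapshotId) {}

    /** Assumption: a snapshot can be deleted only if no branch is based on it. */
    static boolean canExpire(long snapshotId, List<Branch> branches) {
        for (Branch branch : branches) {
            if (branch.taggedSnapshotId() == snapshotId) {
                return false;
            }
        }
        return true;
    }

    /** Filters the existing deletion candidates by traversing all branches. */
    static List<Long> expirable(List<Long> candidates, List<Branch> branches) {
        List<Long> result = new ArrayList<>();
        for (long candidate : candidates) {
            if (canExpire(candidate, branches)) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

The cost of this check grows with the number of candidates times the number of branches; for many branches, collecting the pinned snapshot ids into a set first would avoid the repeated scan.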

...