Motivation

During data streaming, data errors and other issues can occur, and the data in the flow must then be corrected. This situation is common and important to handle. However, the correction should not affect the existing data processing, so that users are not impacted. Instead, we create a new data streaming process, wait for it to catch up with the data, and then replace the original process with it. The main operations can be divided into the following steps:

...

There is a main branch file in the branch base directory of the table, and this file contains the name of the main branch. In addition, there are multiple branch directories, and each branch keeps its own snapshots, tags and schemas in its directory.

NOTICE: By default, the snapshots, schemas and tags of the main branch stay in the base directory of the table, as before. The main branch is used for reads and writes when no branch is specified or the table has no main branch file.
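
The lookup rule described above can be sketched as follows. This is a minimal illustration and not Paimon's implementation: the directory name `branch`, the file name `MAIN-BRANCH` and the `branch-<name>` directory prefix are hypothetical names chosen for the example, and the sketch assumes that a main branch file, when present, redirects unqualified reads and writes to the branch it names.

```python
from pathlib import Path
from typing import Optional

# Hypothetical names, chosen only for this sketch.
BRANCH_BASE_DIR = "branch"
MAIN_BRANCH_FILE = "MAIN-BRANCH"
BRANCH_DIR_PREFIX = "branch-"


def read_main_branch(table_path: Path) -> Optional[str]:
    """Return the branch name stored in the main branch file, or None if it is absent or empty."""
    marker = table_path / BRANCH_BASE_DIR / MAIN_BRANCH_FILE
    if not marker.is_file():
        return None
    name = marker.read_text(encoding="utf-8").strip()
    return name or None


def resolve_branch_root(table_path: Path, branch: Optional[str] = None) -> Path:
    """Pick the directory that holds snapshot, schema and tag metadata for a read or write."""
    effective = branch if branch is not None else read_main_branch(table_path)
    if effective is None:
        # No branch requested and no main branch file: use the table base directory.
        return table_path
    return table_path / BRANCH_BASE_DIR / (BRANCH_DIR_PREFIX + effective)
```

With this layout, a table that has never used branches resolves to its base directory, so existing readers and writers keep working unchanged.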

Create Branch

The main branch of a Paimon table contains a series of snapshots, tags and schemas. We can create a new branch for the table from a tag by giving it a branch name. Paimon should create a new directory with the given branch name and copy the specified tag, snapshot and schema from the main branch into it.
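
The copy step can be illustrated with a small file-system sketch. It is not Paimon's code and rests on several assumptions: metadata files are named `tag/tag-<name>`, `snapshot/snapshot-<id>` and `schema/schema-<id>`, the tag file is JSON with `id` and `schemaId` fields that identify the snapshot and schema to copy, and branches live under a hypothetical `branch/branch-<name>` directory.

```python
import json
import shutil
from pathlib import Path


def create_branch(table_path: Path, branch_name: str, tag_name: str) -> Path:
    """Create a branch directory and copy the tag, its snapshot and its schema into it."""
    tag_file = table_path / "tag" / f"tag-{tag_name}"
    if not tag_file.is_file():
        raise FileNotFoundError(f"tag '{tag_name}' does not exist in the main branch")

    branch_root = table_path / "branch" / f"branch-{branch_name}"
    if branch_root.exists():
        raise FileExistsError(f"branch '{branch_name}' already exists")

    # Assumption: the tag file is JSON naming the snapshot and schema it points to.
    tag = json.loads(tag_file.read_text(encoding="utf-8"))
    sources = [
        tag_file,
        table_path / "snapshot" / f"snapshot-{tag['id']}",
        table_path / "schema" / f"schema-{tag['schemaId']}",
    ]
    missing = [p for p in sources if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing metadata files: {missing}")

    # Copy each metadata file to the same relative location under the branch.
    for src in sources:
        dst = branch_root / src.relative_to(table_path)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    return branch_root
```

Validating all source files before the first copy means a failed call leaves no partially created branch directory behind.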

...

| action | argument | note |
| --- | --- | --- |
| create-branch | --name <branch-name>: specify the name of the branch; --tag <tag-name>: specify the name of a tag. | create a branch based on the given tag. |
| delete-branch | --name <branch-name>: specify which branch will be deleted. | delete a branch. |
| merge-branch | --name <branch-name>: merge specified branch to main. | merge specified branch to main. |
| replace-main-branch | --name <branch-name>: replace main branch with specified branch. | replace the main branch with a specified branch. |
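
To make the argument structure of the actions listed above concrete, the sketch below models them as `argparse` subcommands. It only mirrors the action and argument names from this proposal; the program name `branch-action` is hypothetical, treating `--name` and `--tag` as required is an assumption, and this is not how Paimon's action entry point is implemented.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Model the four branch actions as subcommands with their listed arguments."""
    parser = argparse.ArgumentParser(prog="branch-action")  # hypothetical program name
    actions = parser.add_subparsers(dest="action", required=True)

    create = actions.add_parser("create-branch", help="create a branch based on the given tag")
    # Assumption: both arguments are mandatory for create-branch.
    create.add_argument("--name", required=True, help="name of the branch")
    create.add_argument("--tag", required=True, help="name of the tag")

    for action, text in [
        ("delete-branch", "delete a branch"),
        ("merge-branch", "merge the specified branch to main"),
        ("replace-main-branch", "replace the main branch with the specified branch"),
    ]:
        sub = actions.add_parser(action, help=text)
        sub.add_argument("--name", required=True, help="name of the branch")
    return parser


# Example: parse a create-branch invocation with illustrative values.
args = build_parser().parse_args(["create-branch", "--name", "fix-orders", "--tag", "tag-1"])
print(args.action, args.name, args.tag)
```

Only `create-branch` takes a tag; the other three actions identify their target by branch name alone.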

...