Discussion thread | https://lists.apache.org/thread/q46clxx38fz7n1xw0sgscmcslo3qrp5c |
---|---|
Vote thread | https://lists.apache.org/thread/odzjgyk641jgfzq64s2h8h65ql8349sy |
ISSUE | https://github.com/apache/incubator-paimon/issues/1795 |
Release | Paimon-0.6TBD |
Motivation
In a streaming data pipeline, data errors and other issues can occur, and the data in the flow then has to be corrected. This situation is common and important. While the correction is in progress, however, existing data processing should not be affected, so that users see no impact. Instead, we create a new streaming pipeline, wait for it to catch up with the data, and then use it to replace the original pipeline. The main operations can be divided into the following steps:
...
There is a main branch file in the branch base directory of the table, and this file contains the name of the main branch. In addition, there are multiple branch directories, and each branch keeps its own snapshots, tags and schemas in its directory.
NOTICE: By default, the snapshots, schemas and tags of the main branch remain in the base directory of the table, as before. The main branch is used for reads and writes when no branch is specified or when the table has no main branch file.
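The lookup rule described above can be sketched as follows. This is only an illustration of the idea, not the proposed implementation: the directory name `branch`, the file name `MAIN-BRANCH`, the default name `main`, and the function `resolve_branch_dir` are all hypothetical names chosen for this example.

```python
from pathlib import Path
from typing import Optional

# All three names below are assumptions made for this sketch.
BRANCH_BASE = "branch"            # branch base directory inside the table directory
MAIN_BRANCH_FILE = "MAIN-BRANCH"  # file that stores the name of the main branch
DEFAULT_MAIN = "main"             # name of the default main branch


def resolve_branch_dir(table_dir: Path, branch: Optional[str] = None) -> Path:
    """Return the directory that holds snapshot/tag/schema for a branch."""
    if branch is None:
        main_file = table_dir / BRANCH_BASE / MAIN_BRANCH_FILE
        if not main_file.is_file():
            # No branch specified and no main branch file:
            # use the table base directory, as before.
            return table_dir
        # Otherwise the file content names the main branch.
        branch = main_file.read_text(encoding="utf-8").strip()
    if branch == DEFAULT_MAIN:
        # The default main branch keeps its files in the table base directory.
        return table_dir
    return table_dir / BRANCH_BASE / branch
```

Under these assumptions, a table without a main branch file resolves to its base directory, so existing tables would keep working without any migration.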
Create Branch
The main branch of a Paimon table contains a series of snapshots, tags and schemas. A new branch with a given branch name can be created for the table from a tag. Paimon should create a new directory with the given branch name and copy the specified tag, together with the corresponding snapshot and schema, from the main branch to the new branch.
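A minimal sketch of this copy step on a local file system is shown below. It is not Paimon's implementation. It assumes that tag, snapshot and schema files are named `tag/tag-<name>`, `snapshot/snapshot-<id>` and `schema/schema-<id>`, that the tag file is JSON with `id` and `schemaId` fields, and that branches live under a `branch/<name>` directory; the function `create_branch` is a hypothetical name.

```python
import json
import shutil
from pathlib import Path

BRANCH_BASE = "branch"  # assumed name of the branch base directory


def create_branch(table_dir: Path, branch: str, tag: str) -> Path:
    """Create branch `branch` from tag `tag` of the main branch."""
    tag_file = table_dir / "tag" / f"tag-{tag}"
    if not tag_file.is_file():
        raise FileNotFoundError(f"tag '{tag}' does not exist")

    branch_dir = table_dir / BRANCH_BASE / branch
    if branch_dir.exists():
        raise FileExistsError(f"branch '{branch}' already exists")

    # Assumption: the tag file records the snapshot id and schema id it refers to.
    meta = json.loads(tag_file.read_text(encoding="utf-8"))
    sources = [
        tag_file,
        table_dir / "snapshot" / f"snapshot-{meta['id']}",
        table_dir / "schema" / f"schema-{meta['schemaId']}",
    ]

    # Validate everything first so a failure leaves no half-created branch.
    for src in sources:
        if not src.is_file():
            raise FileNotFoundError(f"missing file: {src}")

    # Mirror the main-branch layout (tag/, snapshot/, schema/) inside the branch.
    for src in sources:
        dst = branch_dir / src.parent.name / src.name
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    return branch_dir
```

The sketch copies only the three metadata files named in the text; how data and manifest files are shared between branches is outside its scope.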
...