...
For example, when Branch-1 is created from tag-1, it should copy the relevant snapshot-4 and schema-1 for Branch-1. Branch-2 and Branch-3 will do the same thing for tag-7 and tag-11.
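The copy step can be sketched with a small in-memory model. This is only an illustration under stated assumptions: BranchState, its two maps, and this createBranch signature are hypothetical names invented for the example, not Paimon classes, and real snapshot and schema files are reduced to ids.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of branch creation; not Paimon's actual classes. */
final class BranchCreationSketch {
    private BranchCreationSketch() {}

    /** Metadata owned by one branch (assumed shape, for illustration only). */
    static final class BranchState {
        // tag name -> id of the snapshot the tag points at
        final Map<String, Long> tagToSnapshot = new HashMap<>();
        // snapshot id -> id of the schema the snapshot was written with
        final Map<Long, Long> snapshotToSchema = new HashMap<>();
    }

    /** Creates a branch that holds a copy of the tag's snapshot and of that snapshot's schema. */
    static BranchState createBranch(BranchState source, String tagName) {
        Long snapshotId = source.tagToSnapshot.get(tagName);
        if (snapshotId == null) {
            throw new IllegalArgumentException("Unknown tag: " + tagName);
        }
        Long schemaId = source.snapshotToSchema.get(snapshotId);
        if (schemaId == null) {
            throw new IllegalStateException("Tag " + tagName + " points at a missing snapshot");
        }
        BranchState branch = new BranchState();
        // Only the tagged snapshot and its schema are copied; later snapshots stay in the source.
        branch.snapshotToSchema.put(snapshotId, schemaId);
        return branch;
    }
}
```

With tag-1 pointing at snapshot-4 and snapshot-4 written with schema-1, the new branch starts with exactly that snapshot and schema, and nothing written to the source afterwards.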
Operations In Branch
After a branch is created, streaming and batch jobs can read and write data in it. As with a regular table, streaming and batch jobs can also read data from a branch through time travel. Writing data to the branch creates new snapshots and tags. Users can also perform DDL on table branches, such as adding, dropping, or altering columns. For example, we perform these operations in Branch-1 to create new schemas, snapshots and tags.
...
After the above steps, the main branch will be replaced with the target branch and the existing jobs can still read and write data in the branch.
Proposed Changes
Branch Directory
To manage branches better, we would like to create a directory for each branch, like /branch-1, /branch-2, and /branch-n. Snapshot, Tag and Schema directories will be placed in the branch directory. We introduce BranchManager to each table to manage its branches.
```java
// Method signatures only; bodies are omitted in this proposal.
public class BranchManager {
    /** Get the main branch name for the table. */
    public String getMainBranch();

    /** Create a branch with the given branch name from the specified tag. */
    public void createBranch(String tagName, String branchName);

    /** Delete the branch with the given branch name. */
    public void deleteBranch(String branchName);

    /** Merge the given branch into main; the branch will still exist afterwards. */
    public void mergeBranch(String branchName);

    /**
     * Replace the main branch with the specified branch; the
     * previous main branch will be deleted.
     */
    public void replaceMain(String branchName);
}
```
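The directory layout described above could be resolved with a helper like the following. It is a minimal sketch, not part of the proposal: the class BranchPaths and its methods are hypothetical, and the sub-directory names snapshot, tag and schema are assumed here for illustration.

```java
import java.nio.file.Path;

/** Hypothetical path helper for the per-branch directory layout. */
final class BranchPaths {
    private BranchPaths() {}

    /** Directory of one branch, e.g. <table root>/branch-1. */
    static Path branchDirectory(Path tableRoot, String branchName) {
        return tableRoot.resolve(branchName);
    }

    /** Snapshot directory inside the branch directory (directory name assumed). */
    static Path snapshotDirectory(Path tableRoot, String branchName) {
        return branchDirectory(tableRoot, branchName).resolve("snapshot");
    }

    /** Tag directory inside the branch directory (directory name assumed). */
    static Path tagDirectory(Path tableRoot, String branchName) {
        return branchDirectory(tableRoot, branchName).resolve("tag");
    }

    /** Schema directory inside the branch directory (directory name assumed). */
    static Path schemaDirectory(Path tableRoot, String branchName) {
        return branchDirectory(tableRoot, branchName).resolve("schema");
    }
}
```

Keeping all three directories under one branch directory means that deleting a branch can remove a single directory tree.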
Query Branch
Users can set a branch name in a job to read and write that branch's data in streaming or batch mode.
Spark
```sql
-- Query data from the specified branch and tag name.
SELECT * FROM t VERSION AS OF '<branch-name>.<tag-name>';

-- Query data from the specified branch and snapshot id.
SELECT * FROM t VERSION AS OF '<branch-name>.<snapshot-id>';
```
Users can specify the branch name in their jobs. When the version contains no branch name, the query reads the data in the main branch.
NOTICE: A branch name cannot contain '.', because '.' separates the branch name from the tag name or snapshot id. This is checked when a branch is created.
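The version string handling can be sketched as follows. This is an illustrative assumption rather than the proposed implementation: the class BranchVersion and its methods are hypothetical, and splitting at the first '.' is one possible reading of the rule above.

```java
/** Hypothetical parser for a "<branch>.<tag or snapshot>" version string. */
final class BranchVersion {
    /** Branch name; the empty string means no branch was given, so main is used. */
    final String branch;
    /** Tag name or snapshot id, exactly as written by the user. */
    final String version;

    private BranchVersion(String branch, String version) {
        this.branch = branch;
        this.version = version;
    }

    /** Splits at the first '.', which is safe because branch names cannot contain '.'. */
    static BranchVersion parse(String versionAsOf) {
        int dot = versionAsOf.indexOf('.');
        if (dot < 0) {
            return new BranchVersion("", versionAsOf);
        }
        return new BranchVersion(versionAsOf.substring(0, dot), versionAsOf.substring(dot + 1));
    }

    /** Check assumed to run when a branch is created. */
    static void validateBranchName(String branchName) {
        if (branchName == null || branchName.isEmpty() || branchName.indexOf('.') >= 0) {
            throw new IllegalArgumentException("Invalid branch name: " + branchName);
        }
    }
}
```

Rejecting '.' at creation time keeps the parse unambiguous: everything before the first '.' is the branch and everything after it is the tag name or snapshot id.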
Flink
```sql
SELECT * FROM t /*+ OPTIONS('scan.branch'='<branch name>') */;
```
We will introduce a new option, scan.branch, for Flink to specify the branch name in the job.
Flink Branch Actions
We propose four Flink actions that let users create, delete, merge, and replace branches.
Action | Argument | Note |
---|---|---|
create-branch | --name <branch-name>: specify the name of the branch. | create a branch based on the given tag. |
delete-branch | --name <branch-name>: specify which branch will be deleted. | delete a branch. |
merge-branch | --name <branch-name>: merge specified branch to main. | merge specified branch to main. |
replace-main | --name <branch-name>: replace main branch with specified branch. | replace the main branch with a specified branch. |
Branch System Table
We propose introducing a system table, $branches. The schema is:
Field Name | Field Type | Comment |
---|---|---|
name | string | The branch name. |
tag_name | string | The tag the branch was created from. |
tagged_snapshot_id | bigint | The snapshot id for the tag. |
Expiring Snapshot
Paimon already has a mechanism to find deletion candidates when expiring snapshots. Once a table has branches, expiration must traverse all branches to check whether a snapshot can be deleted.
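The cross-branch check might look like the sketch below. It is only an illustration: BranchAwareExpiration is a hypothetical name, and representing each branch by the set of snapshot ids it still references is an assumption about how the traversal result could be modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical filter that keeps only candidates no branch still references. */
final class BranchAwareExpiration {
    private BranchAwareExpiration() {}

    /**
     * @param candidates snapshot ids proposed for deletion by the existing mechanism
     * @param referencedByBranch branch name -> snapshot ids that branch still needs (assumed input)
     * @return the candidates that are safe to delete, in their original order
     */
    static List<Long> deletable(List<Long> candidates, Map<String, Set<Long>> referencedByBranch) {
        List<Long> result = new ArrayList<>();
        for (Long snapshotId : candidates) {
            boolean referenced = false;
            // Traverse every branch; one reference is enough to keep the snapshot.
            for (Set<Long> references : referencedByBranch.values()) {
                if (references.contains(snapshotId)) {
                    referenced = true;
                    break;
                }
            }
            if (!referenced) {
                result.add(snapshotId);
            }
        }
        return result;
    }
}
```

A snapshot is kept as soon as any single branch references it, so the cost of the check grows with the number of branches.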
Compatibility, Deprecation, and Migration Plan
Adding branches changes the locations of the Snapshot, Tag and Schema directories. To stay compatible with previous versions of Paimon, the main branch name is treated as '' (empty string) when there is no main-branch-file, which keeps the original Snapshot, Tag and Schema locations unchanged. This fallback needs test coverage, along with tests for branching itself.
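The fallback can be sketched as below. The file name main-branch-file comes from the text above, but its location directly under the table root, its plain-text content, and the class MainBranchResolver are assumptions made for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical resolver for the main branch name and its metadata location. */
final class MainBranchResolver {
    // File name taken from the proposal text; location and format are assumed.
    static final String MAIN_BRANCH_FILE = "main-branch-file";

    private MainBranchResolver() {}

    /** Returns the main branch name, or "" when the table has no main-branch-file. */
    static String readMainBranch(Path tableRoot) {
        Path file = tableRoot.resolve(MAIN_BRANCH_FILE);
        if (!Files.exists(file)) {
            return ""; // table written by a previous version: keep the old layout
        }
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** An empty branch name maps to the table root, so existing paths do not move. */
    static Path metadataRoot(Path tableRoot, String branchName) {
        return branchName.isEmpty() ? tableRoot : tableRoot.resolve(branchName);
    }
}
```

For a table created before branching existed, readMainBranch returns the empty string and metadataRoot returns the table root, so readers and writers keep using the original directories.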