...

[draw.io diagram 3]

For example, when Branch-1 is created from tag-1, it copies the relevant snapshot-4 and schema-1 into Branch-1. Branch-2 and Branch-3 do the same for tag-7 and tag-11.

Operations In Branch

After a branch is created, streaming and batch jobs can read and write data in it. As with a regular table, streaming and batch jobs can also read data from a branch through time travel. After data is written to the branch, new snapshots and tags are created. Users can also perform DDL on a table branch, such as adding, dropping, or altering columns. For example, we perform these operations in Branch-1 to create new schemas, snapshots and tags.

[draw.io diagram 4]

...

After the above steps, the main branch will be replaced with the target branch and the existing jobs can still read and write data in the branch.

[draw.io diagram 6]

Proposed Changes

Branch Directory

To manage branches better, we would like to create a directory for each branch, like /branch-1, /branch-2, and /branch-n. The Snapshot, Tag and Schema directories will be placed in the branch directory. We introduce a BranchManager for each table to manage its branches.

Code Block
public class BranchManager {
    /** Get the main branch name for the table. */
    public String getMainBranch();

    /** Create a branch with the given branch name from the specified tag. */
    public void createBranch(String tagName, String branchName);

    /** Delete the branch with the given branch name. */
    public void deleteBranch(String branchName);

    /** Merge the given branch into main; the branch still exists afterwards. */
    public void mergeBranch(String branchName);

    /**
     * Replace the main branch with the specified branch; the
     * previous main branch will be deleted.
     */
    public void replaceMain(String branchName);
}
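
The directory layout described above can be sketched as a small path helper. This is only an illustration, not part of the proposal: the class name BranchPaths and the subdirectory names "snapshot", "tag" and "schema" are assumptions, and the sketch covers named branches only.

```java
import java.nio.file.Path;

/** Hypothetical helper that resolves per-branch metadata directories under a table root. */
final class BranchPaths {
    private final Path tableRoot;

    BranchPaths(Path tableRoot) {
        this.tableRoot = tableRoot;
    }

    /** Directory that holds everything belonging to one branch, e.g. <table>/branch-1. */
    Path branchDir(String branchName) {
        return tableRoot.resolve(branchName);
    }

    /** Snapshot directory of the branch (subdirectory name is an assumption). */
    Path snapshotDir(String branchName) {
        return branchDir(branchName).resolve("snapshot");
    }

    /** Tag directory of the branch (subdirectory name is an assumption). */
    Path tagDir(String branchName) {
        return branchDir(branchName).resolve("tag");
    }

    /** Schema directory of the branch (subdirectory name is an assumption). */
    Path schemaDir(String branchName) {
        return branchDir(branchName).resolve("schema");
    }
}
```

With a table root of warehouse/t, snapshotDir("branch-1") resolves to warehouse/t/branch-1/snapshot, so each branch keeps its own snapshots, tags and schemas.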

Query Branch

Users can set a branch name in a job to read and write that branch's data in streaming or batch mode.

Spark

Code Block
-- Query data from the specified branch and tag name.
SELECT * FROM t VERSION AS OF 'branch-name.tag-name';

-- Query data from the specified branch and snapshot id.
SELECT * FROM t VERSION AS OF 'branch-name.snapshot-id';

Users can specify the branch name in their jobs. When the version contains no branch name, the query reads data from the main branch.
NOTICE: A branch name cannot contain '.'; this is checked when the branch is created.
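
Because a branch name cannot contain '.', the first '.' in a version string can be used to separate the branch from the tag name or snapshot id. The following is a minimal sketch of that rule, not the proposed implementation: the class VersionRef and its methods are hypothetical, and it assumes that a version without a branch prefix contains no '.' itself.

```java
/** Hypothetical value type for a parsed "branch.version" reference. */
final class VersionRef {
    /** Branch name; the empty string means "use the main branch". */
    final String branch;
    /** Tag name or snapshot id, kept as text. */
    final String version;

    private VersionRef(String branch, String version) {
        this.branch = branch;
        this.version = version;
    }

    /** Splits at the first '.'; without a '.', the whole text is the version. */
    static VersionRef parse(String text) {
        int dot = text.indexOf('.');
        if (dot < 0) {
            return new VersionRef("", text);
        }
        return new VersionRef(text.substring(0, dot), text.substring(dot + 1));
    }

    /** Rejects names that would make the "branch.version" form ambiguous. */
    static void validateBranchName(String name) {
        if (name == null || name.isEmpty() || name.indexOf('.') >= 0) {
            throw new IllegalArgumentException("Invalid branch name: " + name);
        }
    }
}
```

Here parse("branch-1.tag-1") yields branch "branch-1" and version "tag-1", while parse("tag-1") yields an empty branch, which stands for the main branch. validateBranchName would run when a branch is created.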


Flink

Code Block
SELECT * FROM t /*+ OPTIONS('scan.branch'='<branch name>') */

We will introduce a new option scan.branch for Flink to specify the branch name in the job.

Flink Branch Actions

We propose to provide four Flink actions for users to control the creating, deleting, merging and replacing of branches.

| action | argument | note |
|---|---|---|
| create-branch | --name <branch-name>: specify the name of the branch; --tag <tag-name>: specify the name of a tag. | create a branch based on the given tag. |
| delete-branch | --name <branch-name>: specify which branch will be deleted. | delete a branch. |
| merge-branch | --name <branch-name>: merge specified branch to main. | merge specified branch to main. |
| replace-main | --name <branch-name>: replace main branch with specified branch. | replace the main branch with a specified branch. |

Branch System Table

We propose introducing a system table $branches. The schema is:

| Field Name | Field Type | Comment |
|---|---|---|
| name | string | The branch name |
| tag_name | string | The created tag for the branch |
| tagged_snapshot_id | bigint | The snapshot id for the tag. |

Expiring Snapshot

We already have a mechanism to find deletion candidates when expiring snapshots. Once a Paimon table has branches, snapshot expiration needs to traverse all branches to check whether a snapshot can be deleted.
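
The cross-branch check can be illustrated with a simplified model. In this sketch each branch reports the set of snapshot ids it still needs, and a candidate is deletable only if no branch needs it. The class BranchAwareExpiration, the method deletable and the per-branch reference sets are all assumptions made for illustration; the proposal does not define how references are tracked.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of a branch-aware deletion check. */
final class BranchAwareExpiration {

    /**
     * Returns the candidates that no branch still references.
     *
     * @param candidates snapshot ids proposed for deletion
     * @param referencedByBranch branch name to the snapshot ids that branch still needs
     */
    static Set<Long> deletable(Set<Long> candidates, Map<String, Set<Long>> referencedByBranch) {
        Set<Long> result = new TreeSet<>(candidates);
        // Traverse every branch and keep only candidates that none of them references.
        for (Set<Long> referenced : referencedByBranch.values()) {
            result.removeAll(referenced);
        }
        return result;
    }
}
```

For candidates 1 to 4, with one branch still needing snapshot 4 and another needing snapshot 2, only snapshots 1 and 3 remain deletable.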

Compatibility, Deprecation, and Migration Plan

Adding a branch changes the locations of Snapshot, Tag and Schema. To stay compatible with previous versions of Paimon, we treat the name of the main branch as '' (empty string) when there is no main-branch-file, which keeps the original Snapshot, Tag and Schema locations unchanged. We need to test this compatibility path and add tests for branching.
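
The fallback can be sketched as follows. This is a hedged illustration only: the class MainBranchResolver is hypothetical, and the content of the main-branch-file is passed in as an Optional so that the sketch does not have to assume a file name or format.

```java
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical sketch of the backward-compatible main branch lookup. */
final class MainBranchResolver {

    /** Returns the recorded main branch name, or "" when no main-branch-file exists. */
    static String mainBranch(Optional<String> mainBranchFileContent) {
        return mainBranchFileContent.map(String::trim).orElse("");
    }

    /**
     * Root directory for Snapshot, Tag and Schema. An empty main branch name
     * maps to the table root, so tables written by older versions keep their layout.
     */
    static Path metadataRoot(Path tableRoot, String mainBranch) {
        return mainBranch.isEmpty() ? tableRoot : tableRoot.resolve(mainBranch);
    }
}
```

A table without a main-branch-file resolves to its original root directory, while a table whose main branch is branch-1 resolves to the branch-1 directory under that root.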