Currently, we support time traveling to a time point. The Paimon scanner finds the snapshot whose commit time is closest to the specified time. The problem is that all of the snapshots before the time point may have expired. We therefore propose to fall back to the closest savepoint in this scenario.
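The fallback described above can be sketched as follows. This is a minimal illustration, not the Paimon scanner itself: `Candidate`, `latest_at_or_before`, and `resolve_time_travel` are hypothetical names, and "closest" is assumed here to mean the latest commit time that is not later than the requested time.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a snapshot or a savepoint."""
    id: int
    commit_time_ms: int


def latest_at_or_before(items: Sequence[Candidate], target_ms: int) -> Optional[Candidate]:
    # Assumption: "closest" means the latest entry committed at or before the target.
    eligible = [c for c in items if c.commit_time_ms <= target_ms]
    return max(eligible, key=lambda c: c.commit_time_ms, default=None)


def resolve_time_travel(
    snapshots: Sequence[Candidate],
    savepoints: Sequence[Candidate],
    target_ms: int,
) -> Optional[Tuple[str, Candidate]]:
    # Prefer a living snapshot; it is only missing if every earlier one has expired.
    found = latest_at_or_before(snapshots, target_ms)
    if found is not None:
        return ("snapshot", found)
    # Otherwise fall back to the closest savepoint, as proposed above.
    found = latest_at_or_before(savepoints, target_ms)
    if found is not None:
        return ("savepoint", found)
    return None
```

With living snapshots committed at 500 and 600 and savepoints at 100 and 300, a request for time 400 resolves to the savepoint at 300, because no living snapshot is old enough.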

Usage

Periodically savepoints

Public Interfaces

Table Options

Key	Required	Type	Note
savepoint.create-time	false	String	The time of day at which savepoints are created periodically. Must be in the format 'hh:mm'.
savepoint.create-interval	false	Duration	Interval between two savepoints. Must be at least 1d and a whole number of days.
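The constraints on the two options can be checked as in the sketch below. The function names are hypothetical, and 'hh:mm' is assumed to be a 24-hour clock; the proposal does not say how Paimon would validate these values.

```python
import re
from datetime import timedelta
from typing import Tuple

# Assumption: 'hh:mm' is a zero-padded 24-hour time, 00:00 through 23:59.
_TIME_RE = re.compile(r"^([01]\d|2[0-3]):([0-5]\d)$")


def parse_create_time(value: str) -> Tuple[int, int]:
    """Return (hour, minute) for a 'savepoint.create-time' value."""
    match = _TIME_RE.match(value)
    if match is None:
        raise ValueError(f"savepoint.create-time must be 'hh:mm', got {value!r}")
    return int(match.group(1)), int(match.group(2))


def validate_create_interval(interval: timedelta) -> int:
    """Return the interval in days; reject values under 1d or not whole days."""
    one_day = timedelta(days=1)
    if interval < one_day or interval % one_day != timedelta(0):
        raise ValueError("savepoint.create-interval must be a whole number of days, at least 1d")
    return interval // one_day
```

For example, `parse_create_time("00:30")` yields hour 0 and minute 30, while an interval of 36 hours is rejected because it is not a whole number of days.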

Flink Actions

Now,

...

Proposed Changes

Storage

As with snapshots, a new directory `/savepoint` will be created under the table directory to store savepoints. The qualified path of a savepoint file is `/path/to/table/savepoint/savepoint-<id>`.
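The path layout above can be expressed as a pair of helpers. This is only a sketch of the naming scheme; `savepoint_path` and `savepoint_id_from_path` are hypothetical names, not Paimon APIs.

```python
import posixpath

SAVEPOINT_DIR = "savepoint"       # directory under the table directory
SAVEPOINT_PREFIX = "savepoint-"   # file name prefix, followed by the id


def savepoint_path(table_path: str, savepoint_id: int) -> str:
    """Build <table_path>/savepoint/savepoint-<id>."""
    return posixpath.join(table_path, SAVEPOINT_DIR, f"{SAVEPOINT_PREFIX}{savepoint_id}")


def savepoint_id_from_path(path: str) -> int:
    """Recover the id from a savepoint file path."""
    name = posixpath.basename(path)
    if not name.startswith(SAVEPOINT_PREFIX):
        raise ValueError(f"not a savepoint file: {path!r}")
    return int(name[len(SAVEPOINT_PREFIX):])
```

For a table at `/path/to/table`, savepoint 3 maps to `/path/to/table/savepoint/savepoint-3`, and the id can be read back from that path.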

Creation & Deletion

SQL Syntax

Flink Actions

Currently, we can provide two ways for users to control the creation and deletion of savepoints.

...

In the future, once Flink supports stored procedures, we can provide procedures for managing savepoints.

System Table

We propose to introduce a system table `SavepointsTable`. The schema is:

savepoint_id BIGINT
schema_id BIGINT
save_time BIGINT
record_count BIGINT

Data Files Handling

Creating Tag

Deleting Tag

Expiring Snapshot

Interaction with Snapshot (Deprecate)

When creating a savepoint, we will perform a full compaction and pick the compacted snapshot to save.
When expiring a snapshot, Paimon checks whether its data files are still used by living snapshots. If not, they are deleted. After we introduce savepoints, we should also check whether the data files are used by savepoints.
When deleting a savepoint, we will check for and delete unused data files (as we do when expiring a snapshot).
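The reference check described above amounts to a set difference. The sketch below assumes each snapshot or savepoint can be reduced to the set of data file names it references; `deletable_files` is a hypothetical helper, not the actual expiration code.

```python
from typing import Dict, Iterable, Set


def deletable_files(
    candidate_files: Iterable[str],
    living_snapshots: Dict[int, Set[str]],
    savepoints: Dict[int, Set[str]],
) -> Set[str]:
    """Return the candidate files that nothing still alive references.

    candidate_files: files of the snapshot being expired (or the savepoint being deleted).
    living_snapshots / savepoints: id -> data files referenced by that entry.
    """
    in_use: Set[str] = set()
    for files in living_snapshots.values():
        in_use |= files
    # The extra check the proposal adds: savepoints also keep data files alive.
    for files in savepoints.values():
        in_use |= files
    return set(candidate_files) - in_use
```

The same helper covers savepoint deletion: pass the files of the savepoint being deleted as candidates, and leave that savepoint out of the `savepoints` mapping.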

...

Future Work

Table options

We propose to introduce a system table `SavepointsTable`. The schema is:

...

...

Call Procedures (Future work)

Spark supports extensions to its SQL syntax. We can provide savepoint procedures once we support the CALL statement for Spark.
Supporting CALL is on the Flink roadmap. We can provide savepoint procedures after that.

Compatibility, Deprecation, and Migration Plan

...

UT tests: verify creating and reading tags

IT tests: verify tag-related logic, including:

time travel to a tag
expiration of snapshots won't delete data files used by tags
deleting tags deletes unused data files correctly

Rejected Alternatives

Use name `Savepoint`

Support starting from a tag in streaming reading

The current design of tags stores only full data manifests, so it cannot support streaming reading for now.

Versions Compared

Old Version 8

New Version 9
