...

Currently, we support time traveling to a time point. The Paimon scanner finds the snapshot whose commit time is closest to the specified time. The problem is that all of the snapshots before the time point may have expired. We therefore propose to look for the closest savepoint in this scenario.
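The fallback described above can be sketched as follows. This is only an illustration, not Paimon's implementation: `Point`, `latest_at_or_before`, and `resolve_time_travel` are hypothetical names, and "closest" is assumed here to mean the latest commit at or before the requested time.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Point:
    """Hypothetical stand-in for a snapshot or a savepoint."""
    id: int
    commit_time_ms: int


def latest_at_or_before(points: List[Point], target_ms: int) -> Optional[Point]:
    """Return the point with the greatest commit time not after target_ms."""
    candidates = [p for p in points if p.commit_time_ms <= target_ms]
    return max(candidates, key=lambda p: p.commit_time_ms, default=None)


def resolve_time_travel(
    snapshots: List[Point], savepoints: List[Point], target_ms: int
) -> Optional[Tuple[str, Point]]:
    """Prefer a live snapshot; fall back to a savepoint if none qualifies."""
    hit = latest_at_or_before(snapshots, target_ms)
    if hit is not None:
        return ("snapshot", hit)
    # Every snapshot before the time point has expired (assumed scenario),
    # so look among the savepoints instead.
    hit = latest_at_or_before(savepoints, target_ms)
    if hit is not None:
        return ("savepoint", hit)
    return None
```

With live snapshots committed at 5000 and 6000 and savepoints at 1000 and 3000, a request for time 3500 would resolve to the savepoint at 3000 under these assumptions.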

Usage

Periodically savepoints

Public Interfaces

Table Options 

| Key | Required | Type | Note |
|---|---|---|---|
| savepoint.create-time | false | String | The time of day at which savepoints are created periodically. Must be in the format 'hh:mm'. |
| savepoint.create-interval | false | Duration | Interval between two savepoints. Must be at least 1d and a whole number of days. |
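The constraints on these two options could be checked roughly as below. This is a sketch under assumptions: the function names are hypothetical, 'hh:mm' is taken to be a 24-hour clock, and the interval is assumed to be already parsed into a `timedelta` (how the Duration string is parsed is out of scope here).

```python
import re
from datetime import time, timedelta

# Assumed interpretation of 'hh:mm': 24-hour clock, two digits each.
_HH_MM = re.compile(r"([01]\d|2[0-3]):([0-5]\d)")


def parse_create_time(value: str) -> time:
    """Validate a savepoint.create-time value and return it as a time of day."""
    m = _HH_MM.fullmatch(value)
    if m is None:
        raise ValueError(f"savepoint.create-time must be 'hh:mm', got {value!r}")
    return time(int(m.group(1)), int(m.group(2)))


def validate_create_interval(interval: timedelta) -> int:
    """Validate a savepoint.create-interval value and return it in days."""
    one_day = timedelta(days=1)
    if interval < one_day or interval % one_day != timedelta(0):
        raise ValueError(
            "savepoint.create-interval must be at least 1d "
            "and a whole number of days"
        )
    return interval // one_day
```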

Flink Actions

Now, 

...

  1. Spark supports extensions to its SQL syntax. We can provide savepoint procedures after we support the CALL statement for Spark.
  2. Supporting CALL is on the Flink roadmap. We can provide savepoint procedures once it is available.

Proposed Changes

Storage

As with snapshots, a new directory `/savepoint` will be created under the table directory to store savepoints. The qualified path of a savepoint file is `/path/to/table/savepoint/savepoint-<id>`.
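As a small illustration of this layout, the path can be built from, and parsed back into, a savepoint id. The helper names and constants below are hypothetical and exist only for this sketch.

```python
from pathlib import PurePosixPath

# Names taken from the layout described above.
SAVEPOINT_DIR = "savepoint"
SAVEPOINT_PREFIX = "savepoint-"


def savepoint_path(table_path: str, savepoint_id: int) -> PurePosixPath:
    """Build <table>/savepoint/savepoint-<id>."""
    return PurePosixPath(table_path) / SAVEPOINT_DIR / f"{SAVEPOINT_PREFIX}{savepoint_id}"


def savepoint_id_from_path(path: str) -> int:
    """Recover the id from a savepoint file path."""
    name = PurePosixPath(path).name
    if not name.startswith(SAVEPOINT_PREFIX):
        raise ValueError(f"not a savepoint file: {path!r}")
    return int(name[len(SAVEPOINT_PREFIX):])
```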

Creation & Deletion

SQL Syntax

Flink Actions

Currently, we can provide two ways for users to control the creation and deletion of savepoints.

...

In the future, once Flink supports stored procedures, we can provide procedures for managing savepoints.

System Table

We propose to introduce a system table `SavepointsTable`. The schema is:

    savepoint_id BIGINT
    schema_id BIGINT
    save_time BIGINT
    record_count BIGINT

Data Files Handling

Creating Tag

Deleting Tag

Expiring Snapshot

Interaction with Snapshot (Deprecate)

  1. When creating a savepoint, we will run a full compaction and pick the compacted snapshot to save.
  2. When expiring a snapshot, Paimon checks whether its data files are still used by living snapshots and deletes those that are not. After we introduce savepoints, we should also check whether the data files are used by savepoints.
  3. When deleting a savepoint, we will check for and delete unused data files (as we do when expiring a snapshot).
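The reference check in items 2 and 3 can be sketched as a set difference. This is a simplified illustration, not Paimon's implementation: the function name is hypothetical, and data files are represented as plain file-name strings keyed by snapshot or savepoint id.

```python
from typing import Dict, Iterable, Set


def files_safe_to_delete(
    candidate_files: Iterable[str],
    live_snapshot_files: Dict[int, Set[str]],
    savepoint_files: Dict[int, Set[str]],
) -> Set[str]:
    """Return the candidate files referenced by no living snapshot or savepoint."""
    still_used: Set[str] = set()
    for files in live_snapshot_files.values():
        still_used |= files
    # The additional check introduced with savepoints.
    for files in savepoint_files.values():
        still_used |= files
    return set(candidate_files) - still_used
```

The same function would cover both expiring a snapshot and deleting a savepoint: the candidates are the files of the object being removed, and the two maps hold whatever remains afterwards.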

...

Future Work

Table options

We propose to introduce a system table `SavepointsTable`. The schema is:

...


...

Call Procedures (Future work)

  1. Spark supports extensions to its SQL syntax. We can provide savepoint procedures after we support the CALL statement for Spark.
  2. Supporting CALL is on the Flink roadmap. We can provide savepoint procedures once it is available.

Compatibility, Deprecation, and Migration Plan

...

UT tests: verify creating and reading tags

IT tests: verify tag-related logic, including:

  1. time travel to a tag
  2. expiration of snapshots won't delete data files used by tags
  3. deleting tags deletes unused data files correctly

Rejected Alternatives

Use name `Savepoint`


Support starting from a

...

tag in streaming reading

The current design of tags stores only full data manifests, so streaming reading from a tag cannot be supported yet.