...
Code Block | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
public class Savepoint {
/** Used for identify a savepoint. */
private final long id;
/** Which snapshot does this savepoint save. */
private final long savedSnapshotId;
/** Id of the current table schema of the records in this savepoint. */
private final long schemaId;
/** The manifest list of all data of this savepoint. */
private final long fullManifestList;
/** How many records in this savepoint. */
private final long recordCount;
/** Getters. */
/** Some util methods. */
/** Return all {@link ManifestFileMeta} instances for full data manifests in this snapshot. */
public List<ManifestFileMeta> dataManifests(ManifestList manifestList);
/** Serialize to json. */
public String toJson();
/** Deserialize from json. */
public static Savepoint fromJson(String json);
/** Get a savepoint from path. */
public static Savepoint fromPath(FileIO fileIO, Path path);
} |
...
In the future, after Flink support stored procedure, we can provide procedures to managing savepoints.
Interaction with Snapshot
Currently, when snapshot expires, Paimon will check whether the data files are used by living snapshots. If not, they will be deleted. After we introduce savepoint, we should also check if the data files are used by savepoints.
We should place `SavepointManager` to `FileStoreExpireImpl` to provide information about which data files are used.
System Table
We suppose to introduce a system table `SavepointsTable`. The schema is:
Code Block | ||||
---|---|---|---|---|
| ||||
savepoint_id BIGINT
saved_snapshot_id BIGINT
schema_id BIGINT
save_time BIGINT
record_count BIGINT |
...
We'd like to explain more details about where savepoints take effect.
...
Time Travel
...
to Savepoint (Batch Reading)
Currently, we support time traveling to a snapshot. Now we can support to travel to a savepoint.
For spark, we can use SQL The spark SQL is `SELECT * FROM t VERSION AS OF <version-string>`. Currently, the version string only supports a long value which represents snapshot. Now we suppose to use `snapshot.<id>` to specify traveling to snapshot or use `savepoint.<id>` to specify traveling to savepoint. The old style `<id>` will be considered as snapshot id.
For Flink, we can use dynamic option `scan.snapshot-id=<snapshot-id>`. We suppose to introduce a new core option `scan.savepoint-id` to support traveling to a savepoint.
...
Time Travel to Timestamp
...
with Savepoint
Currently, we support time traveling to a time point. Paimon Scanner will find the snapshot of which commit time is closest to the specified time point. The problem is all of the snapshots before the time point may have expired. Now So we suppose to support try to find the closest savepoint after Paimon Scanner finds no snapshot.
Flink Streaming Reading from Snapshot/Savepoint
Currently, we support to specify start snapshot in streaming reading by dynamic option `scan.snapshot-id=<snapshot-id>`. In case that
Flink Streaming Reading from timestamp
...
in this scenario.
Compatibility, Deprecation, and Migration Plan
None.
Test Plan
UT tests: verify creating and reading savepoint
IT tests: verify savepoint related logic, including:
- time travel to savepoint
- expiration of snapshots won't delete data files pointed by savepoints
Rejected Alternatives
Support starting from a savepoint in streaming reading
Current design of savepoint just store full data manifests, so it's not able to support streaming reading now.
...