...

Code Block (Java): Savepoint
public class Savepoint {
	/** Used to identify a savepoint. */
	private final long id;

	/** ID of the snapshot that this savepoint preserves. */
	private final long savedSnapshotId;

	/** Id of the current table schema of the records in this savepoint. */
	private final long schemaId;

	/** The manifest list recording all data of this savepoint. */
	private final String fullManifestList;

	/** How many records this savepoint contains. */
	private final long recordCount;

	/** Getters. */

	/** Some util methods. */

	/** Return all {@link ManifestFileMeta} instances for full data manifests in this savepoint. */
	public List<ManifestFileMeta> dataManifests(ManifestList manifestList);

	/** Serialize to json. */
	public String toJson();

	/** Deserialize from json. */
	public static Savepoint fromJson(String json);

	/** Get a savepoint from path. */
	public static Savepoint fromPath(FileIO fileIO, Path path);
}
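To illustrate how the `toJson`/`fromJson` pair might round-trip, here is a minimal self-contained sketch. It uses a simplified field set and naive hand-rolled JSON handling purely for illustration; the real implementation would use Paimon's JSON utilities.

```java
// Simplified stand-in for the proposed Savepoint class (illustrative only).
public class SavepointDemo {
    final long id;
    final long savedSnapshotId;
    final long schemaId;
    final long recordCount;

    SavepointDemo(long id, long savedSnapshotId, long schemaId, long recordCount) {
        this.id = id;
        this.savedSnapshotId = savedSnapshotId;
        this.schemaId = schemaId;
        this.recordCount = recordCount;
    }

    String toJson() {
        return String.format(
                "{\"id\":%d,\"savedSnapshotId\":%d,\"schemaId\":%d,\"recordCount\":%d}",
                id, savedSnapshotId, schemaId, recordCount);
    }

    static SavepointDemo fromJson(String json) {
        // Naive parser: strip the braces and quotes, then split on commas.
        String body = json.substring(1, json.length() - 1).replace("\"", "");
        long[] v = new long[4];
        int i = 0;
        for (String field : body.split(",")) {
            v[i++] = Long.parseLong(field.split(":")[1]);
        }
        return new SavepointDemo(v[0], v[1], v[2], v[3]);
    }

    public static void main(String[] args) {
        SavepointDemo sp = new SavepointDemo(1L, 42L, 0L, 1000L);
        SavepointDemo back = fromJson(sp.toJson());
        System.out.println(back.id + " " + back.savedSnapshotId + " " + back.recordCount);
    }
}
```

A `fromPath` implementation would simply read the file content via `FileIO` and delegate to `fromJson`.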

...

In the future, after Flink supports stored procedures, we can provide procedures for managing savepoints.

Interaction with Snapshot

Currently, when a snapshot expires, Paimon checks whether its data files are still used by living snapshots; if not, they are deleted. After we introduce savepoints, we should also check whether the data files are used by savepoints.

We should pass the `SavepointManager` to `FileStoreExpireImpl` so that it can provide information about which data files are in use.
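The expiration-time check described above can be sketched as follows. This is a simplified model, not `FileStoreExpireImpl`'s actual code, and the `SavepointManager.dataFilesInUse()` method is an assumed API for illustration:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: a data file is deletable only if neither a living snapshot
// nor a savepoint still references it.
public class ExpireSketch {

    interface SavepointManager {
        // Hypothetical API: file names referenced by any savepoint.
        Set<String> dataFilesInUse();
    }

    static List<String> deletableFiles(
            List<String> candidates, Set<String> usedBySnapshots, SavepointManager savepoints) {
        Set<String> protectedFiles = new HashSet<>(usedBySnapshots);
        protectedFiles.addAll(savepoints.dataFilesInUse());
        return candidates.stream()
                .filter(f -> !protectedFiles.contains(f))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        SavepointManager manager = () -> Set.of("data-2.orc");
        List<String> deletable =
                deletableFiles(
                        List.of("data-1.orc", "data-2.orc", "data-3.orc"),
                        Set.of("data-3.orc"),
                        manager);
        System.out.println(deletable); // only data-1.orc is safe to delete
    }
}
```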

System Table

We propose to introduce a system table `SavepointsTable`. Its schema is:

Code Block (SQL)
savepoint_id BIGINT
saved_snapshot_id BIGINT
schema_id BIGINT
save_time BIGINT
record_count BIGINT 

...

We'd like to explain more details about where savepoints take effect.

...

Time Travel

...

to Savepoint (Batch Reading)

Currently, we support time traveling to a snapshot. We can now also support traveling to a savepoint.

For Spark, the SQL is `SELECT * FROM t VERSION AS OF <version-string>`. Currently, the version string only supports a long value representing a snapshot ID. We propose to use `snapshot.<id>` to specify traveling to a snapshot and `savepoint.<id>` to specify traveling to a savepoint. The old style `<id>` will still be interpreted as a snapshot ID.
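The proposed version-string resolution can be sketched in a few lines. The class and method names below are illustrative, not actual Paimon code:

```java
// Sketch of parsing the proposed Spark version string: "snapshot.<id>" and
// "savepoint.<id>" are explicit, while a bare "<id>" keeps the legacy
// snapshot meaning.
public class VersionParser {

    enum Kind { SNAPSHOT, SAVEPOINT }

    static Kind kindOf(String version) {
        if (version.startsWith("savepoint.")) {
            return Kind.SAVEPOINT;
        }
        // "snapshot.<id>" or legacy bare "<id>".
        return Kind.SNAPSHOT;
    }

    static long idOf(String version) {
        int dot = version.indexOf('.');
        return Long.parseLong(dot < 0 ? version : version.substring(dot + 1));
    }

    public static void main(String[] args) {
        System.out.println(kindOf("savepoint.3") + " " + idOf("savepoint.3"));
        System.out.println(kindOf("snapshot.7") + " " + idOf("snapshot.7"));
        System.out.println(kindOf("5") + " " + idOf("5"));
    }
}
```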

For Flink, we can use the dynamic option `scan.snapshot-id=<snapshot-id>`. We propose to introduce a new core option `scan.savepoint-id` to support traveling to a savepoint.

...

Time Travel to Timestamp

...

with Savepoint

Currently, we support time traveling to a time point: Paimon Scanner finds the snapshot whose commit time is closest to the specified time point. The problem is that all of the snapshots before the time point may have expired. So we propose to fall back to finding the closest savepoint when Paimon Scanner finds no suitable snapshot.
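The fallback logic can be sketched as below. This is a simplified model (it picks the latest reference committed at or before the target time), and all class and method names are illustrative:

```java
import java.util.List;
import java.util.Optional;

// Sketch of time travel to a timestamp with savepoint fallback: prefer the
// latest living snapshot committed at or before the target time; if every
// such snapshot has expired, fall back to the closest savepoint.
public class TimeTravelSketch {

    record Ref(long id, long commitTimeMillis) {}

    static Optional<Ref> latestAtOrBefore(List<Ref> refs, long timestamp) {
        return refs.stream()
                .filter(r -> r.commitTimeMillis() <= timestamp)
                .max((a, b) -> Long.compare(a.commitTimeMillis(), b.commitTimeMillis()));
    }

    static Ref resolve(List<Ref> livingSnapshots, List<Ref> savepoints, long timestamp) {
        return latestAtOrBefore(livingSnapshots, timestamp)
                .or(() -> latestAtOrBefore(savepoints, timestamp))
                .orElseThrow(() -> new IllegalStateException("nothing to travel to"));
    }

    public static void main(String[] args) {
        // Snapshots before t=100 have expired; savepoint 1 at t=80 survives.
        List<Ref> snapshots = List.of(new Ref(10, 150), new Ref(11, 200));
        List<Ref> savepoints = List.of(new Ref(1, 80));
        System.out.println(resolve(snapshots, savepoints, 120).id()); // falls back to savepoint 1
    }
}
```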

Flink Streaming Reading from Snapshot/Savepoint

Currently, we support specifying the start snapshot in streaming reading via the dynamic option `scan.snapshot-id=<snapshot-id>`. In case that

Flink Streaming Reading from timestamp

...

in this scenario.

Compatibility, Deprecation, and Migration Plan

None.

Test Plan

UT tests: verify creating and reading savepoints.

IT tests: verify savepoint-related logic, including:

  1. time travel to savepoint
  2. expiration of snapshots won't delete data files pointed to by savepoints

Rejected Alternatives

Support starting from a savepoint in streaming reading

The current design of savepoints only stores full data manifests, so streaming reading from a savepoint cannot be supported for now.


...