
...

5. Public Interfaces and Use Cases

Six new configuration options are added:

  • state.checkpoints.file-merging: Enables or disables the file merging feature.
  • state.checkpoints.file-merging.across-checkpoint-boundary: Determines whether files are merged only within a single checkpoint or also across multiple checkpoints (as discussed in section 4.1.1).
  • state.checkpoints.file-merging.max-file-size: Sets the maximum size of a physical file.
  • state.checkpoints.file-merging.max-file-pool-size: Sets the upper limit of the file pool size for concurrent writing.
  • state.checkpoints.file-merging.max-subtasks-per-file: Limits how many subtasks can share one physical file; combined with the number of subtasks in each TaskManager (TM), this sets the lower limit of the file pool size (applies only to merging at the TM level).
  • state.checkpoints.file-merging.max-space-amplification: Sets the space amplification threshold above which files are compacted, i.e. re-uploaded (as discussed in section 4.7).
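The following Python sketch shows one way the six options could interact when a TaskManager decides whether to keep appending to an open physical file, how large its file pool must be, and when to compact. It is a hypothetical model, not Flink code: the class and function names are invented for illustration, the unit of max-file-size (bytes) is assumed, space amplification is assumed to mean bytes stored in physical files divided by bytes still referenced by retained checkpoints, and the example values are not Flink defaults.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMergingOptions:
    """Hypothetical in-memory view of the state.checkpoints.file-merging.* options."""
    enabled: bool                     # state.checkpoints.file-merging
    across_checkpoint_boundary: bool  # ...file-merging.across-checkpoint-boundary
    max_file_size: int                # ...file-merging.max-file-size (bytes, assumed)
    max_file_pool_size: int           # ...file-merging.max-file-pool-size
    max_subtasks_per_file: int        # ...file-merging.max-subtasks-per-file
    max_space_amplification: float    # ...file-merging.max-space-amplification


def can_reuse_file(opts: FileMergingOptions, file_checkpoint_id: int,
                   checkpoint_id: int, file_size: int) -> bool:
    """Can state for `checkpoint_id` be appended to an already open physical file?"""
    if not opts.enabled:
        return False  # merging disabled: every state handle gets its own file
    if file_size >= opts.max_file_size:
        return False  # file reached the size limit: roll over to a new file
    # Without cross-checkpoint merging, a file only holds data of one checkpoint.
    return opts.across_checkpoint_boundary or file_checkpoint_id == checkpoint_id


def pool_size_bounds(opts: FileMergingOptions, subtasks_in_tm: int) -> tuple[int, int]:
    """(lower, upper) bounds on the TM-level file pool size.

    Assumed: if at most max_subtasks_per_file subtasks share one file, the pool
    needs at least ceil(subtasks_in_tm / max_subtasks_per_file) files."""
    lower = math.ceil(subtasks_in_tm / opts.max_subtasks_per_file)
    return lower, opts.max_file_pool_size


def should_compact(opts: FileMergingOptions, physical_bytes: int, live_bytes: int) -> bool:
    """Trigger compaction (re-upload) once space amplification exceeds the threshold."""
    if live_bytes == 0:
        return False  # nothing is referenced any more: the file can simply be deleted
    # Assumed definition: bytes stored in physical files / bytes still referenced.
    return physical_bytes / live_bytes > opts.max_space_amplification


# Illustrative values only; these are not Flink's defaults.
opts = FileMergingOptions(enabled=True, across_checkpoint_boundary=False,
                          max_file_size=64 * 1024 * 1024, max_file_pool_size=32,
                          max_subtasks_per_file=5, max_space_amplification=2.0)
print(pool_size_bounds(opts, subtasks_in_tm=12))  # (3, 32)
print(should_compact(opts, physical_bytes=300, live_bytes=100))  # True
```

With 12 subtasks in one TM and at most 5 subtasks per file, at least 3 files are needed, while max-file-pool-size caps the pool at 32; how a conflict between the two bounds would be resolved is not specified here.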

Both forward and backward compatibility are supported: these options affect only newly written files, never existing ones. If the user enables file merging and restores a job from an old checkpoint, new files are merged while the files of the old checkpoint remain separate. Conversely, if the user disables the feature and restores a job from a checkpoint that contains merged files, new files are written separately while the old files remain merged. Old files are left untouched because converting them would require copying them, and avoiding that copy saves significant data transfer over the network.
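A minimal sketch of this compatibility rule, assuming a checkpoint can be modeled as a list of file handles tagged with whether each physical file is merged. The FileHandle type, the file names, the assumption that all restored state is still referenced by the next checkpoint, and the simplification that all new state fits into one merged file are hypothetical; the point is only that restored handles are reused unchanged and only newly written files follow the current setting.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class FileHandle:
    path: str     # hypothetical file name
    merged: bool  # True if the physical file holds several logical state files


def checkpoint_after_restore(restored: Sequence[FileHandle], new_chunks: Sequence[bytes],
                             merging_enabled: bool) -> List[FileHandle]:
    """Handles referenced by the first checkpoint taken after a restore
    (assuming every restored file is still referenced)."""
    # Old files are referenced as-is, whatever format they were written in:
    # nothing is copied or rewritten, so no data crosses the network.
    handles = list(restored)
    if not new_chunks:
        return handles
    if merging_enabled:
        # Simplification: all new chunks fit into a single merged file.
        handles.append(FileHandle("new-merged-0", merged=True))
    else:
        # One separate file per new chunk.
        handles.extend(FileHandle(f"new-{i}", merged=False) for i in range(len(new_chunks)))
    return handles


# Enable merging and restore from an old (unmerged) checkpoint.
old = [FileHandle("old-0", merged=False), FileHandle("old-1", merged=False)]
print(checkpoint_after_restore(old, [b"a", b"b"], merging_enabled=True))
```

The same function covers the reverse case: restoring from merged files with merging disabled keeps the merged handles and adds one separate file per new chunk.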

...