Proposers

Raymond Xu

Approvers

@<approver1 JIRA username> Vinoth Chandar : [APPROVED/REQUESTED_INFO/REJECTED]
@<approver2 JIRA username> Balaji Varadarajan : [APPROVED/REQUESTED_INFO/REJECTED]
...

...

Taking a snapshot means reading the most up-to-date records from a Hudi dataset as of query time. Note that this can take longer for Merge-on-Read (MOR) tables, because the latest log files have to be merged at read time.
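
To make the merge cost concrete, here is a minimal toy model of a snapshot read over a base file plus log entries. It is a sketch only: the names `Record`, `LogEntry`, and `merge_snapshot` are hypothetical and are not part of any Hudi API, and real MOR merging (file groups, ordering fields, payload classes) is considerably more involved.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical shapes for illustration only; not Hudi classes.
Record = Dict[str, str]
# A log entry is (record_key, payload); a payload of None marks a delete.
LogEntry = Tuple[str, Optional[Record]]


def merge_snapshot(base: Dict[str, Record], log: Iterable[LogEntry]) -> Dict[str, Record]:
    """Return the latest view of the records: base records overlaid with log entries.

    Log entries are applied in order, so a later entry for the same key wins.
    """
    merged = dict(base)  # shallow copy so the base view is left untouched
    for key, payload in log:
        if payload is None:
            merged.pop(key, None)  # delete marker
        else:
            merged[key] = payload  # insert or update
    return merged
```

The work grows with the number of log entries that have not yet been folded into base files, which is one way to picture why a snapshot of a MOR table may take longer.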

Arguments

Argument	Description	Remark
--source-base-path	Base path of the source Hudi dataset to snapshot	required
--target-base-path	Base path for the target output files (snapshots)	required
--snapshot-prefix	Prefix or directory under the target base path, used to keep different snapshots separate	optional; may default to a daily prefix generated at run time, such as `2019/11/12/`
--output-format	Output format: "HUDI_COPY" or "PARQUET"	required; with "HUDI_COPY", behaves the same as `HoodieSnapshotCopier`; more data formats may be supported in the future
--output-partition-field	Field used for Spark repartitioning	optional; ignored when the output format is "HUDI_COPY"
--output-partitioner	Class that implements custom repartitioning	optional; ignored when the output format is "HUDI_COPY"
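
The arguments above can be sketched as a small command-line parser. This is an illustration under stated assumptions, not the proposed implementation: it uses Python's `argparse`, the program name `snapshotter` and the helper names `build_parser` and `parse_args` are hypothetical, and the daily default for `--snapshot-prefix` is only one possible behavior (the table says it "may" default that way).

```python
import argparse
from datetime import date
from typing import List, Optional

OUTPUT_FORMATS = ("HUDI_COPY", "PARQUET")


def build_parser() -> argparse.ArgumentParser:
    # "snapshotter" is a placeholder program name, not taken from the RFC.
    parser = argparse.ArgumentParser(prog="snapshotter")
    parser.add_argument("--source-base-path", required=True,
                        help="Base path of the source Hudi dataset")
    parser.add_argument("--target-base-path", required=True,
                        help="Base path for the target output files")
    parser.add_argument("--snapshot-prefix", default=None,
                        help="Directory under the target base path for this snapshot")
    parser.add_argument("--output-format", required=True, choices=OUTPUT_FORMATS)
    parser.add_argument("--output-partition-field", default=None)
    parser.add_argument("--output-partitioner", default=None)
    return parser


def parse_args(argv: List[str], today: Optional[date] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Assumption: with no prefix given, fall back to a daily prefix such as 2019/11/12/.
    if args.snapshot_prefix is None:
        args.snapshot_prefix = (today or date.today()).strftime("%Y/%m/%d/")
    # The repartitioning options are ignored when the output format is HUDI_COPY.
    if args.output_format == "HUDI_COPY":
        args.output_partition_field = None
        args.output_partitioner = None
    return args
```

Passing `today` explicitly keeps the default prefix deterministic, which is convenient when testing the parser.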

Steps

[Diagram: RFC-9 snapshotter overview]

...
