...
Proposers
Approvers
- @<approver1 JIRA username> Vinoth Chandar : [APPROVED/REQUESTED_INFO/REJECTED]
- @<approver2 JIRA username> Balaji Varadarajan : [APPROVED/REQUESTED_INFO/REJECTED]
- ...
...
Taking a snapshot means reading the most up-to-date records from a Hudi dataset at query time. Note that this can take longer for MOR (Merge-on-Read) tables, since it involves merging the latest log files.
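To make the merge cost concrete, here is a minimal conceptual sketch of why a snapshot over an MOR table does extra work. It is not Hudi's reader: the function name `snapshot_view`, the dict-based records, and the `(commit_time, key, record)` log entry shape are all assumptions made for illustration only.

```python
def snapshot_view(base_records, log_entries):
    """Return the latest version of every record, keyed by record key.

    base_records: dict mapping record key -> record, as read from base files.
    log_entries:  iterable of (commit_time, key, record) tuples standing in
                  for updates stored in log files (hypothetical shape).
    """
    latest = dict(base_records)  # start from the base file contents
    # Apply logged updates in commit order so the newest write wins.
    for _commit_time, key, record in sorted(log_entries, key=lambda e: e[0]):
        latest[key] = record
    return latest


base = {"a": {"v": 1}, "b": {"v": 2}}
logs = [("002", "a", {"v": 10}), ("001", "a", {"v": 5}), ("003", "c", {"v": 7})]
view = snapshot_view(base, logs)
```

In this simplified model, the work grows with the number of log entries that have to be replayed on top of the base records, which is the extra step the note above refers to.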
Arguments
Argument | Description | Remark |
---|---|---|
--source-base-path | Base path for the source Hudi dataset to be snapshotted | required |
--target-base-path | Base path for the target output files (snapshots) | required |
--snapshot-prefix | Snapshot prefix or directory under the target base path, used to segregate different snapshots | optional; may default to a daily prefix generated at run time, e.g. 2019/11/12/ |
--output-format | "HUDI_COPY", "PARQUET" | required; when "HUDI_COPY", behaves the same as HoodieSnapshotCopier; may support more data formats in the future |
--output-partition-field | A field to be used by Spark repartitioning | optional; ignored when the output format is "HUDI_COPY" |
--output-partitioner | A class to facilitate custom repartitioning | optional; ignored when the output format is "HUDI_COPY" |
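The argument rules in the table can be sketched as a small parser. This is an illustrative model only, not the proposed tool's implementation: the function names `build_parser` and `resolve_config`, the program name, and the choice of Python's `argparse` are assumptions. Only the flag names, the two output formats, the daily-prefix default, and the "ignored when HUDI_COPY" rule come from the table.

```python
import argparse
from datetime import date

OUTPUT_FORMATS = ("HUDI_COPY", "PARQUET")


def build_parser():
    """Declare the flags from the table (hypothetical parser, for illustration)."""
    parser = argparse.ArgumentParser(prog="snapshot-exporter-sketch")
    parser.add_argument("--source-base-path", required=True)
    parser.add_argument("--target-base-path", required=True)
    parser.add_argument("--snapshot-prefix", default=None)
    parser.add_argument("--output-format", required=True, choices=OUTPUT_FORMATS)
    parser.add_argument("--output-partition-field", default=None)
    parser.add_argument("--output-partitioner", default=None)
    return parser


def resolve_config(argv, today=None):
    """Parse argv and apply the defaulting rules described in the Remark column."""
    config = vars(build_parser().parse_args(argv)).copy()
    # Assumed default: a daily prefix such as 2019/11/12/ computed at run time.
    if config["snapshot_prefix"] is None:
        config["snapshot_prefix"] = (today or date.today()).strftime("%Y/%m/%d/")
    # Repartitioning options are ignored when copying the dataset as-is.
    if config["output_format"] == "HUDI_COPY":
        config["output_partition_field"] = None
        config["output_partitioner"] = None
    return config


copy_cfg = resolve_config(
    ["--source-base-path", "/data/src", "--target-base-path", "/data/out",
     "--output-format", "HUDI_COPY", "--output-partition-field", "year"],
    today=date(2019, 11, 12),
)
```

Passing `today` explicitly keeps the default prefix deterministic in tests; a real run would rely on the current date.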
Steps
[Gliffy Diagram]
...