To contribute content to this FAQ, see here.

General

When is Hudi useful for me or my organization?

...

For an insert or bulk_insert operation, no such pre-combining is performed. Thus, if your input contains duplicates, the dataset will also contain duplicates. If you don't want duplicate records, either issue an upsert or consider specifying the option to de-duplicate input in either the datasource or deltastreamer.
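To make the difference concrete, here is a minimal plain-Python sketch of what pre-combining a batch means. It does not call any Hudi API; the function name `precombine` and the field names `uuid` (record key) and `ts` (ordering field) are hypothetical, chosen only for illustration. It assumes that, among records sharing a key, the one with the largest ordering value should win.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def precombine(records: Iterable[Record],
               key_field: str = "uuid",
               ordering_field: str = "ts") -> List[Record]:
    """Keep one record per key: the one with the largest ordering value.

    Illustrative only. This mimics the idea of de-duplicating an input
    batch before writing; it is not Hudi's implementation.
    """
    latest: Dict[Any, Record] = {}
    for rec in records:
        key = rec[key_field]
        current = latest.get(key)
        # On ties, the record seen later in the batch replaces the earlier one.
        if current is None or rec[ordering_field] >= current[ordering_field]:
            latest[key] = rec
    return list(latest.values())


# A hypothetical input batch with two records for key "a".
batch = [
    {"uuid": "a", "ts": 1, "fare": 10.0},
    {"uuid": "a", "ts": 2, "fare": 12.5},
    {"uuid": "b", "ts": 1, "fare": 7.0},
]

# Without pre-combining (insert / bulk_insert): every input record is written.
as_inserted = list(batch)

# With pre-combining: duplicates within the batch collapse to one record per key.
deduplicated = precombine(batch)
```

In this sketch `as_inserted` keeps both records for key "a", which is why duplicates in the input become duplicates in the dataset, while `deduplicated` keeps only the record with the larger `ts`.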

Can I implement my own logic for how input records are merged with the record on storage?

...

Hudi provides built-in support for rewriting your entire dataset into Hudi one time, using the HDFSParquetImporter tool available from the hudi-cli. You could also do this via a simple read and write of the dataset using the Spark datasource APIs. Once migrated, writes can be performed using the normal means discussed here. This topic is discussed in detail here, including ways of doing partial migrations.
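The "simple read and write" route might look roughly like the PySpark sketch below. Treat it as a sketch under assumptions, not a definitive recipe: the helper names, paths, and column names are hypothetical; the datasource format name is assumed to be `hudi` (older releases use `org.apache.hudi`); and the option keys should be checked against the configuration reference for your Hudi version. The SparkSession is passed in by the caller and must have the Hudi bundle on its classpath.

```python
from typing import Dict


def build_hudi_write_options(table_name: str,
                             record_key_field: str,
                             partition_path_field: str,
                             precombine_field: str,
                             operation: str = "bulk_insert") -> Dict[str, str]:
    """Assemble datasource write options for a one-time migration.

    Hypothetical helper. The keys are Hudi write configs as commonly
    documented; verify them against the release you run.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key_field,
        "hoodie.datasource.write.partitionpath.field": partition_path_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": operation,
    }


def migrate_parquet_to_hudi(spark, source_path: str, target_path: str,
                            options: Dict[str, str]) -> None:
    """Read an existing parquet dataset and rewrite it as a Hudi dataset.

    `spark` is an existing SparkSession supplied by the caller.
    """
    df = spark.read.parquet(source_path)
    (df.write
       .format("hudi")        # assumption: "org.apache.hudi" on older releases
       .options(**options)
       .mode("overwrite")     # creates the target dataset on the first write
       .save(target_path))


# Example options for a hypothetical "trips" dataset.
options = build_hudi_write_options(
    table_name="trips",
    record_key_field="uuid",
    partition_path_field="partitionpath",
    precombine_field="ts",
)

# With a live session, the migration would then be invoked as:
# migrate_parquet_to_hudi(spark, "/data/trips_parquet", "/data/trips_hudi", options)
```

Here bulk_insert is chosen as the default because the source is being loaded once in full; as noted above, that operation does not pre-combine, so duplicates present in the source dataset are carried over.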

...

Versions Compared

Old Version 67

New Version 68
