If Hive Sync is enabled in the deltastreamer tool or datasource, the dataset is available in Hive as a couple of tables that can be read using HiveQL, Presto or SparkSQL. See here for more.

Hudi provides built-in support for a one-time rewrite of your entire dataset into Hudi, using the HDFSParquetImporter tool available from the hudi-cli. You could also do this via a simple read and write of the dataset using the Spark datasource APIs. Once migrated, writes can be performed using the normal means discussed here. This topic, including partial migrations, is discussed in detail here.
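The read-and-write migration path could look roughly like the PySpark sketch below. It is an illustration under assumptions, not the project's reference procedure: the helper names, the example field names and the choice of bulk_insert for the initial load are hypothetical, and the datasource format string depends on the Hudi release ("org.apache.hudi" is assumed here; older builds used a different package name).

```python
def hudi_write_options(table_name, record_key, partition_field,
                       precombine_field, operation="bulk_insert"):
    """Build the datasource write options for a one-time load.

    All argument values are supplied by the caller; nothing here is
    read from an existing dataset.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": operation,
    }


def migrate_parquet_to_hudi(spark, source_path, target_path, options):
    """Read a plain parquet dataset and rewrite it as a Hudi dataset.

    `spark` is an existing SparkSession with the Hudi bundle on its
    classpath (assumption). The format name below is also an assumption
    and may differ between Hudi releases.
    """
    df = spark.read.parquet(source_path)
    (df.write
       .format("org.apache.hudi")
       .options(**options)
       .mode("overwrite")
       .save(target_path))


# Hypothetical field names, for illustration only.
opts = hudi_write_options("trips", "uuid", "date", "ts")
```

With a live session this would be invoked as migrate_parquet_to_hudi(spark, source_path, target_path, opts); subsequent writes would then typically switch the operation to upsert.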

How can I pass hudi configurations to my spark job?

...

Can I register my Hudi dataset with Apache Hive metastore?

<Answer WIP> Yes. This can be done either via the standalone Hive Sync tool or using options in the deltastreamer tool or datasource.
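For the datasource route, the Hive sync settings are passed as ordinary write options. The sketch below only assembles those options; the helper name, the example database, table and JDBC URL are hypothetical, and the exact set of keys supported should be checked against the configuration reference for the Hudi version in use.

```python
def hive_sync_options(database, table, jdbc_url, partition_fields):
    """Build datasource options that ask the writer to sync to Hive.

    `partition_fields` is a list of column names; it is joined into
    the comma-separated form the option expects.
    """
    return {
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.database": database,
        "hoodie.datasource.hive_sync.table": table,
        "hoodie.datasource.hive_sync.jdbcurl": jdbc_url,
        "hoodie.datasource.hive_sync.partition_fields": ",".join(partition_fields),
    }


# Hypothetical values, for illustration only.
sync_opts = hive_sync_options(
    database="default",
    table="trips",
    jdbc_url="jdbc:hive2://localhost:10000",
    partition_fields=["date"],
)
```

These entries would be merged into the same options passed to the datasource writer, for example df.write.format(...).options(**write_opts, **sync_opts), so that the table is registered as part of the write.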

How does the Hudi indexing work & what are its benefits?

...

What's Hudi's schema evolution story?

...

How do I run compaction for a MOR dataset?

Simplest way to cio

Performance

...
