...

If Hive Sync is enabled in the deltastreamer tool or datasource, the dataset is available in Hive as a couple of tables that can now be read using HiveQL, Presto, or SparkSQL. See here for more.

...

Hudi provides built-in support for a one-time rewrite of your entire dataset into Hudi, using the HDFSParquetImporter tool available from the hudi-cli. You could also do this via a simple read and write of the dataset using the Spark datasource APIs. Once migrated, writes can be performed using the normal means discussed here. This topic is discussed in detail here, including how to do only a partial migration.
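
As a rough illustration of the "simple read and write" route, the sketch below reads an existing parquet dataset and rewrites it through the Hudi Spark datasource. It is only a sketch under assumptions: the helper names (bulk_insert_options, migrate_parquet_to_hudi) and all paths and field names are hypothetical, the option keys are the commonly documented Hudi datasource write options and may differ between releases, and the format name "org.apache.hudi" is assumed (older releases used a different package name). The caller supplies an already configured SparkSession with the Hudi bundle on the classpath.

```python
def bulk_insert_options(table_name, record_key_field,
                        partition_path_field, precombine_field):
    """Build Hudi datasource write options for a one-time bulk load.

    The keys follow the Hudi datasource configuration names; verify them
    against the Hudi version you run (assumption).
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "bulk_insert",
        "hoodie.datasource.write.recordkey.field": record_key_field,
        "hoodie.datasource.write.partitionpath.field": partition_path_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


def migrate_parquet_to_hudi(spark, source_path, target_path, options):
    """Read an existing parquet dataset and rewrite it as a Hudi dataset.

    `spark` is a SparkSession created by the caller; nothing here starts
    or configures Spark itself.
    """
    df = spark.read.parquet(source_path)
    (df.write
       .format("org.apache.hudi")   # assumed format name, see note above
       .options(**options)
       .mode("overwrite")           # one-time rewrite into a fresh base path
       .save(target_path))
```

A call might look like migrate_parquet_to_hudi(spark, "/data/source_parquet", "/data/target_hudi", bulk_insert_options("my_table", "id", "date", "ts")), where every argument value is a placeholder.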

How can I pass Hudi configurations to my Spark job?

...

Can I register my Hudi dataset with Apache Hive metastore?

<Answer WIP>Yes. This can be performed either via the standalone Hive Sync tool or using options in the deltastreamer tool or datasource.
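
For the datasource route, registration is typically switched on through a handful of hive_sync write options. The following is a minimal sketch, not a definitive recipe: the helper name hive_sync_options is hypothetical, the option keys are the commonly documented Hudi datasource Hive sync settings and should be checked against the Hudi version in use, and the database, table, and JDBC URL values in the usage note are placeholders.

```python
def hive_sync_options(database, table, jdbc_url, partition_fields):
    """Build the Hudi datasource options that enable Hive sync on write.

    Keys are assumed to match the Hudi datasource configuration names for
    your release; the values passed in are supplied by the caller.
    """
    return {
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.database": database,
        "hoodie.datasource.hive_sync.table": table,
        "hoodie.datasource.hive_sync.jdbcurl": jdbc_url,
        # Hudi expects a comma-separated list of partition columns.
        "hoodie.datasource.hive_sync.partition_fields": ",".join(partition_fields),
    }
```

These options would be merged into whatever options dictionary the datasource write already uses, for example {**write_options, **hive_sync_options("default", "my_table", "jdbc:hive2://localhost:10000", ["date"])}, so that the table is registered as part of the write.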

How does the Hudi indexing work & what are its benefits? 

...

What's Hudi's schema evolution story?

<Answer WIP>

...

How do I run compaction for a MOR dataset?

Simplest way to cio



Performance 

...