...
Hudi provides custom input format implementations to work with Hive tables. These classes are also affected by the change in package namespace. In addition, the input formats have been renamed to indicate that they primarily operate on Parquet datasets. The relocations and name changes are listed below:
| View Type | Pre v0.5.0 Input Format Class | v0.5.0 Input Format Class |
| --- | --- | --- |
| Read Optimized View | com.uber.hoodie.hadoop.HoodieInputFormat | org.apache.hudi.hadoop.HoodieParquetInputFormat |
| Realtime View | com.uber.hoodie.hadoop.HoodieRealtimeInputFormat | org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat |
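Existing Hive tables that were registered with a pre-v0.5.0 input format class need their table definition updated to point at the renamed class. A minimal HiveQL sketch, assuming a read-optimized table named hudi_table (the table name, output format, and serde below are illustrative assumptions; for a partitioned table, each existing partition must also be altered with ALTER TABLE ... PARTITION (...) SET FILEFORMAT):

```sql
-- Repoint an existing Hive table at the renamed v0.5.0 input format.
-- Table name, output format, and serde are illustrative assumptions.
ALTER TABLE hudi_table
  SET FILEFORMAT
  INPUTFORMAT  'org.apache.hudi.hadoop.HoodieParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
  SERDE        'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe';
```

For a realtime view table, the same statement would use org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat as the input format class.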
Changes in Spark DataSource Format Name:
With the package renaming, Hudi's Spark Data Source is now accessed for both reading and writing using the format name "org.apache.hudi".
| Data Source Type | Pre v0.5.0 Format (e.g. in Scala) | v0.5.0 Format (e.g. in Scala) |
| --- | --- | --- |
| Read | spark.read.format("com.uber.hoodie").xxxx | spark.read.format("org.apache.hudi").xxxx |
| Write | spark.write.format("com.uber.hoodie").xxxx | spark.write.format("org.apache.hudi").xxxx |
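For Spark users, the only change is the format string passed to the reader and writer. A before/after sketch in Scala, assuming an existing SparkSession named spark (the dataset path and DataFrame name are illustrative assumptions):

```scala
// Reading: replace the old format name with "org.apache.hudi".
val df = spark.read
  .format("org.apache.hudi")            // was: .format("com.uber.hoodie")
  .load("/path/to/hudi_dataset/*/*")    // illustrative base path

// Writing: the same rename applies on the DataFrameWriter.
df.write
  .format("org.apache.hudi")            // was: .format("com.uber.hoodie")
  .mode("append")
  .save("/path/to/hudi_dataset")
```

No other reader or writer options need to change; only the format name is affected by the package rename.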
Migrating Existing Hudi Datasets:
...