...

Hudi has custom input format implementations for working with Hive tables. These classes are also affected by the change in package namespace. In addition, the input format classes have been renamed to indicate that they work primarily on Parquet datasets.

The relocated and renamed input format classes are listed below:


View Type             Pre v0.5.0 Input Format Class                       v0.5.0 Input Format Class
Read Optimized View   com.uber.hoodie.hadoop.HoodieInputFormat            org.apache.hudi.hadoop.HoodieParquetInputFormat
Realtime View         com.uber.hoodie.hadoop.HoodieRealtimeInputFormat    org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat
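If you maintain scripts that generate Hive table definitions or check the input format recorded for existing tables, the class renames above can be applied mechanically. The following is a minimal sketch under that assumption; the object HudiInputFormatMigration and its migrate method are hypothetical helpers, not part of Hudi.

```scala
// Hypothetical helper (not part of Hudi): maps pre-v0.5.0 Hudi input format
// class names to their v0.5.0 replacements, following the rename table above.
object HudiInputFormatMigration {
  private val renames: Map[String, String] = Map(
    // Read Optimized View
    "com.uber.hoodie.hadoop.HoodieInputFormat" ->
      "org.apache.hudi.hadoop.HoodieParquetInputFormat",
    // Realtime View
    "com.uber.hoodie.hadoop.HoodieRealtimeInputFormat" ->
      "org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat"
  )

  /** Returns the v0.5.0 class name for a legacy Hudi input format class,
    * or the argument unchanged if it is not one of the renamed classes. */
  def migrate(inputFormatClass: String): String =
    renames.getOrElse(inputFormatClass, inputFormatClass)
}
```

For example, HudiInputFormatMigration.migrate("com.uber.hoodie.hadoop.HoodieInputFormat") returns "org.apache.hudi.hadoop.HoodieParquetInputFormat". Returning unknown names unchanged lets such a script pass over non-Hudi tables without special-casing them.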

Changes in Spark DataSource Format Name:

With the package renaming, Hudi’s Spark DataSource is now accessed for both reading and writing using the format name “org.apache.hudi”.

Data Source Type   Pre v0.5.0 Format (e.g. in Scala)            v0.5.0 Format (e.g. in Scala)
Read               spark.read.format("com.uber.hoodie").xxxx    spark.read.format("org.apache.hudi").xxxx
Write              df.write.format("com.uber.hoodie").xxxx      df.write.format("org.apache.hudi").xxxx

Here df is the DataFrame being written; writes go through DataFrame.write, since SparkSession has no write method.
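Jobs that must keep running while clusters are upgraded may need to choose the format name at runtime. A minimal sketch, assuming the job already knows which Hudi version it runs against; the HudiFormat object and its forVersion method are hypothetical, not part of Hudi or Spark.

```scala
import scala.util.Try

// Hypothetical helper (not part of Hudi): picks the Spark DataSource format
// name for a Hudi version string such as "0.4.7" or "0.5.0-incubating".
object HudiFormat {
  val LegacyFormat  = "com.uber.hoodie" // used before v0.5.0
  val CurrentFormat = "org.apache.hudi" // used from v0.5.0 onwards

  /** Extracts (major, minor) from a version string; non-numeric or missing parts count as 0. */
  private def majorMinor(version: String): (Int, Int) = {
    val parts = version.split("[.-]").toList.map(p => Try(p.toInt).getOrElse(0))
    (parts.lift(0).getOrElse(0), parts.lift(1).getOrElse(0))
  }

  /** Returns the format name to pass to spark.read.format / df.write.format. */
  def forVersion(hudiVersion: String): String = {
    val (major, minor) = majorMinor(hudiVersion)
    if (major > 0 || minor >= 5) CurrentFormat else LegacyFormat
  }
}
```

A job could then call spark.read.format(HudiFormat.forVersion(hudiVersion)), where hudiVersion comes from the job's own configuration (an assumption; Hudi does not supply it). Once every cluster runs v0.5.0 or later, the helper can be dropped in favour of the literal "org.apache.hudi".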

Migrating Existing Hudi Datasets:

...