
...

  • Decouple the Hudi-related logic from the existing HoodieParquetInputFormat, HoodieRealtimeInputFormat, HoodieRealtimeRecordReader, etc.
  • Create new classes that use the org.apache.hadoop.mapreduce APIs and wrap the Hudi-related logic in them.
  • Wrap the FileInputFormat provided by the query engine so Hudi can take advantage of the engine's own optimizations. Taking Spark SQL as an example, we can create a HoodieParquetFileFormat by wrapping ParquetFileFormat and ParquetRecordReader<Row> from the Spark codebase with the Hudi merging logic, and extend the same support to OrcFileFormat in the future.
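The wrapping idea in the last bullet can be sketched in plain Java. This is a hypothetical, simplified illustration, not Hudi's implementation: `Rec` and `MergingIterator` are invented names, a plain `Iterator` stands in for the engine's record reader (for example Spark's `ParquetRecordReader<Row>`), and the merge rule (a log record replaces the base record with the same key, a null value means delete, log records with no matching base key are emitted as inserts) is an assumed simplification of merge-on-read semantics.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

public class MergingReaderSketch {

    // Hypothetical record type; a null value marks a delete coming from the log.
    record Rec(String key, String value) {}

    // Hypothetical wrapper: decorates an engine-provided base-file iterator and
    // overlays log records on top of it, matched by record key.
    static final class MergingIterator implements Iterator<Rec> {
        private final Iterator<Rec> base;
        private final Map<String, Rec> pendingLog = new LinkedHashMap<>();
        private Iterator<Rec> logOnly; // log records whose keys never appeared in the base file
        private Rec nextRec;

        MergingIterator(Iterator<Rec> base, List<Rec> logRecords) {
            this.base = base;
            for (Rec r : logRecords) {
                pendingLog.put(r.key(), r); // a later log entry for the same key wins
            }
            advance();
        }

        // Finds the next record to emit, skipping deleted keys.
        private void advance() {
            nextRec = null;
            while (base.hasNext()) {
                Rec b = base.next();
                Rec update = pendingLog.remove(b.key());
                Rec merged = (update != null) ? update : b;
                if (merged.value() != null) {
                    nextRec = merged;
                    return;
                }
            }
            if (logOnly == null) {
                // Base file exhausted: the remaining log entries are treated as inserts.
                logOnly = new ArrayList<>(pendingLog.values()).iterator();
            }
            while (logOnly.hasNext()) {
                Rec l = logOnly.next();
                if (l.value() != null) {
                    nextRec = l;
                    return;
                }
            }
        }

        @Override
        public boolean hasNext() {
            return nextRec != null;
        }

        @Override
        public Rec next() {
            if (nextRec == null) {
                throw new NoSuchElementException();
            }
            Rec out = nextRec;
            advance();
            return out;
        }
    }

    public static void main(String[] args) {
        List<Rec> baseFile = List.of(new Rec("k1", "a"), new Rec("k2", "b"), new Rec("k3", "c"));
        List<Rec> log = List.of(new Rec("k2", "b2"), new Rec("k3", null), new Rec("k4", "d"));
        Iterator<Rec> it = new MergingIterator(baseFile.iterator(), log);
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            Rec r = it.next();
            out.add(r.key() + "=" + r.value());
        }
        System.out.println(String.join(",", out)); // prints k1=a,k2=b2,k4=d
    }
}
```

Because the wrapper depends only on an iterator over base-file records, the same merging code could in principle sit on top of a Parquet or an ORC reader, which is the kind of file-format independence that decoupling the logic from the existing input formats aims for.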



Implementation

https://github.com/apache/incubator-hudi/pull/1592


Rollout/Adoption Plan

...