Decouple Hudi-related logic from the existing HoodieParquetInputFormat, HoodieRealtimeInputFormat, HoodieRealtimeRecordReader, etc.
Create new classes that use the org.apache.hadoop.mapreduce APIs and wrap the Hudi-related logic in them.
Wrap the FileInputFormat from the query engine to take advantage of its optimizations. Taking Spark SQL as an example, we can create a HoodieParquetFileFormat by wrapping ParquetFileFormat and ParquetRecordReader<Row> from the Spark codebase with Hudi merging logic. Support can be extended to OrcFileFormat in the future.
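
To illustrate wrapping an engine's reader with Hudi merging logic, here is a minimal, engine-agnostic sketch in plain Java. The class name MergingRecordIterator, its constructor parameters, and the key-extractor function are hypothetical and are not part of Hudi, Hadoop, or Spark. It assumes the log-file updates have already been loaded into an in-memory map keyed by record key, and it only overlays updates onto base-file records. Deletes and records that exist only in the log files are left out for brevity.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: wraps the record iterator of an engine's reader
 * (for example, one backed by a Parquet reader) and replaces each base
 * record with its latest log-file version when one exists.
 */
final class MergingRecordIterator<K, R> implements Iterator<R> {
    private final Iterator<R> baseRecords;   // records read from the base file
    private final Map<K, R> logRecords;      // latest update per record key (assumed preloaded)
    private final Function<R, K> keyOf;      // extracts the record key from a record

    MergingRecordIterator(Iterator<R> baseRecords,
                          Map<K, R> logRecords,
                          Function<R, K> keyOf) {
        this.baseRecords = baseRecords;
        this.logRecords = logRecords;
        this.keyOf = keyOf;
    }

    @Override
    public boolean hasNext() {
        // Delegates to the wrapped reader, so its optimizations are preserved.
        return baseRecords.hasNext();
    }

    @Override
    public R next() {
        R base = baseRecords.next();
        R updated = logRecords.get(keyOf.apply(base));
        // Prefer the log-file version if the key was updated; otherwise keep the base record.
        return updated != null ? updated : base;
    }
}
```

Because the wrapper delegates iteration to the underlying reader and only adds a key lookup per record, the merging logic stays independent of the file format, which is what would allow a wrapper for another format to reuse it.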

Implementation

Rollout/Adoption Plan

Existing users are not affected, because the existing Hive-related InputFormat classes won't be changed, except that some methods were relocated to the HoodieInputFormatUtils class. We will test that this relocation does not impact Hive queries.
New Spark Datasource support for Merge on Read tables will be added.

...