...

Spark supports loading a CSV file into a Dataset of Rows (example).  Hudi already has a RowSource that handles data sources producing Rows.  The idea is to convert CSV to Row format with Spark's CSV reader, passing the corresponding CSV options through to it, and then reuse the existing logic that ingests Rows.
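
As a rough illustration of passing the CSV options through, the sketch below collects source properties under a common prefix and strips that prefix so the remaining names match Spark's CSV reader options (for example sep and header).  The class name CsvOptionMapper and the prefix hoodie.deltastreamer.csv. are assumptions made for this example only; they are not taken from this proposal or from existing Hudi code.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Hudi or Spark.
final class CsvOptionMapper {
    // Assumed prefix for CSV-related source properties; the real key names may differ.
    static final String CSV_PREFIX = "hoodie.deltastreamer.csv.";

    private CsvOptionMapper() {
    }

    // Returns the properties that start with CSV_PREFIX, with the prefix removed,
    // so that the keys line up with Spark CSV reader option names.
    static Map<String, String> toReaderOptions(Properties props) {
        Map<String, String> options = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            // Skip unrelated properties and a bare prefix with no option name.
            if (key.startsWith(CSV_PREFIX) && key.length() > CSV_PREFIX.length()) {
                options.put(key.substring(CSV_PREFIX.length()), props.getProperty(key));
            }
        }
        return options;
    }
}
```

Under these assumptions, the resulting map could be handed to Spark's reader, for example spark.read().format("csv").options(options).load(path), which yields the Dataset of Rows that the existing Row-based logic consumes.  Properties outside the prefix are ignored, so unrelated Hudi settings would not leak into the reader.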


Currently, the conversion from Row to Avro in Hudi is not efficient.  However, an improvement to it is under way, so wrapping Spark's CSV reader keeps the implementation simple and extensible compared with parsing CSV and constructing records from scratch.

Implementation

Implement a CSVSource by extending RowSource (similar to the JDBCSource in this PR).

...