...
(UC) Integrate Hudi with Apache Beam so that Beam's sliding data window abstractions can run on top of Parquet files incrementally updated through `Hudi`.
source: https://qcon.ai/system/files/presentation-slides/simplifying_ml_workflows_with_apache_beam.pdf
Hypothesis: The `sliding data window` abstraction from Apache Beam (also present in Spark and Flink) can eliminate most, if not all, of the ad-hoc attempts to handle incremental data inside analysis code. A sketch of the abstraction follows.
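A minimal sketch of what that windowing abstraction looks like with the Beam Java SDK, assuming a toy in-memory `PCollection` in place of the Hudi-managed Parquet source the use case envisions; the class name, keys, and timestamps are illustrative, not part of any existing integration:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.WithTimestamps;
import org.apache.beam.sdk.transforms.windowing.SlidingWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;
import org.joda.time.Instant;

import java.util.Arrays;

public class SlidingWindowSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Toy event stream of (userId, eventTimeMillis). In the envisioned integration,
    // this PCollection would instead be read from Parquet files kept current by Hudi.
    PCollection<KV<String, Long>> events = p.apply(Create.of(Arrays.asList(
        KV.of("alice", 0L),
        KV.of("bob", 60_000L),
        KV.of("alice", 120_000L))));

    events
        // Assign each element its event timestamp so windowing uses event time.
        .apply(WithTimestamps.of((KV<String, Long> kv) -> new Instant(kv.getValue())))
        // One-hour windows sliding every 5 minutes: each element lands in 12 windows.
        .apply(Window.into(SlidingWindows.of(Duration.standardMinutes(60))
                                         .every(Duration.standardMinutes(5))))
        // Count events per key within each window; analysis code sees window-scoped
        // data instead of hand-rolled incremental bookkeeping.
        .apply(Count.perKey());

    p.run().waitUntilFinish();
  }
}
```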
(UC) Use Hudi to build file-based data lakes that are self-updating as new data arrives.
`Hudi` works with one `data set` at a time, but when building a `data lake` we need to relate `data set`s to one another logically.
The first kind of relation is relational (tabular) data, but we also need graph, array, and other forms of relations across the data. A sketch of the per-data-set mechanics appears below.
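A minimal sketch of the single-data-set mechanics this use case builds on, assuming Spark with the Hudi bundle on the classpath; the table name, paths, field names, and commit timestamp are illustrative assumptions, not values from the source:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class HudiUpsertSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hudi-upsert-sketch")
        // Hudi expects Kryo serialization to be enabled.
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .getOrCreate();

    // Newly arrived records, e.g. read from a landing zone (path is illustrative).
    Dataset<Row> newBatch = spark.read().parquet("/landing/events/2024-01-01");

    // Upsert the batch into a Hudi-managed Parquet data set: records whose key
    // already exists are updated, new keys are inserted.
    newBatch.write()
        .format("hudi")
        .option("hoodie.table.name", "events")
        .option("hoodie.datasource.write.recordkey.field", "event_id")
        .option("hoodie.datasource.write.precombine.field", "ts")
        .option("hoodie.datasource.write.partitionpath.field", "event_date")
        .option("hoodie.datasource.write.operation", "upsert")
        .mode(SaveMode.Append)
        .save("/lake/events");

    // Downstream consumers can pull only what changed since a given commit,
    // which is what keeps the lake self-updating instead of fully rewritten.
    Dataset<Row> changes = spark.read()
        .format("hudi")
        .option("hoodie.datasource.query.type", "incremental")
        .option("hoodie.datasource.read.begin.instanttime", "20240101000000")
        .load("/lake/events");
    changes.show();
  }
}
```

Note that each such table is a single Hudi `data set`; relating the `events` table to other tables (relationally, as a graph, or as arrays) still has to happen in a layer above Hudi.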