...
To further the initial vision from https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop by defining "data processing" through use cases, patterns of functionality, designs, applications, algorithms, code, etc., and by building on other technologies and state-of-the-art research.
...
Hypothesis: the `sliding data window` abstraction from Apache Beam (also present in Spark and Flink) can eliminate most (if not all) of the ad-hoc attempts to handle incremental data inside analysis code.
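To make the hypothesis concrete, here is a minimal in-memory sketch of what the abstraction computes: a sliding window of `size`, advancing every `period`, assigns each event to the `size / period` overlapping windows that contain it. This mirrors the semantics of Beam's `SlidingWindows(size, period)` but is plain Python, not Beam's actual API; the function name is illustrative.

```python
from datetime import datetime, timedelta

def sliding_windows(event_time, size, period):
    """Return the [start, end) windows containing event_time,
    mirroring the semantics of sliding windows in Beam/Spark/Flink.
    Illustrative sketch only, not a real Beam API call."""
    epoch = datetime(1970, 1, 1)
    # Align to the last window start at or before the event.
    offset = (event_time - epoch) % period
    start = event_time - offset
    windows = []
    # Walk backwards through every window whose span still covers the event.
    while start > event_time - size:
        windows.append((start, start + size))
        start -= period
    return sorted(windows)

# An event at 00:05 with 10-minute windows sliding every 5 minutes
# lands in two overlapping windows: [00:00, 00:10) and [00:05, 00:15).
ts = datetime(2024, 1, 1, 0, 5)
wins = sliding_windows(ts, size=timedelta(minutes=10), period=timedelta(minutes=5))
```

The point of the abstraction: once window assignment is declared like this, the analysis code never needs its own bookkeeping for "which slice of data is new" — the runner re-evaluates each window as data arrives.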
(UC) Use Hudi to build file-based data lakes that are incrementally self-updating.
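The core mechanism behind a self-updating data set is record-key-based upsert: each incremental batch inserts new keys and overwrites changed ones. A minimal in-memory sketch of that semantics (this is not Hudi's API, which operates on files through Spark; the function and field names are hypothetical):

```python
def upsert(table, batch, key="id"):
    """Merge an incremental batch into a keyed table, keeping the
    latest version of each record (Hudi-style upsert semantics,
    sketched in memory; not Hudi's actual file-based API)."""
    merged = dict(table)
    for record in batch:
        # New key -> insert; existing key -> update in place.
        merged[record[key]] = record
    return merged

# Two incremental batches: the second updates id=2 and inserts id=3.
table = {}
table = upsert(table, [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}])
table = upsert(table, [{"id": 2, "val": "b2"}, {"id": 3, "val": "c"}])
```

Hudi applies the same merge logic against columnar files plus a commit timeline, which is what lets downstream consumers pull only the records changed since their last read.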
...
(data fabric)
`Hudi` works with one `data set` at a time, but when building a `data lake` we need to relate `data set`s structurally and logically (via business semantics) so that `feature store`s can be built from raw data.
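Relating data sets "logically" means joining them on shared business keys and aggregating into per-entity feature rows. A small sketch under assumed inputs (the data sets `users` and `orders` and the feature names are illustrative, not from any real schema):

```python
def build_features(users, orders):
    """Relate two raw data sets via a shared business key (user id)
    to derive a simple per-user feature-store row.
    Data sets and feature names are hypothetical."""
    features = {}
    for u in users:
        features[u["id"]] = {
            "country": u["country"],   # passthrough attribute
            "order_count": 0,          # aggregate over related data set
            "total_spend": 0.0,
        }
    for o in orders:
        f = features.get(o["user_id"])
        if f is not None:
            f["order_count"] += 1
            f["total_spend"] += o["amount"]
    return features

users = [{"id": 1, "country": "US"}, {"id": 2, "country": "DE"}]
orders = [{"user_id": 1, "amount": 10.0}, {"user_id": 1, "amount": 5.0}]
feats = build_features(users, orders)
```

In a real lake this join/aggregate step is what the structural and semantic metadata layer has to make possible across independently updated Hudi data sets.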
...