...

To further the initial vision from https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop by defining "data processing" through use cases, patterns of functionality, designs, applications, algorithms, code, etc., and by building on other technologies and state-of-the-art research.

...

Hypothesis: The `sliding data window` abstraction from Apache Beam (also present in Spark and Flink) can eliminate most (perhaps all?) of the ad-hoc attempts to handle incremental data inside analysis code.
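The hypothesis can be illustrated without pulling in Beam itself: a minimal pure-Python sketch of the sliding-window assignment rule (each event lands in every window of length `size` whose start is a multiple of `period` and covers the event's timestamp, mirroring Beam's `SlidingWindows`). The function name and integer timestamps are illustrative assumptions, not Beam's API.

```python
from collections import defaultdict

def sliding_windows(events, size, period):
    """Assign timestamped events to overlapping sliding windows.

    `events` is an iterable of (timestamp, value) pairs with integer
    timestamps. Each event is placed in every window [start, start + size)
    whose start is a multiple of `period` and covers the timestamp --
    the same assignment rule as Beam's SlidingWindows(size, period).
    """
    windows = defaultdict(list)
    for ts, value in events:
        # Latest window start at or before ts, then walk back by `period`
        # until the window no longer covers ts.
        start = ts - (ts % period)
        while start > ts - size:
            windows[(start, start + size)].append(value)
            start -= period
    return dict(windows)

# Example: windows of size 10 sliding every 5 time units.
w = sliding_windows([(3, "a"), (7, "b"), (12, "c")], size=10, period=5)
# Events 3 and 7 share window (0, 10); events 7 and 12 share (5, 15).
```

Once events are grouped this way, the incremental-handling logic lives in the windowing layer rather than being re-invented inside each analysis job.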

(UC) Use Hudi to build file-based data lakes that are incrementally self-updating
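The core pattern behind this use case can be sketched as a toy in-memory table: keyed upserts stamped with a commit instant, plus an incremental read that returns only records changed since a given instant. This is a conceptual sketch of the Hudi copy-on-write/incremental-query idea, not the real Hudi API; all names are hypothetical.

```python
class IncrementalTable:
    """Toy table illustrating Hudi-style upserts and incremental pulls
    (conceptual sketch only, not the actual Hudi API)."""

    def __init__(self):
        self._rows = {}     # record key -> (commit_instant, payload)
        self._instant = 0

    def upsert(self, records):
        """Apply a batch of {key: payload} as one commit; returns the
        commit instant, analogous to a Hudi commit timestamp."""
        self._instant += 1
        for key, payload in records.items():
            self._rows[key] = (self._instant, payload)
        return self._instant

    def incremental_read(self, since_instant):
        """Return only rows changed after `since_instant` -- the
        incremental-query pattern that lets downstream jobs avoid
        rescanning the full data set."""
        return {k: p for k, (t, p) in self._rows.items() if t > since_instant}

# Downstream consumers remember the last instant they processed and pull
# only the delta on the next run, which is what makes the lake
# incrementally self-updating.
table = IncrementalTable()
first = table.upsert({"u1": 1, "u2": 2})
table.upsert({"u2": 3, "u3": 4})
delta = table.incremental_read(first)   # only u2 (updated) and u3 (new)
```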

...

(data fabric)

`Hudi` works with one `data set` at a time, but when building a `data lake` we need to relate `data set`s structurally and logically (business semantics) so that `feature store`s can be built from raw data.
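One way to capture those relations is a small catalog layer above the per-data-set storage: register each data set with its key columns, then record which pairs join on which key. The sketch below is a hypothetical illustration of that idea; none of these names come from Hudi or any specific feature-store product.

```python
from dataclasses import dataclass, field

@dataclass
class DataSet:
    name: str
    key_columns: tuple      # columns that identify a record

@dataclass
class Catalog:
    """Toy catalog relating data sets structurally (join keys) so that
    feature stores can be assembled from raw data. Hypothetical names."""
    datasets: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)   # (left, right, key)

    def register(self, ds: DataSet):
        self.datasets[ds.name] = ds

    def relate(self, left: str, right: str, key: str):
        # Record that `left` and `right` join on `key` (business semantics).
        self.relations.append((left, right, key))

    def join_keys(self, name: str):
        """Keys through which `name` can be related to other data sets."""
        return {k for l, r, k in self.relations if name in (l, r)}

# Example: trips and riders are separate Hudi data sets; the catalog
# records that features spanning both are joined on rider_id.
catalog = Catalog()
catalog.register(DataSet("trips", ("trip_id",)))
catalog.register(DataSet("riders", ("rider_id",)))
catalog.relate("trips", "riders", "rider_id")
```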

...