...

To further the initial vision from https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop by defining "data processing" through use cases, patterns of functionality, designs, applications, algorithms, code, etc., and by building on other technologies and state-of-the-art research.

...

Hypothesis: The `sliding data window` abstraction from Apache Beam (also present in Spark and Flink) can eliminate most (perhaps all?) of the ad-hoc attempts to handle incremental data inside analysis code.
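The hypothesis can be illustrated without pulling in Beam itself: a minimal pure-Python sketch of the sliding-window assignment rule (each event lands in every window of length `size` whose start is a multiple of `period` and covers the event's timestamp, mirroring Beam's `SlidingWindows`). The function name and integer timestamps are illustrative assumptions, not Beam's API.

```python
from collections import defaultdict

def sliding_windows(events, size, period):
    """Assign timestamped events to overlapping sliding windows.

    `events` is an iterable of (timestamp, value) pairs with integer
    timestamps. Each event is placed in every window [start, start + size)
    whose start is a multiple of `period` and covers the timestamp --
    the same assignment rule as Beam's SlidingWindows(size, period).
    """
    windows = defaultdict(list)
    for ts, value in events:
        # Latest window start at or before ts, then walk back by `period`
        # until the window no longer covers ts.
        start = ts - (ts % period)
        while start > ts - size:
            windows[(start, start + size)].append(value)
            start -= period
    return dict(windows)

# Example: windows of size 10 sliding every 5 time units.
w = sliding_windows([(3, "a"), (7, "b"), (12, "c")], size=10, period=5)
# Events 3 and 7 share window (0, 10); events 7 and 12 share (5, 15).
```

Once events are grouped this way, the incremental-handling logic lives in the windowing layer rather than being re-invented inside each analysis job.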

(UC) Use Hudi to build file-based data lakes that are incrementally self-updating
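The core pattern behind this use case can be sketched as a toy in-memory table: keyed upserts stamped with a commit instant, plus an incremental read that returns only records changed since a given instant. This is a conceptual sketch of the Hudi copy-on-write/incremental-query idea, not the real Hudi API; all names are hypothetical.

```python
class IncrementalTable:
    """Toy table illustrating Hudi-style upserts and incremental pulls
    (conceptual sketch only, not the actual Hudi API)."""

    def __init__(self):
        self._rows = {}     # record key -> (commit_instant, payload)
        self._instant = 0

    def upsert(self, records):
        """Apply a batch of {key: payload} as one commit; returns the
        commit instant, analogous to a Hudi commit timestamp."""
        self._instant += 1
        for key, payload in records.items():
            self._rows[key] = (self._instant, payload)
        return self._instant

    def incremental_read(self, since_instant):
        """Return only rows changed after `since_instant` -- the
        incremental-query pattern that lets downstream jobs avoid
        rescanning the full data set."""
        return {k: p for k, (t, p) in self._rows.items() if t > since_instant}

# Downstream consumers remember the last instant they processed and pull
# only the delta on the next run, which is what makes the lake
# incrementally self-updating.
table = IncrementalTable()
first = table.upsert({"u1": 1, "u2": 2})
table.upsert({"u2": 3, "u3": 4})
delta = table.incremental_read(first)   # only u2 (updated) and u3 (new)
```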

...

(data fabric)

`Hudi` works with one `data set` at a time, but when building a `data lake` we need to relate `data set`s structurally and logically (business semantics) so that `feature store`s can be built from raw data.
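One way to capture those relations is a small catalog layer above the per-data-set storage: register each data set with its key columns, then record which pairs join on which key. The sketch below is a hypothetical illustration of that idea; none of these names come from Hudi or any specific feature-store product.

```python
from dataclasses import dataclass, field

@dataclass
class DataSet:
    name: str
    key_columns: tuple      # columns that identify a record

@dataclass
class Catalog:
    """Toy catalog relating data sets structurally (join keys) so that
    feature stores can be assembled from raw data. Hypothetical names."""
    datasets: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)   # (left, right, key)

    def register(self, ds: DataSet):
        self.datasets[ds.name] = ds

    def relate(self, left: str, right: str, key: str):
        # Record that `left` and `right` join on `key` (business semantics).
        self.relations.append((left, right, key))

    def join_keys(self, name: str):
        """Keys through which `name` can be related to other data sets."""
        return {k for l, r, k in self.relations if name in (l, r)}

# Example: trips and riders are separate Hudi data sets; the catalog
# records that features spanning both are joined on rider_id.
catalog = Catalog()
catalog.register(DataSet("trips", ("trip_id",)))
catalog.register(DataSet("riders", ("rider_id",)))
catalog.relate("trips", "riders", "rider_id")
```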

...