Hypothesis: The `sliding window` abstraction from Apache Beam (also present in Spark and Flink) can eliminate most (maybe all?) of the ad-hoc attempts to handle incremental data inside analysis code.
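
A minimal sketch of the hypothesis, assuming the Beam Python SDK; the event schema, keys, and timestamps below are made up. The point is that the sliding window is declared once and the runner owns the incremental recomputation, leaving the analysis code as a plain per-window aggregate.

```python
import apache_beam as beam
from apache_beam.transforms import window

# Hypothetical timestamped events; in practice these arrive incrementally.
events = [
    {"key": "sensor-a", "value": 3.0, "ts": 1_700_000_000},
    {"key": "sensor-a", "value": 5.0, "ts": 1_700_000_600},
    {"key": "sensor-b", "value": 1.0, "ts": 1_700_000_900},
]

with beam.Pipeline() as p:
    (
        p
        | "Read" >> beam.Create(events)
        # Event time comes from the data itself, not from arrival order.
        | "Stamp" >> beam.Map(
            lambda e: window.TimestampedValue((e["key"], e["value"]), e["ts"]))
        # One-hour windows, recomputed every ten minutes as data arrives.
        | "Window" >> beam.WindowInto(
            window.SlidingWindows(size=3600, period=600))
        # The analysis code is just an aggregate; no incremental bookkeeping.
        | "Sum" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )
```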

(UC) Use Hudi to build file-based data lakes that are self-updating as new data arrives (data fabric)
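
A sketch of what the self-updating part could look like with Hudi's Spark data source (PySpark here; the table path, schema, and record keys are hypothetical, and the Hudi Spark bundle jar must be on the classpath). Each arriving batch is upserted into the table on storage, so readers always see one merged, current `data set`.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hudi-upsert-sketch")
         # Kryo serialization is the setting Hudi's docs recommend for Spark.
         .config("spark.serializer",
                 "org.apache.spark.serializer.KryoSerializer")
         .getOrCreate())

# Hypothetical newly arrived batch.
incoming = spark.createDataFrame(
    [("id-1", "2024-01-02", 42.0)],
    ["record_id", "event_date", "value"],
)

hudi_options = {
    "hoodie.table.name": "events",
    # Record key identifies the row to merge into; precombine field breaks
    # ties when the same key arrives more than once in a batch.
    "hoodie.datasource.write.recordkey.field": "record_id",
    "hoodie.datasource.write.precombine.field": "event_date",
    "hoodie.datasource.write.operation": "upsert",
}

# Each batch is upserted in place on the file system; no manual dedup code.
(incoming.write.format("hudi")
 .options(**hudi_options)
 .mode("append")
 .save("/tmp/lake/events"))  # hypothetical base path
```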

`Hudi` works with one `data set` at a time, but when building a `data lake` we need to relate `data set`s both structurally and logically (business semantics) so that `feature store`s can be built from raw data.

(UC) Use Hudi to build file-based feature stores (data fabric)
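
A sketch of the feature-store side, assuming the same hypothetical table as above: Hudi's incremental query returns only records committed after a given instant, so feature tables can be refreshed from new arrivals without rescanning the raw data. The begin instant and the feature computation are placeholders.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hudi-incremental-sketch").getOrCreate()

# Pull only the records committed since the last processed Hudi instant.
incremental = (spark.read.format("hudi")
               .option("hoodie.datasource.query.type", "incremental")
               .option("hoodie.datasource.read.begin.instanttime",
                       "20240101000000")  # placeholder instant
               .load("/tmp/lake/events"))  # hypothetical base path

# Hypothetical feature: mean value per record key over the new slice only.
features = (incremental
            .groupBy("record_id")
            .agg(F.avg("value").alias("value_mean")))

features.write.mode("append").parquet("/tmp/feature_store/events_features")
```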

The first kind is relational data, but we also need graph, array, and other forms of relations in data, ideally in a unified `data fabric`.

Resources on how the Dremio relational cache works, for inspiration on how `Hudi` might fit in.

Technologies on the radar

  1. Apache Arrow
  2. Dremio
  3. Weld (Stanford Project DAWN)