...

Also, #BigData analytics seems to favor "files over databases": https://youtu.be/jvt4v2LTGK0?t=345


Tip
title: Hudi keeps data in files
This is where `Hudi` comes into the picture, by allowing data to be kept in files: not just input data but also output data.

So, when we see architectures that use "streaming pipelines" which read from files and write to databases, we can tell that those architectures do not serve this vision of "continuous deep analytics".

The initial Uber vision that led to Hudi was published in https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop.
*Hypothesis*: the ability to keep data in plain files is key to building the above vision.
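As a toy illustration of this hypothesis, the sketch below keeps everything in plain files: input batches, transformed output, and the bookkeeping that records progress. The commit-file convention here is hypothetical and only loosely imitates Hudi's commit timeline; it is a sketch of the idea, not Hudi's actual implementation.

```python
import json
import os
import tempfile

def commit_id(dirpath):
    """Return the latest batch number recorded by a commit file (toy timeline)."""
    commits = [int(f.split(".")[0]) for f in os.listdir(dirpath) if f.endswith(".commit")]
    return max(commits, default=-1)

def incremental_step(in_dir, out_dir, transform):
    """Process only input batches newer than the last committed one.

    Input, output, and progress markers all live in plain files;
    no database is involved anywhere in the pipeline.
    """
    last = commit_id(out_dir)
    for fname in sorted(os.listdir(in_dir)):
        if not fname.endswith(".json"):
            continue
        batch = int(fname.split(".")[0])
        if batch <= last:
            continue  # already processed in an earlier run
        with open(os.path.join(in_dir, fname)) as fh:
            records = json.load(fh)
        with open(os.path.join(out_dir, f"{batch}.json"), "w") as fh:
            json.dump([transform(r) for r in records], fh)
        # An empty marker file records that this batch is done.
        open(os.path.join(out_dir, f"{batch}.commit"), "w").close()

# Demo: two input batches land as files; the pipeline's output is also files.
in_dir, out_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
for i, batch in enumerate([[1, 2], [3, 4]]):
    with open(os.path.join(in_dir, f"{i}.json"), "w") as fh:
        json.dump(batch, fh)

incremental_step(in_dir, out_dir, lambda x: x * 10)
with open(os.path.join(out_dir, "1.json")) as fh:
    print(json.load(fh))  # → [30, 40]
```

Running `incremental_step` again after a new input file appears would pick up only that batch, which is the incremental-processing behavior the Uber article argues for.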

This page about Uber's #Michelangelo, https://eng.uber.com/michelangelo/, suggests there is still a distinction in architecture, implementation, and programming model between "batch" and "streaming": data is placed in distinct kinds of "feature stores" (physical repositories) for batch and continuous analyses.

Activity

Below is a growing list of use cases that we find useful in the above context.

But this is an open invitation to others who share an interest in this `Continuous Deep Analytics` paradigm to contribute use cases, problems, needs, designs, ideas, and code, and in every way help further the vision.

The thread of discussion that Vinoth and I have is about proving the initial vision with code, to see how far we can chew on this issue.