Does Hudi support cloud storage/object stores?

Yes. Generally speaking, Hudi can provide the functionality described above on any Hadoop FileSystem implementation, and can therefore read and write datasets on cloud stores such as Amazon S3, Microsoft Azure, or Google Cloud Storage. Over time, Hudi has also incorporated specific design aspects that make building Hudi datasets on the cloud easy, such as consistency checks for S3 and zero moves/renames of data files.
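As a rough illustration of what writing to an object store can look like, here is a minimal PySpark-oriented sketch. Several things in it are assumptions rather than part of this answer: the table name, field names, and bucket path are hypothetical; the datasource name `org.apache.hudi` applies to releases under the Apache namespace (earlier releases used a different package name); and whether the S3 consistency check option is needed depends on your Hudi version and storage setup. Check the configuration reference for your release before relying on any key.

```python
def hudi_write_options(table_name, record_key, partition_field, precombine_field):
    """Build a dict of Hudi datasource write options (all values are strings)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
        # Assumption: enable the consistency check when targeting S3.
        "hoodie.consistency.check.enabled": "true",
    }


def write_to_object_store(df, base_path, options):
    """Write a Spark DataFrame as a Hudi dataset under base_path.

    df is expected to be a pyspark.sql.DataFrame; base_path can be any
    Hadoop FileSystem URI, for example a hypothetical s3a:// location.
    """
    (df.write.format("org.apache.hudi")
        .options(**options)
        .mode("append")
        .save(base_path))


# Hypothetical table and field names, for illustration only.
opts = hudi_write_options("trips", "uuid", "datestr", "ts")
```

With a Spark session that has the Hudi bundle and the relevant cloud connector on its classpath, a call such as `write_to_object_store(df, "s3a://my-bucket/hudi/trips", opts)` would target the (hypothetical) bucket; only the URI scheme changes for other stores.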

What versions of Hive/Spark/Hadoop are supported by Hudi?

<Answer WIP> As of September 2019, Hudi supports Spark 2.1+, Hive 2.x, and Hadoop 2.7+ (Hadoop 3 is not supported).

How does Hudi actually store data inside a dataset?

<Answer WIP> At a high level, Hudi is based on an MVCC design that writes data to versioned parquet/base files and to log files that contain changes to the base file. All files are stored under a partitioning scheme for the dataset, which closely resembles how Apache Hive tables are laid out on DFS. Please refer here for more details.
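To make the MVCC idea more concrete, the following is a toy model in plain Python, not Hudi's actual implementation. It assumes each group of files in a partition keeps several versions, each version being a base file plus the log files written against it, and that a reader picks the newest version at or before a given commit time. The class names, the file names, and the timestamp format are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileSlice:
    """One version of a file group: a base file plus logs of changes to it."""
    instant_time: str  # commit time that produced the base file
    base_file: str     # hypothetical parquet file name
    log_files: List[str] = field(default_factory=list)


@dataclass
class FileGroup:
    """All versions of one logical file within a partition path."""
    partition_path: str
    file_id: str
    slices: List[FileSlice] = field(default_factory=list)

    def latest_slice_as_of(self, instant_time: str) -> Optional[FileSlice]:
        """Return the newest slice committed at or before instant_time."""
        # Fixed-width timestamp strings compare correctly as plain strings.
        visible = [s for s in self.slices if s.instant_time <= instant_time]
        return max(visible, key=lambda s: s.instant_time) if visible else None


# Hypothetical layout: partition 2019/09/01 holds two versions of file "f1".
group = FileGroup("2019/09/01", "f1", [
    FileSlice("20190901100000", "f1_20190901100000.parquet", ["f1.log.1"]),
    FileSlice("20190901120000", "f1_20190901120000.parquet"),
])
reader_view = group.latest_slice_as_of("20190901110000")
```

Here `reader_view` is the older slice, because the newer base file was committed after the reader's instant. Keeping older versions alongside newer ones is what lets readers and writers proceed without blocking each other.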

Using Hudi

What are some ways to write a Hudi dataset?

...
