...

Is Hudi an analytical database? 

A typical database has a set of long-running storage servers that are always up, serving both writes and reads. Hudi's architecture is very different, and for good reasons: it is highly decoupled, so writes and queries/reads can be scaled independently to handle large-scale workloads. As a result, it may not always seem like a database.

Nonetheless, Hudi is designed very much like a database and provides similar functionality (upserts, change capture) and semantics (transactional writes, snapshot-isolated reads). <Answer WIP>
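The general idea behind "transactional writes, snapshot-isolated reads" without always-on servers can be illustrated with a commit timeline. This is a toy model, not Hudi's implementation. A writer stages new data under an instant, and that data becomes visible only when the instant is marked completed in a single step. Readers build their view from completed instants only. The class name `Timeline` and its methods are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Timeline:
    """Toy commit timeline (hypothetical): data is visible only after its instant completes."""

    completed: list[int] = field(default_factory=list)  # completed instant times, ascending
    staged: dict[int, dict[str, dict]] = field(default_factory=dict)  # instant -> {key: record}

    def write(self, instant: int, records: dict[str, dict]) -> None:
        # Stage the new data under this instant; readers cannot see it yet.
        self.staged[instant] = dict(records)

    def commit(self, instant: int) -> None:
        # Completing the instant is the single step that publishes the whole write.
        self.completed.append(instant)
        self.completed.sort()

    def snapshot(self, as_of: int | None = None) -> dict[str, dict]:
        # A reader merges completed instants up to its snapshot point; later
        # instants overwrite earlier ones for the same key. The returned dict is
        # a fresh copy, so commits made after the read do not change it.
        view: dict[str, dict] = {}
        for instant in self.completed:
            if as_of is not None and instant > as_of:
                break
            view.update(self.staged[instant])
        return view


tl = Timeline()
tl.write(1, {"k1": {"v": 1}})
tl.commit(1)
tl.write(2, {"k1": {"v": 2}, "k2": {"v": 5}})  # still in flight
print(tl.snapshot())          # {'k1': {'v': 1}}
tl.commit(2)
print(tl.snapshot())          # {'k1': {'v': 2}, 'k2': {'v': 5}}
print(tl.snapshot(as_of=1))   # {'k1': {'v': 1}}
```

Because commits only ever append to the timeline, readers and writers need no shared long-running server to coordinate. A reader just needs a consistent list of completed instants.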

How do I model the data stored in Hudi? 

When writing data into Hudi, you model the records as you would in a key-value store. You specify a key field (unique within a single partition or across the whole dataset), a partition field (which denotes the partition to place the key into), and preCombine/combine logic that specifies how to handle duplicates within a batch of records being written. This model enables Hudi to enforce primary key constraints like you would get on a database table. See here for an example.
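The key/partition/preCombine model can be sketched in plain Python. This is a simplified illustration of the semantics only, not Hudi's code. Duplicates within an incoming batch are collapsed by keeping the record with the largest preCombine value, and the result is then upserted so that each key appears at most once. In Hudi's Spark datasource these roles correspond to options such as `hoodie.datasource.write.recordkey.field`, `hoodie.datasource.write.partitionpath.field` and `hoodie.datasource.write.precombine.field`. The function names and the field names `uuid`, `city`, `ts`, `fare` below are hypothetical. By default the sketch assumes keys are unique per partition, and `global_key=True` makes them unique across the dataset.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def hoodie_key(rec: Record, key_field: str, partition_field: str, global_key: bool) -> Tuple:
    # Per-partition uniqueness keys on (partition, key); dataset-wide uniqueness on key alone.
    return (rec[key_field],) if global_key else (rec[partition_field], rec[key_field])


def precombine(batch: List[Record], key_field: str, partition_field: str,
               precombine_field: str, global_key: bool = False) -> Dict[Tuple, Record]:
    # Collapse duplicates inside one incoming batch: keep the record with the
    # largest precombine value (on ties, the later record in the batch wins).
    deduped: Dict[Tuple, Record] = {}
    for rec in batch:
        k = hoodie_key(rec, key_field, partition_field, global_key)
        current = deduped.get(k)
        if current is None or rec[precombine_field] >= current[precombine_field]:
            deduped[k] = rec
    return deduped


def upsert(table: Dict[Tuple, Record], batch: List[Record], key_field: str,
           partition_field: str, precombine_field: str, global_key: bool = False) -> None:
    # Insert new keys and overwrite existing ones, so the "table" never holds
    # two rows with the same key (the primary-key constraint).
    table.update(precombine(batch, key_field, partition_field, precombine_field, global_key))


table: Dict[Tuple, Record] = {}
upsert(table, [
    {"uuid": "a", "ts": 1, "city": "sf", "fare": 10},
    {"uuid": "a", "ts": 3, "city": "sf", "fare": 30},   # duplicate key, newer ts wins
    {"uuid": "b", "ts": 2, "city": "nyc", "fare": 20},
], key_field="uuid", partition_field="city", precombine_field="ts")
upsert(table, [{"uuid": "a", "ts": 4, "city": "sf", "fare": 40}],
       key_field="uuid", partition_field="city", precombine_field="ts")
print(len(table), table[("sf", "a")]["fare"])  # 2 40
```

Real Hudi payload implementations can also compare incoming records against what is already stored. This sketch simply overwrites the stored record on upsert.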

When querying/reading data, Hudi simply presents itself as a JSON-like hierarchical table, the kind everyone is used to querying with Hive/Spark/Presto over Parquet/JSON/Avro. <Answer WIP>

Does Hudi support cloud storage/object stores?

<Answer WIP> Yes. Hudi is able to provide its functionality above

What versions of Hive/Spark/Hadoop are supported by Hudi?

...