...
A key goal of Hudi is to provide upsert
functionality that is orders of magnitude faster than rewriting entire tables or partitions.
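To make this concrete, here is a hedged sketch of issuing an upsert through the Spark datasource (the table name `my_table`, record key field `uuid`, and precombine field `ts` are hypothetical example values; the option names follow the Hudi 0.5.x-era `DataSourceWriteOptions`/`HoodieWriteConfig` API):

```scala
// Sketch: upserting a batch of records into a Hudi dataset via the
// Spark datasource. Only the file groups containing the affected record
// keys are rewritten, not the entire table or partition.
inputDF.write.format("org.apache.hudi")
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "uuid")      // example key field
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "ts")       // example precombine field
  .option(HoodieWriteConfig.TABLE_NAME, "my_table")                    // example table name
  .mode(SaveMode.Append)
  .save(basePath)
```

Because only the affected file groups are rewritten, an upsert touching a small fraction of keys costs far less than rewriting whole partitions.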
...
```scala
// Read-optimized view: load partitions directly as a DataFrame
val hoodieROView = spark.read.format("org.apache.hudi")
  .load(basePath + "/path/to/partitions/*")

// Incremental view: pull only records changed after a given instant time
val hoodieIncViewDF = spark.read.format("org.apache.hudi")
  .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY, DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL)
  .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, <beginInstantTime>)
  .load(basePath)
```
Note: reading the realtime view natively through the Spark datasource is not currently supported; please use the Hive path below.
If Hive Sync is enabled in the deltastreamer tool or the datasource, the dataset is registered in Hive as a couple of tables, which can then be queried using HiveQL, Presto, or SparkSQL. See here for more.
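For example, once Hive Sync has registered the dataset, it can be queried like any other Hive table from SparkSQL (a sketch; the table name `hudi_trips` and its columns are hypothetical):

```scala
// Sketch: querying a Hive-synced Hudi dataset through SparkSQL.
// "hudi_trips" stands in for whatever table name Hive Sync registered.
spark.sql("SELECT symbol, max(ts) FROM hudi_trips GROUP BY symbol").show()
```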
...
How does Hudi handle duplicate record keys in an input?
When an `upsert` operation is issued on a dataset and the provided batch of records contains multiple entries for a given key, the entries are first reduced into a single record: by default, the payload class's `preCombine()` method is applied repeatedly, keeping the record with the largest value for the configured precombine field.
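The reduction step can be pictured with this simplified, self-contained sketch (not the actual Hudi implementation, which operates on `HoodieRecord`s via the configured payload class; `Record` and its fields are hypothetical):

```scala
// Simplified model of precombine: among entries sharing a record key,
// keep only the one with the largest ordering (precombine) value.
case class Record(key: String, orderingVal: Long, data: String)

def precombine(batch: Seq[Record]): Seq[Record] =
  batch.groupBy(_.key).values.map(_.maxBy(_.orderingVal)).toSeq
```

For instance, two input entries for key `"a"` with ordering values 1 and 2 collapse to the single entry carrying ordering value 2.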
Can I implement my own logic for how input records are merged with records on storage?
...