...

A key goal of Hudi is to provide upsert functionality that is orders of magnitude faster than rewriting entire tables or partitions. 

...

Code Block
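import org.apache.hudi.DataSourceReadOptions

// Read-optimized view: glob down to the partition directories under basePath
val hoodieROView = spark.read.format("org.apache.hudi").load(basePath + "/path/to/partitions/*")
// Incremental view: only data written after beginInstantTime (a commit instant time string defined elsewhere)
val hoodieIncViewDF = spark.read.format("org.apache.hudi")
  .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY, DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL)
  .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, beginInstantTime)
  .load(basePath)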
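To make the incremental view's semantics concrete, here is a minimal in-memory sketch that does not use Spark. It keeps only the records whose commit time is strictly later than beginInstantTime. This is not how Hudi implements incremental pulls, because Hudi consults its commit timeline to decide which files to read. The sketch rests on these assumptions:

- Each record carries its commit time, as Hudi's `_hoodie_commit_time` metadata column does.
- Instant times are fixed-width `yyyyMMddHHmmss` strings.

The names `CommittedRecord`, `IncrementalSketch` and `incrementalView` are hypothetical.

```scala
// Hypothetical stand-in for a record carrying Hudi's _hoodie_commit_time metadata column.
final case class CommittedRecord(recordKey: String, commitTime: String, value: Int)

object IncrementalSketch {
  // Assumes fixed-width yyyyMMddHHmmss instant times, so lexicographic order
  // matches chronological order. Keeps only records committed strictly after
  // beginInstantTime (the begin instant itself is excluded).
  def incrementalView(records: Seq[CommittedRecord], beginInstantTime: String): Seq[CommittedRecord] =
    records.filter(_.commitTime.compareTo(beginInstantTime) > 0)

  def main(args: Array[String]): Unit = {
    val records = Seq(
      CommittedRecord("k1", "20190101000000", 1),
      CommittedRecord("k2", "20190102000000", 2),
      CommittedRecord("k3", "20190103000000", 3)
    )
    // Pull everything written after the first commit.
    incrementalView(records, "20190101000000").foreach(println)
  }
}
```

Running `main` prints only the `k2` and `k3` records, because the `k1` commit is not after the begin instant.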


Limitations

Note that reading the realtime view natively through the Spark datasource is currently not supported. Please use the Hive path described below.


If Hive Sync is enabled in the DeltaStreamer tool or the datasource, the dataset is available in Hive as a couple of tables, which can then be queried using HiveQL, Presto or Spark SQL. See here for more.

...

How does Hudi handle duplicate record keys in an input? 

<Answer WIP>When issuing an `upsert` operation on a dataset and the batch of records provided contains multiple entries for a given key, then 

Can I implement my own logic for how input records are merged with the record on storage?

...