...

For tables or partitions where the majority of records change every cycle, upsert or merge is inefficient. We want to provide a Hive-like 'insert overwrite' API that ignores all existing data and creates a commit containing only the new data provided. Doing this in Hoodie provides better snapshot isolation than Hive because commits are atomic. These APIs can also be used for certain operational tasks, such as fixing a specific corrupted partition: we can run 'insert overwrite' on that partition with records from the source. For some data sources this can be much faster than restore and replay.
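To make the difference in semantics concrete, here is a minimal in-memory sketch in Python. It is a toy model, not Hoodie code: the `upsert` and `insert_overwrite` functions, the `Partition` type, and the sample records are all hypothetical names introduced for illustration, and a partition is assumed to be a simple mapping from record key to record.

```python
from typing import Dict

# Hypothetical toy types: a record is a dict, a partition maps record key -> record.
Record = Dict[str, object]
Partition = Dict[str, Record]


def upsert(partition: Partition, incoming: Partition) -> Partition:
    """Merge incoming records into the existing ones, key by key."""
    merged = dict(partition)   # start from a copy of the existing records
    merged.update(incoming)    # incoming records win on key collisions
    return merged


def insert_overwrite(partition: Partition, incoming: Partition) -> Partition:
    """Ignore the existing records; the new snapshot holds only incoming ones."""
    return dict(incoming)


existing = {"k1": {"v": 1}, "k2": {"v": 2}}
incoming = {"k2": {"v": 20}, "k3": {"v": 30}}

after_upsert = upsert(existing, incoming)               # keeps k1, updates k2, adds k3
after_overwrite = insert_overwrite(existing, incoming)  # only k2 and k3 remain
```

Both functions return a new mapping and leave `existing` untouched, which loosely mirrors the idea that readers keep seeing the old snapshot until the new commit is published. The upsert path has to look at every existing key, while the overwrite path never reads the old data at all, which is where the efficiency gain comes from when most records change.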

Background

<Introduce any background context that is relevant or necessary to understand the feature and design choices.>

Implementation


Hoodie supports multiple write operations on the target table, such as insert, upsert, and bulk_insert. At a high level, we would like to add two new operations:

...