...

In this page hierarchy, we explain the concepts, design, and overall architectural underpinnings of Apache Hudi. This content is intended to be the technical documentation of the project and will be kept up-to-date with 

Info: def: annotations

To keep this page crisp for reading, any concepts we need to explain are marked with a def: annotation and hyperlinked off. You can contribute immensely to our docs by writing the missing pages for annotated terms.

Introduction

Apache Hudi (Hudi for short, from here on) allows you to store vast amounts of data on top of existing def:hadoop-compatible-storage, while providing two primitives that enable def:stream-processing on def:data-lakes, in addition to typical def:batch-processing.

Specifically,

  • Update/Delete Records: Hudi supports updating/deleting records using fine-grained file/record-level indexes, while providing transactional guarantees for the write operation.
  • Change Streams: Hudi also provides first-class support for obtaining an incremental stream of change records, i.e., all the records that were updated/inserted/deleted in a given dataset from a given point in time.
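The two primitives above can be sketched with a toy in-memory model: a record-level index keyed by record key, commit instants assigned per write, and an incremental read that filters on commit time. This is an illustrative sketch only; the names (`ToyTable`, `upsert`, `incremental_read`) are hypothetical and do not reflect Hudi's actual API, which operates on files in DFS via engines like Spark.

```python
# Toy model of Hudi's two primitives: record-level upserts/deletes
# and incremental change streams. Illustrative only, not Hudi's API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    key: str
    value: Optional[dict]   # None marks a deleted record
    commit_time: int        # instant at which this write committed

class ToyTable:
    def __init__(self):
        self._index = {}    # record key -> Record (the "record-level index")
        self._commits = 0   # monotonically increasing commit instant

    def upsert(self, rows):
        """Apply a batch of inserts/updates/deletes as one commit."""
        self._commits += 1
        for key, value in rows:
            self._index[key] = Record(key, value, self._commits)
        return self._commits  # instant usable for later incremental reads

    def snapshot_read(self):
        """Batch view: latest value of every live (non-deleted) record."""
        return {r.key: r.value for r in self._index.values() if r.value is not None}

    def incremental_read(self, since):
        """Change stream: records inserted/updated/deleted after `since`."""
        return [r for r in self._index.values() if r.commit_time > since]

t = ToyTable()
c1 = t.upsert([("a", {"v": 1}), ("b", {"v": 2})])
t.upsert([("a", {"v": 10}), ("b", None)])   # update "a", delete "b"
changes = t.incremental_read(since=c1)       # both changed records
```

The incremental read is the key idea: instead of rescanning the whole dataset, a consumer remembers the last instant it processed and pulls only records committed after it.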

Unlocking such stream/incremental processing capabilities on these def:DFS abstractions has several advantages.

...