...

In this page hierarchy, we explain the concepts, design, and overall architectural underpinnings of Apache Hudi. This content is intended to be the technical documentation of the project and will be kept up-to-date with 

Info: def: annotations

To keep this page crisp for reading, any concepts we need to explain are marked with a def: annotation and hyperlinked off. You can contribute immensely to our docs by writing the missing pages for annotated terms.

Introduction

Apache Hudi (Hudi for short, from here on) allows you to store vast amounts of data on top of existing def:hadoop-compatible-storage, while providing two primitives that enable def:stream-processing on def:data-lakes, in addition to typical def:batch-processing.

Specifically,

  • Update/Delete Records: Hudi supports updating/deleting records using fine-grained file/record-level indexes, while providing transactional guarantees for the write operation.
  • Change Streams: Hudi also provides first-class support for obtaining an incremental stream of change records, i.e., all the records that were updated/inserted/deleted in a given dataset from a given point in time.
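The two primitives above can be sketched with a toy in-memory model: a record-level index keyed by record key, commit instants assigned per write, and an incremental read that filters on commit time. This is an illustrative sketch only; the names (`ToyTable`, `upsert`, `incremental_read`) are hypothetical and do not reflect Hudi's actual API, which operates on files in DFS via engines like Spark.

```python
# Toy model of Hudi's two primitives: record-level upserts/deletes
# and incremental change streams. Illustrative only, not Hudi's API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    key: str
    value: Optional[dict]   # None marks a deleted record
    commit_time: int        # instant at which this write committed

class ToyTable:
    def __init__(self):
        self._index = {}    # record key -> Record (the "record-level index")
        self._commits = 0   # monotonically increasing commit instant

    def upsert(self, rows):
        """Apply a batch of inserts/updates/deletes as one commit."""
        self._commits += 1
        for key, value in rows:
            self._index[key] = Record(key, value, self._commits)
        return self._commits  # instant usable for later incremental reads

    def snapshot_read(self):
        """Batch view: latest value of every live (non-deleted) record."""
        return {r.key: r.value for r in self._index.values() if r.value is not None}

    def incremental_read(self, since):
        """Change stream: records inserted/updated/deleted after `since`."""
        return [r for r in self._index.values() if r.commit_time > since]

t = ToyTable()
c1 = t.upsert([("a", {"v": 1}), ("b", {"v": 2})])
t.upsert([("a", {"v": 10}), ("b", None)])   # update "a", delete "b"
changes = t.incremental_read(since=c1)       # both changed records
```

The incremental read is the key idea: instead of rescanning the whole dataset, a consumer remembers the last instant it processed and pulls only records committed after it.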

Unlocking such stream/incremental processing capabilities on these def:DFS abstractions has several advantages.

...