...

Everything is a log : Hudi also has an append-only, cloud data storage friendly design that lets Hudi manage data across all the major cloud providers seamlessly, implementing principles from def~log-structured-storage systems.

...

key-value data model : On the writer side, a Hudi table is modeled as a key-value dataset, where each def~record has a unique def~record-key. Additionally, a record key may also include the def~partitionpath under which the record is partitioned and stored. This often helps in cutting down the search space during index lookups.

Table Layout

With an understanding of the key technical motivations for the project, let's now dive deeper into the design of the system itself. At a high level, components for writing Hudi tables are embedded into an Apache Spark job using one of the supported ways, producing a set of files on def~backing-dfs-storage that represents a Hudi def~table. Query engines like Apache Spark, Presto and Apache Hive can then query the table, with certain guarantees (which we will discuss below).
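
To make this concrete, here is a minimal sketch of one such supported way, the Spark datasource writer. The table name, field names and path below are hypothetical; the option keys are Hudi's standard datasource write options.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("hudi-write-sketch").getOrCreate()
import spark.implicits._

// uuid is the def~record-key field, partition is the def~partitionpath field,
// and ts is the pre-combine field used to pick the latest version of a key.
val df = Seq(("uuid-123", "2021/03/01", 1L, "rider-A")).toDF("uuid", "partition", "ts", "rider")

df.write.format("hudi")
  .option("hoodie.table.name", "hudi_trips")
  .option("hoodie.datasource.write.recordkey.field", "uuid")
  .option("hoodie.datasource.write.partitionpath.field", "partition")
  .option("hoodie.datasource.write.precombine.field", "ts")
  .option("hoodie.datasource.write.operation", "upsert")
  .mode(SaveMode.Append)
  .save("/data/hudi_trips")   // the def~table-basepath on DFS / cloud storage
```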

...

  1. An ordered sequence of def~timeline-metadata about all the write operations done on the table, akin to a database transaction log.
  2. A hierarchical layout of a set of def~data-files that actually contain the records written to the table.
  3. A def~index (which could be implemented in many ways), that maps a given record to a subset of the data-files that contains the record.

...

  • upsert() support with fast, pluggable indexing
  • Incremental queries that scan only new data efficiently (see the read sketch just after this list)
  • Atomically publish data with rollback support; savepoints for data recovery
  • Snapshot isolation between writer & queries using def~mvcc style design
  • Manages file sizes and layout using statistics
  • Self-managed def~compaction of updates/deltas against existing records
  • Timeline metadata to audit changes to data
  • Data deletions, in support of GDPR and compliance requirements
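
For instance, here is a minimal sketch of an incremental read through the Spark datasource; the path and beginning instant time are hypothetical, and only records committed after that def~instant are returned.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("hudi-incremental-read").getOrCreate()

// Pull only records committed after the given instant time (yyyyMMddHHmmss form).
val newData = spark.read.format("hudi")
  .option("hoodie.datasource.query.type", "incremental")
  .option("hoodie.datasource.read.begin.instanttime", "20210316144640")
  .load("/data/hudi_trips")

newData.show()
```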

Timeline

Excerpt Include: def~timeline
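
To give a feel for the timeline, below is an illustrative listing of the `.hoodie` metadata folder of a hypothetical table. The instant times and file names are made up, and each def~instant also passes through requested/inflight states before its completed file appears here.

```
/data/hudi_trips/.hoodie/
  hoodie.properties            <- table-level properties
  20210316144640.commit        <- a completed commit (def~instant-action at a def~instant-time)
  20210316153000.deltacommit   <- a completed delta commit (merge-on-read tables)
  20210316160000.clean         <- a completed cleaning action
```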


...


Hudi writing is implemented as a Spark library, which makes it easy to integrate into existing data pipelines or ingestion libraries (which we will refer to as `Hudi clients`). Hudi clients prepare an `RDD[HoodieRecord]` that contains the data to be upserted, and Hudi upsert/insert is merely a Spark DAG that can be broken into two big pieces.
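
As a hedged sketch of what such a client prepares: the schema and values below are made up, and the concrete record and payload classes shown (`HoodieAvroRecord`, `OverwriteWithLatestAvroPayload`) are the Avro-based ones, which vary slightly across Hudi versions.

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.hudi.common.model.{HoodieAvroRecord, HoodieKey, OverwriteWithLatestAvroPayload}
import org.apache.spark.sql.SparkSession

// Hypothetical 3-field schema; a real client uses the table's Avro schema.
val schema = new Schema.Parser().parse(
  """{"type":"record","name":"trip","fields":[
    |{"name":"uuid","type":"string"},
    |{"name":"ts","type":"long"},
    |{"name":"rider","type":"string"}]}""".stripMargin)

val row: GenericRecord = new GenericData.Record(schema)
row.put("uuid", "uuid-123")
row.put("ts", 1L)
row.put("rider", "rider-A")

// The hoodie key = def~record-key + def~partitionpath
val key = new HoodieKey("uuid-123", "2021/03/01")
// The payload wraps the Avro record; the ordering value is used to pick the latest version.
val payload = new OverwriteWithLatestAvroPayload(row, java.lang.Long.valueOf(1L))
val record = new HoodieAvroRecord(key, payload)

val spark = SparkSession.builder().appName("hudi-client-sketch").getOrCreate()
// This is the `RDD[HoodieRecord]` that Hudi's upsert/insert turns into a Spark DAG.
val records = spark.sparkContext.parallelize(Seq(record))
```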

...

...

Data Files

Hudi organizes a table into a folder structure under a def~table-basepath on DFS. If the table is partitioned by some columns, there are additional def~table-partitions under the base path, which are folders containing data files for that partition, very similar to Hive tables. Each partition is uniquely identified by its def~partitionpath, which is relative to the basepath.

Within each partition, files are organized into def~file-groups, uniquely identified by a def~file-id. Each file group contains several def~file-slices, where each slice contains a columnar def~base-file (e.g. parquet) produced at a certain def~instant-time, along with a set of def~log-files that contain inserts/updates to the base file since the base file was last written. Hudi adopts an MVCC design, where the compaction action merges logs and base files to produce new file slices, and the cleaning action gets rid of unused/older file slices to reclaim space on DFS.

Hudi provides efficient upserts by mapping a given hoodie key (record key + partition path) consistently to a file group, via an indexing mechanism. This mapping between record key and file group/file id never changes once the first version of a record has been written to a file. In short, the mapped file group contains all versions of a group of records.


Fig: Shows four file groups 1, 2, 3, 4 with base and log files, with a few file slices each
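
As a schematic illustration of the layout described above (names are simplified; actual file names encode the file id, a write token and the def~instant-time):

```
/data/hudi_trips/                              <- def~table-basepath
  .hoodie/                                     <- timeline metadata (see Timeline above)
  2021/03/01/                                  <- one def~partitionpath
    fileGroup1_base_at_instant1.parquet        <- def~base-file of file group 1
    .fileGroup1_updates_since_instant1.log.1   <- def~log-file holding later inserts/updates
    fileGroup2_base_at_instant1.parquet        <- base file of another file group
  2021/03/02/
    fileGroup3_base_at_instant2.parquet
```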

Index

Excerpt Include: def~index


...

Table Types

The implementation specifics of the two def~table-types are detailed below.

Excerpt Include: def~table-type
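
As a hedged illustration (reusing `spark` and `df` from the write sketch under Table Layout), the table type is chosen by the writer when the table is first created; the option key below is Hudi's Spark datasource write option and the path is hypothetical.

```scala
// Write a merge-on-read table; the default, if unset, is a copy-on-write table.
df.write.format("hudi")
  .option("hoodie.table.name", "hudi_trips_mor")
  .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")   // or "COPY_ON_WRITE"
  .option("hoodie.datasource.write.recordkey.field", "uuid")
  .option("hoodie.datasource.write.partitionpath.field", "partition")
  .option("hoodie.datasource.write.precombine.field", "ts")
  .mode("append")
  .save("/data/hudi_trips_mor")
```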

Copy On Write Table

def~copy-on-write (COW)

Excerpt Include: def~copy-on-write (COW)





Merge On Read Table

def~merge-on-read (MOR)

Excerpt Include: def~merge-on-read (MOR)




Writing

Write Operations

Excerpt Include: def~write-operation

Compaction

Excerpt Include: def~compaction

Cleaning

Excerpt Include: def~cleaning
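
A hedged sketch of how compaction and cleaning are commonly configured on the writer for a merge-on-read table (reusing `df` from the earlier write sketch); the keys are standard Hudi configs, while the values are purely illustrative.

```scala
// Compact inline after every 5 delta commits, and let the cleaner retain the
// file slices needed by the latest 10 commits before reclaiming space.
df.write.format("hudi")
  .option("hoodie.table.name", "hudi_trips_mor")
  .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
  .option("hoodie.datasource.write.recordkey.field", "uuid")
  .option("hoodie.datasource.write.partitionpath.field", "partition")
  .option("hoodie.datasource.write.precombine.field", "ts")
  .option("hoodie.compact.inline", "true")
  .option("hoodie.compact.inline.max.delta.commits", "5")
  .option("hoodie.cleaner.policy", "KEEP_LATEST_COMMITS")
  .option("hoodie.cleaner.commits.retained", "10")
  .mode("append")
  .save("/data/hudi_trips_mor")
```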

Optimized DFS Access


...

Hudi also performs several key storage management functions on the data stored in a def~table. A key aspect of storing data on DFS is managing file sizes and counts and reclaiming storage space. For example, HDFS is infamous for its handling of small files, which exerts memory/RPC pressure on the Name Node and can potentially destabilize the entire cluster. In general, query engines provide much better performance on adequately sized columnar files, since they can effectively amortize the cost of obtaining column statistics etc. Even on some cloud data stores, there is often a cost to listing directories with a large number of small files.

Here are some ways in which Hudi writing efficiently manages the storage of data:

  • The small file handling feature in Hudi profiles the incoming workload and distributes inserts to existing def~file-groups, instead of creating new file groups which can lead to small files.
  • The cleaner can be configured to clean up older def~file-slices, more or less aggressively depending on the maximum time for queries to run & the lookback needed for incremental pull.
  • Employing a cache of the def~timeline in the writer, such that as long as the Spark cluster is not spun up every time, subsequent def~write-operations never list DFS directly to obtain the list of def~file-slices in a given def~table-partition.
  • Users can also tune the size of the def~base-file as a fraction of def~log-files & the expected compression ratio, such that a sufficient number of inserts are grouped into the same file group, ultimately resulting in well-sized base files (see the config sketch after this list).
  • Intelligently tuning the [bulk insert parallelism](configurations.html#withBulkInsertParallelism) can again result in nicely sized initial file groups. It is in fact critical to get this right, since file groups, once created, cannot be deleted, but only expanded as explained before.
  • For workloads with heavy updates, the [merge-on-read storage](concepts.html#merge-on-read-storage) provides a nice mechanism for ingesting quickly into smaller files and then later merging them into larger base files via compaction.
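
To make the sizing knobs above concrete, here is a hedged sketch (again reusing `df` from the earlier write sketch); the keys are Hudi's file sizing configs and the values are illustrative, not recommendations.

```scala
// Target ~120MB base files, treat files under 100MB as small-file candidates for
// bin-packing new inserts, assume ~10x parquet compression when sizing incoming records,
// cap log files at ~1GB, and control initial file group count via bulk insert parallelism.
df.write.format("hudi")
  .option("hoodie.table.name", "hudi_trips")
  .option("hoodie.datasource.write.recordkey.field", "uuid")
  .option("hoodie.datasource.write.partitionpath.field", "partition")
  .option("hoodie.datasource.write.precombine.field", "ts")
  .option("hoodie.parquet.max.file.size", 120L * 1024 * 1024)
  .option("hoodie.parquet.small.file.limit", 100L * 1024 * 1024)
  .option("hoodie.parquet.compression.ratio", 0.1)
  .option("hoodie.logfile.max.size", 1024L * 1024 * 1024)
  .option("hoodie.bulkinsert.shuffle.parallelism", "200")
  .mode("append")
  .save("/data/hudi_trips")
```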



Querying

Excerpt Include: def~query-type

Snapshot Queries

Excerpt Include: def~snapshot-query

Incremental Queries

Excerpt Include: def~incremental-query

Read Optimized Queries

Excerpt Include: def~read-optimized-query