def~table-type

Copy On Write Table

def~copy-on-write (COW)

Merge On Read Table

def~merge-on-read (MOR)

Writing

Write Operations

...

The small file handling feature in Hudi profiles the incoming workload and distributes inserts to existing def~file-group instances instead of creating new file groups, which would lead to small files.
The writer employs a cache of the def~timeline, so that as long as the Spark cluster is not spun up every time, subsequent def~write-operations never list DFS directly to obtain the list of def~file-slices in a given def~table-partition.
Users can also tune the size of the def~base-file as a fraction of def~log-files and the expected compression ratio, so that a sufficient number of inserts are grouped into the same file group, ultimately resulting in well-sized base files.
Intelligently tuning the bulk insert parallelism can again result in nicely sized initial file groups. It is in fact critical to get this right, since file groups, once created, cannot be deleted, only expanded as explained before.
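
The sizing ideas above can be illustrated with a minimal sketch. It is not Hudi's actual implementation: the names FileGroup, assign_inserts and bulk_insert_parallelism, and the parameters small_file_limit, max_file_size and avg_record_bytes, are hypothetical and chosen only for illustration. The sketch assumes the writer already knows the current base file size of each file group and an average record size estimate.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class FileGroup:
    file_id: str      # hypothetical identifier of the file group
    size_bytes: int   # current size of its latest base file


def assign_inserts(file_groups, num_inserts, avg_record_bytes,
                   small_file_limit, max_file_size):
    """Route inserts to existing small file groups before creating new ones.

    Returns (assignments, leftover): assignments maps file_id to the number
    of records routed to that group, and leftover is the number of records
    that would still need new file groups.
    """
    assignments = {}
    remaining = num_inserts
    # Fill the smallest files first so they grow towards the target size.
    for fg in sorted(file_groups, key=lambda g: g.size_bytes):
        if remaining == 0:
            break
        if fg.size_bytes >= small_file_limit:
            continue  # already large enough, not treated as a small file
        capacity = (max_file_size - fg.size_bytes) // avg_record_bytes
        take = min(capacity, remaining)
        if take > 0:
            assignments[fg.file_id] = take
            remaining -= take
    return assignments, remaining


def bulk_insert_parallelism(total_input_bytes, target_file_size):
    """Estimate how many initial file groups (write tasks) to create."""
    return max(1, ceil(total_input_bytes / target_file_size))
```

For example, with a 100 MB small-file limit and a 120 MB maximum file size, inserts are first packed into groups below the limit, and only the remainder spills into new file groups. Similarly, dividing the total input size by the target file size gives a rough starting point for the bulk insert parallelism, under the assumption that input size approximates the written size.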

Querying

def~query-type

<WIP>

Snapshot Queries

def~snapshot-query

<WIP>

Incremental Queries

def~incremental-query

<WIP>

Read Optimized Queries

def~read-optimized-query

<WIP>
