...

A key design decision in Hudi was to avoid creating small files and always write properly sized files, trading more time on ingest/writing to keep queries always efficient. The common approach of writing very small files and stitching them together later only solves the system scalability issues posed by small files; queries still slow down, because the small files are exposed to them in the meantime.

HUDI-26 (ASF JIRA) will take this to the next level by also collapsing smaller file groups into larger ones.

How do I use DeltaStreamer or Spark DataSource API to write to a Non-partitioned Hudi dataset?

To write to a non-partitioned Hudi dataset and sync it to a Hive table, you need to set the configurations below:

hoodie.datasource.write.keygenerator.class=org.apache.hudi.NonpartitionedKeyGenerator

hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor
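
As a rough illustration, the two settings above could be passed through the Spark DataSource API along these lines. This is a minimal sketch, not a definitive recipe: the helper names, the table name `trips`, and the field names `uuid` and `ts` are hypothetical, and the remaining option keys (record key, precombine field, empty partition path, Hive sync enablement) are assumptions about a typical write that may differ by Hudi version.

```python
# Class names are copied verbatim from the two configurations above.
KEYGEN_CLASS = "org.apache.hudi.NonpartitionedKeyGenerator"
EXTRACTOR_CLASS = "org.apache.hudi.hive.NonPartitionedExtractor"


def non_partitioned_hudi_options(table_name, record_key, precombine_field):
    """Build writer options for a non-partitioned dataset (hypothetical helper)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Assumption: the partition path field is left empty for a
        # non-partitioned dataset.
        "hoodie.datasource.write.partitionpath.field": "",
        "hoodie.datasource.write.keygenerator.class": KEYGEN_CLASS,
        # Assumption: Hive sync is enabled for the same table name.
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.table": table_name,
        "hoodie.datasource.hive_sync.partition_extractor_class": EXTRACTOR_CLASS,
    }


def write_non_partitioned(df, base_path, options):
    """Write a Spark DataFrame `df` to `base_path` using the options above."""
    (df.write
       .format("org.apache.hudi")
       .options(**options)
       .mode("append")
       .save(base_path))


# Example values are placeholders, not taken from the FAQ.
opts = non_partitioned_hudi_options("trips", "uuid", "ts")
```

With DeltaStreamer, the same two keys would presumably go into the properties file supplied to the job rather than into DataFrame writer options.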

Contributing to FAQ

A good and usable FAQ should be community-driven, crowdsourcing questions and thoughts from everyone.

...
