...
A key design decision in Hudi was to avoid creating small files in the first place and always write properly sized files, trading extra time during ingest/writing to keep queries efficient at all times. Common approaches that write many small files and stitch them together later only address the system-scalability problems small files cause; queries still slow down in the meantime because they are exposed to those small files anyway.
...
How do I use DeltaStreamer or Spark DataSource API to write to a non-partitioned Hudi dataset?
To write to a non-partitioned Hudi dataset and perform Hive table syncing, you need to set the configurations below:
hoodie.datasource.write.keygenerator.class=org.apache.hudi.NonpartitionedKeyGenerator
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor
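
For illustration, here is a minimal sketch of a Spark DataSource write in Scala with these two configurations applied. The table name, paths, and record key field are hypothetical, and Hive sync connection settings (JDBC URL, database, table) are omitted for brevity:

```scala
// A minimal sketch, assuming Spark with the Hudi bundle on the classpath.
// Table name, paths, and field names below are hypothetical.
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("hudi-non-partitioned-write")
  .getOrCreate()

// Any DataFrame with a unique record key column (here, "id") will do.
val inputDF = spark.read.json("/path/to/input")

inputDF.write
  .format("org.apache.hudi")
  .option("hoodie.table.name", "my_non_partitioned_table")
  .option("hoodie.datasource.write.recordkey.field", "id")
  // The two configurations from this FAQ entry:
  .option("hoodie.datasource.write.keygenerator.class",
    "org.apache.hudi.NonpartitionedKeyGenerator")
  .option("hoodie.datasource.hive_sync.partition_extractor_class",
    "org.apache.hudi.hive.NonPartitionedExtractor")
  // Enable Hive table syncing (connection settings omitted).
  .option("hoodie.datasource.hive_sync.enable", "true")
  .mode(SaveMode.Append)
  .save("/path/to/hudi/table")
```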
Contributing to FAQ
A good and usable FAQ should be community-driven, crowdsourcing questions and answers from everyone.
...