...

The Hudi cleaner process often runs right after a commit or deltacommit and deletes old files that are no longer needed. If you are using the incremental pull feature, ensure you configure the cleaner to retain enough of the most recent commits to rewind to. Also allow sufficient time for your long-running jobs to finish. Otherwise, the cleaner could delete a file that is being read, or could still be read, by the job, causing the job to fail. Typically, the default configuration of 10 allows an ingestion running every 30 minutes to retain up to 5 hours' worth of data. If you run ingestion more frequently, or if you want to give queries more running time, consider increasing the value of the config: hoodie.cleaner.commits.retained
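The relationship between retained commits and the retention window is simple arithmetic, and a minimal Python sketch can make it concrete. The helper names below (retention_hours, commits_needed) are hypothetical and not part of Hudi; only the config key hoodie.cleaner.commits.retained comes from the text above. The sketch assumes ingestion commits at a steady interval.

```python
import math


def retention_hours(commits_retained: int, ingest_interval_minutes: float) -> float:
    """Hours of history kept when the cleaner retains the last N commits."""
    return commits_retained * ingest_interval_minutes / 60.0


def commits_needed(window_hours: float, ingest_interval_minutes: float) -> int:
    """Smallest number of retained commits that covers the desired window."""
    return math.ceil(window_hours * 60.0 / ingest_interval_minutes)


# 10 retained commits with ingestion every 30 minutes covers 5 hours.
print(retention_hours(10, 30))  # 5.0

# Ingesting every 10 minutes while still wanting a 5-hour window
# requires retaining more commits.
needed = commits_needed(5, 10)  # 30

# Hypothetical options dict; how it is passed to the writer depends on your setup.
hudi_options = {"hoodie.cleaner.commits.retained": str(needed)}
print(hudi_options)
```

Size the window to cover both your longest-running query and the furthest back an incremental pull needs to rewind, then round the commit count up rather than down.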

What's Hudi's schema evolution story?

...