...

We might need to standardize how users think about querying Hudi tables through Spark, Hive, and Presto. For now, I propose introducing another VIEW_TYPE in Spark, POINT_IN_TIME, which works in conjunction with the existing config END_INSTANTTIME_OPT_KEY to provide a snapshot view of a table as of a particular instant in time (similar to select * from table where _hoodie_commit_time <= timeAsOf, where timeAsOf is a commit time).
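To make the intended semantics concrete, here is a minimal sketch in plain Python (no Spark or Hudi calls). It assumes records are dicts carrying the _hoodie_commit_time and _hoodie_record_key meta fields, that commit times are fixed-width strings that sort chronologically, and that a snapshot keeps only the latest version of each record key. The function name snapshot_as_of is hypothetical and is not part of any Hudi API.

```python
from typing import Dict, Iterable, List


def snapshot_as_of(records: Iterable[Dict], time_as_of: str) -> List[Dict]:
    """Return the latest version of each record committed at or before time_as_of.

    Assumption: commit times are fixed-width strings, so plain string
    comparison orders them chronologically.
    """
    latest: Dict[str, Dict] = {}
    for rec in records:
        commit_time = rec["_hoodie_commit_time"]
        # Equivalent of: where _hoodie_commit_time <= timeAsOf
        if commit_time > time_as_of:
            continue
        key = rec["_hoodie_record_key"]
        current = latest.get(key)
        # Keep only the newest surviving version per record key.
        if current is None or commit_time > current["_hoodie_commit_time"]:
            latest[key] = rec
    return sorted(latest.values(), key=lambda r: r["_hoodie_record_key"])
```

The filter alone mirrors the SQL analogy above; the per-key deduplication is the assumed extra step that turns a filtered log of versions into a snapshot.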

Caveats/Open Items

  • The number of versions to keep should match the number of commits the client wants to travel back across. We need a way to enforce this.
  • The proposed approach pushes some work onto the client and imposes some limitations:
    • Clients can only time-travel based on the commit times of a Hudi dataset. They have to figure out how to map the timestamp they want to travel to onto the commit time that matches closest to it.
    • Clients have to get a list of the valid timestamps (Hudi commit times) to time-travel against.
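
The client-side mapping described in the last two items could look roughly like the sketch below. It assumes the client has already obtained the list of valid commit times (how that list is fetched is out of scope here), that commit times are strings in yyyyMMddHHmmss form, and that "closest" means the latest commit at or before the requested timestamp, which matches the <= semantics of the proposed view. The helper name closest_commit_at_or_before is hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional

# Assumed commit time layout: yyyyMMddHHmmss
COMMIT_TIME_FORMAT = "%Y%m%d%H%M%S"


def closest_commit_at_or_before(commit_times: List[str], wanted: datetime) -> Optional[str]:
    """Map an arbitrary timestamp to the latest commit time that is <= it.

    Returns None when the timestamp predates every available commit, for
    example because older versions have already been cleaned up.
    """
    target = wanted.strftime(COMMIT_TIME_FORMAT)
    ordered = sorted(commit_times)
    # bisect_right gives the count of commit times <= target.
    idx = bisect_right(ordered, target)
    return ordered[idx - 1] if idx else None
```

A None result is where the first caveat shows up: if fewer versions are retained than the client wants to travel back, there is no valid commit to map the request to.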

...