...
Copy On Write Table
Merge On Read Table
Writing
Write Operations
...
- The small file handling feature in Hudi profiles the incoming workload and distributes inserts to existing def~file-group instead of creating new file groups, since creating new ones can lead to small files.
- Employing a cache of the def~timeline in the writer, so that as long as the Spark cluster is not spun up every time, subsequent def~write-operations never list DFS directly to obtain the list of def~file-slices in a given def~table-partition.
- Users can also tune the size of the def~base-file as a fraction of def~log-files and the expected compression ratio, so that a sufficient number of inserts are grouped into the same file group, ultimately resulting in well-sized base files.
- Intelligently tuning the bulk insert parallelism can again result in nicely sized initial file groups. It is in fact critical to get this right, since file groups, once created, cannot be deleted, only expanded as explained before.
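The sizing ideas in the list above can be illustrated with a minimal sketch. This is not Hudi's implementation: `FileGroup`, `assign_inserts`, `bulk_insert_parallelism`, and all of their parameters are hypothetical names chosen for illustration, and the greedy "fill the smallest file groups first" policy is an assumption. The sketch only shows the general shape of routing inserts into existing small file groups before opening new ones, and of deriving a bulk insert parallelism from a target file size.

```python
import math
from dataclasses import dataclass


@dataclass
class FileGroup:
    """Hypothetical stand-in for a file group; not a Hudi class."""
    file_id: str
    size_bytes: int


def assign_inserts(file_groups, num_inserts, avg_record_bytes,
                   small_file_limit, max_file_size):
    """Greedily route inserts to existing small file groups.

    Returns (assignments, new_groups): a dict of file_id -> record count
    for existing file groups, and a list of record counts, one per new
    file group that still has to be created.
    """
    assignments = {}
    remaining = num_inserts
    # Assumed policy: fill the smallest existing file groups first.
    for fg in sorted(file_groups, key=lambda g: g.size_bytes):
        if remaining == 0:
            break
        if fg.size_bytes >= small_file_limit:
            continue  # already well sized, leave it alone
        capacity = (max_file_size - fg.size_bytes) // avg_record_bytes
        take = min(capacity, remaining)
        if take > 0:
            assignments[fg.file_id] = take
            remaining -= take
    # Leftover records go to new file groups, each filled up to the target size.
    per_new_group = max(1, max_file_size // avg_record_bytes)
    new_groups = []
    while remaining > 0:
        take = min(per_new_group, remaining)
        new_groups.append(take)
        remaining -= take
    return assignments, new_groups


def bulk_insert_parallelism(total_input_bytes, target_file_bytes):
    """Estimate how many initial file groups a bulk insert should produce."""
    return max(1, math.ceil(total_input_bytes / target_file_bytes))
```

For example, calling `assign_inserts` with three existing file groups, two of them under the small-file limit, tops those two up first and creates a new file group only for the records that no longer fit. Because file groups cannot be deleted once created, an estimate like `bulk_insert_parallelism` is best erred toward fewer, larger initial file groups.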
Querying
Snapshot Queries
Incremental Queries
Read Optimized Queries