...
A test was run to measure the time taken to write and read 'replace' metadata using code here. The results are:
Partitions | Total file groups replaced (divide by the Partitions column to get the number of file groups per partition) | Serialization cost (ms) | Deserialization cost (ms) | Memory utilization (HoodieReplaceMetadata object size + serialized byte[] size in memory) |
1 | 300 | 55 | 41 | 60KB |
1 | 3,000 | 55 | 42 | 570KB |
1 | 30,000 | 93 | 120 | 5.7MB |
1 | 300,000 | 103 | 130 | 57MB |
10 | 300 | 53 | 32 | 60KB |
10 | 3,000 | 68 | 52 | 574KB |
10 | 30,000 | 87 | 104 | 5.7MB |
10 | 300,000 | 97 | 114 | 57MB |
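A measurement of this kind could be reproduced with a harness along the lines of the sketch below. It is not the code used for the table above: the class `ReplaceMetadataBenchmark`, its methods, and the map of partition path to file group IDs are hypothetical stand-ins for `HoodieReplaceMetadata`, and plain Java serialization is swapped in for the actual metadata format, so absolute numbers will differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.UUID;

// Hypothetical harness: all names here are illustrative, not Hudi APIs.
class ReplaceMetadataBenchmark {

    // Stand-in for replace metadata: partition path -> replaced file group IDs.
    static HashMap<String, ArrayList<String>> buildMetadata(int partitions, int totalFileGroups) {
        HashMap<String, ArrayList<String>> metadata = new HashMap<>();
        int perPartition = totalFileGroups / partitions;
        for (int p = 0; p < partitions; p++) {
            ArrayList<String> fileGroupIds = new ArrayList<>(perPartition);
            for (int f = 0; f < perPartition; f++) {
                fileGroupIds.add(UUID.randomUUID().toString());
            }
            metadata.put("partition-" + p, fileGroupIds);
        }
        return metadata;
    }

    // Assumption: Java serialization is used only to keep the sketch dependency-free.
    static byte[] serialize(HashMap<String, ArrayList<String>> metadata) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(metadata);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static HashMap<String, ArrayList<String>> deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (HashMap<String, ArrayList<String>>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] partitionCounts = {1, 10};
        int[] fileGroupCounts = {300, 3_000, 30_000, 300_000};
        for (int partitions : partitionCounts) {
            for (int fileGroups : fileGroupCounts) {
                HashMap<String, ArrayList<String>> metadata = buildMetadata(partitions, fileGroups);

                long start = System.nanoTime();
                byte[] data = serialize(metadata);
                long serializeMillis = (System.nanoTime() - start) / 1_000_000;

                start = System.nanoTime();
                deserialize(data);
                long deserializeMillis = (System.nanoTime() - start) / 1_000_000;

                // Columns: partitions | file groups | serialize ms | deserialize ms | serialized bytes
                System.out.printf("%d | %d | %d | %d | %d%n",
                        partitions, fileGroups, serializeMillis, deserializeMillis, data.length);
            }
        }
    }
}
```

Each configuration is timed once, so the first rows include JVM warm-up. A more careful measurement would repeat each run and discard the initial iterations.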
We plan to store this metadata in Avro files, similar to clean metadata. After consolidated metadata is launched, we can come up with a plan to migrate this metadata to it. (This will likely reduce the memory required in cases where a partition has a large number of replaced files.)
...