...

A test was run to measure the time taken to write and read 'replace' metadata, using the code here. The results:


| Partitions | Total file groups replaced (divide by Partitions to get file groups per partition) | Serialization cost (ms) | Deserialization cost (ms) | Memory utilization (HoodieReplaceMetadata object size + serialized byte[] size in memory) |
|---|---|---|---|---|
| 1 | 300 | 55 | 41 | 60KB |
| 1 | 3,000 | 55 | 42 | 570KB |
| 1 | 30,000 | 93 | 120 | 5.7MB |
| 1 | 300,000 | 103 | 130 | 57MB |
| 10 | 300 | 53 | 32 | 60KB |
| 10 | 3,000 | 68 | 52 | 574KB |
| 10 | 30,000 | 87 | 104 | 5.7MB |
| 10 | 300,000 | 97 | 114 | 57MB |
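For readers who want to run a measurement of this shape themselves, here is a minimal sketch. It is not the test code referenced above. It assumes a hypothetical stand-in for the replace metadata (a map from partition path to replaced file group IDs) and uses plain Java serialization rather than Avro, so absolute numbers will not match the results above. The class and method names (`ReplaceMetadataBenchmark`, `buildMetadata`, `serialize`, `deserialize`) are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class ReplaceMetadataBenchmark {

    // Hypothetical stand-in for replace metadata:
    // partition path -> IDs of the file groups replaced in that partition.
    static Map<String, List<String>> buildMetadata(int partitions, int totalFileGroups) {
        Map<String, List<String>> metadata = new HashMap<>();
        int perPartition = totalFileGroups / partitions;
        for (int p = 0; p < partitions; p++) {
            List<String> fileGroupIds = new ArrayList<>(perPartition);
            for (int f = 0; f < perPartition; f++) {
                fileGroupIds.add(UUID.randomUUID().toString());
            }
            metadata.put("partition-" + p, fileGroupIds);
        }
        return metadata;
    }

    // Assumption: Java serialization is used here only to keep the sketch dependency-free.
    static byte[] serialize(Map<String, List<String>> metadata) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(metadata);
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static Map<String, List<String>> deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Map<String, List<String>>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] partitionCounts = {1, 10};
        int[] fileGroupCounts = {300, 3_000, 30_000, 300_000};
        for (int partitions : partitionCounts) {
            for (int fileGroups : fileGroupCounts) {
                Map<String, List<String>> metadata = buildMetadata(partitions, fileGroups);

                // Time a single serialization pass.
                long start = System.nanoTime();
                byte[] data = serialize(metadata);
                long serMillis = (System.nanoTime() - start) / 1_000_000;

                // Time a single deserialization pass.
                start = System.nanoTime();
                Map<String, List<String>> roundTripped = deserialize(data);
                long deserMillis = (System.nanoTime() - start) / 1_000_000;

                System.out.printf(
                    "%d partitions, %d file groups: ser=%d ms, deser=%d ms, bytes=%d, roundTripOk=%b%n",
                    partitions, fileGroups, serMillis, deserMillis,
                    data.length, metadata.equals(roundTripped));
            }
        }
    }
}
```

Each configuration is timed once with no JVM warm-up, so the first rows will likely include class-loading overhead. A more careful measurement would repeat each run and discard the early iterations.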

We plan to store this metadata in Avro files, similar to clean metadata. After consolidated metadata is launched, we can come up with a plan to migrate this to leverage consolidated metadata. (This will likely reduce the memory required in cases where a partition has a large number of files replaced.)

...