Currently, a rename or delete operation can become prohibitively expensive for directories with large sub-trees. Ozone renames or deletes every sub-file and sub-directory under the given directory through multiple RPC calls to OM, which makes the operation very expensive. Rename and delete also do not guarantee atomicity.
The prefix-based FileSystem optimization allows rename and delete of any directory to be performed atomically in deterministic, constant time. With this feature, Ozone performs a rename or delete in a single RPC call by sending only the given directory to OM, so the operation completes with O(1) complexity. This also makes it possible to support atomic rename and delete of any directory at any level in the namespace.
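The constant-time rename can be illustrated with a toy model. This is only a sketch, assuming each entry is keyed by its parent's object ID plus its own name rather than by its full path. The class name, table layout, and method names below are hypothetical and are not Ozone's actual schema or API. Deletion is not modelled here.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a parent-ID keyed namespace; hypothetical, not Ozone's actual code. */
public class PrefixNamespaceSketch {
    // Hypothetical table: key = "<parentId>/<name>", value = object ID of the entry.
    private final Map<String, Long> table = new HashMap<>();
    private long nextId = 1; // ID 0 is reserved for the root

    private static String key(long parentId, String name) {
        return parentId + "/" + name;
    }

    /** Creates an entry under parentId and returns its new object ID. */
    public long create(long parentId, String name) {
        long id = nextId++;
        table.put(key(parentId, name), id);
        return id;
    }

    /** Resolves an absolute path one component at a time; returns -1 if any component is missing. */
    public long resolve(String path) {
        long current = 0;
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue; // skip the empty token before the leading slash
            }
            Long next = table.get(key(current, part));
            if (next == null) {
                return -1;
            }
            current = next;
        }
        return current;
    }

    /** Renames one entry by rewriting its single row; descendants are keyed by ID and stay untouched. */
    public boolean rename(long srcParent, String srcName, long dstParent, String dstName) {
        if (table.containsKey(key(dstParent, dstName))) {
            return false; // destination already exists
        }
        Long id = table.remove(key(srcParent, srcName));
        if (id == null) {
            return false; // source does not exist
        }
        table.put(key(dstParent, dstName), id);
        return true;
    }

    public static void main(String[] args) {
        PrefixNamespaceSketch ns = new PrefixNamespaceSketch();
        long a = ns.create(0, "a");
        long b = ns.create(a, "b");
        long file = ns.create(b, "file");

        // Renaming /a to /x rewrites exactly one row, however large the sub-tree is.
        ns.rename(0, "a", 0, "x");
        System.out.println(ns.resolve("/x/b/file") == file);
        System.out.println(ns.resolve("/a/b/file") == -1);
    }
}
```

Because children refer to their parent by ID, the rename in this sketch touches a single row regardless of how many descendants the directory has. A layout keyed by full path would instead have to rewrite every descendant key.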
Git Branch:
This feature is being implemented in a separate feature branch, HDDS-2939. Thanks to all contributors and reviewers.
How to enable the prefix-based optimization feature:
The following properties must be set in 'ozone-site.xml' to enable this feature. By default, the feature is turned OFF.
An example of ozone-site.xml
<configuration>
  <property>
    <name>ozone.om.enable.filesystem.paths</name>
    <value>true</value>
  </property>
  <property>
    <name>ozone.om.metadata.layout</name>
    <value>prefix</value>
  </property>
</configuration>
Related documents
Branch merge checklist
1. builds/intermittent test failures
TODO
2. documentation
TODO
3. design, attached the docs
TODO
4. s3 compatibility
TODO
5. docker-compose / acceptance tests
TODO
6. support of containers / Kubernetes:
TODO
7. coverage/code quality:
TODO
8. build time
TODO
9. possible incompatible changes/used feature flag:
TODO
10. third party dependencies/licence changes:
TODO
11. performance
Tests were run to compare the performance of delete and rename operations in the feature branch against the master code base. The following charts, which capture the execution time of directory delete and rename operations, show that the feature branch has a very significant performance gain over master.
The freon 'dtsg' (DFS tree generator) benchmark was run on a single-node cluster. V0 represents the master code and V1 represents the feature branch. Please refer to the Jira document for more details.
12. security considerations
TODO