Git branch: https://github.com/apache/ozone/tree/HDDS-3630
Currently each container on a datanode has its own RocksDB instance, which can lead to hundreds of thousands of RocksDB instances on a single datanode. Managing that many RocksDB instances in one JVM is very challenging. Please refer to the "problem statement" section of the design document[1] for details. With the Datanode RocksDB merge feature, a datanode instead uses only one RocksDB instance per data volume. With far fewer RocksDB instances to manage, write path performance and DN stability are improved; refer to the "Micro Benchmark Data" section of the design document.
To enable the feature, the following configs need to be added to the Datanode's ozone-site.xml:
<property>
  <name>hdds.datanode.container.schema.v3.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hdds.datanode.container.db.dir</name>
  <description>Determines where the per-disk rocksdb instances will be stored. This setting is optional. If unspecified, then rocksdb instances are stored on the same disk as HDDS data. The directories should be tagged with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for storage policies. The default storage type will be DISK if the directory does not have a storage type tagged explicitly. Ideally, this should be mapped to a fast disk like an SSD.</description>
</property>
<property>
  <name>hdds.datanode.failed.db.volumes.tolerated</name>
  <value>-1</value>
  <description>The number of db volumes that are allowed to fail before a datanode stops offering service. Default -1 means unlimited, but we should have at least one good volume left.</description>
</property>
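As an illustration of the optional DB directory setting, the fragment below sketches how the per-volume RocksDB instances might be placed on a dedicated SSD. The path /mnt/ssd0/ozone/db is a hypothetical placeholder, and the [SSD] prefix assumes the storage type tag is written in front of the directory in the same way as for data directories; check the description of the property in your release before relying on it.

```xml
<!-- Sketch only: the directory below is a made-up example path, not a default. -->
<property>
  <name>hdds.datanode.container.db.dir</name>
  <!-- Assumption: storage type tag precedes the path, e.g. [SSD]/some/dir -->
  <value>[SSD]/mnt/ssd0/ozone/db</value>
</property>
```

If the property is left out, the RocksDB instances stay on the same disks as the HDDS data, as the description above states.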
1. Builds/intermittent test failures
No additional flaky tests have been introduced by the feature branch.
2. Documentation
Documentation is being added by
3. Design, attached the docs
The design docs can be found under the Attachments section in the umbrella jira:
4. Compatibility
The Merge RocksDB in datanode feature does not change any existing datanode API. All container data in an existing Ozone cluster will remain in its current format and will stay accessible after the datanode upgrade.
5. Docker-compose / acceptance tests
A new acceptance test is being added by Jira:
6. Support of containers / Kubernetes:
No addition.
7. Coverage/code quality:
Current feature branch coverage is 85.0% (vs. 82.3% on the master branch).
8. Build time
No significant build time difference has been observed.
master branch succeeded 3 days ago in 9m 9s: https://github.com/apache/ozone/runs/6528138840?check_suite_focus=true
Feature branch succeeded 7 days ago in
9. Possible incompatible changes/used feature flag:
There should not be any incompatible changes introduced with this feature.
A global enable/disable switch for this feature is added in
10. Third party dependencies/license changes:
No third party dependencies are introduced by this feature.
11. Performance
We have tested the major datanode activities that require RocksDB operations: container create, close, and delete, and block put and get. Container delete performance drops because the container's metadata KVs need to be deleted from RocksDB; the other four activities all show improved performance.