Git branch: https://github.com/apache/ozone/tree/HDDS-3630
Currently there is one RocksDB instance for each container on a datanode, which can lead to hundreds of thousands of RocksDB instances on a single datanode. It is very challenging to manage this many RocksDB instances in one JVM; please refer to the "Problem Statement" section of the design document[1] for the detailed challenges. Unlike the current approach, the datanode RocksDB merge feature will use only one RocksDB instance for each data volume. With far fewer RocksDB instances to manage, write path performance and datanode stability are improved; refer to the "Micro Benchmark Data" section of the design document[1].
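As an illustration of the shared-instance approach (a sketch only, not Ozone code: the class and key layout here are hypothetical, and a sorted `TreeMap` stands in for RocksDB), metadata for many containers can coexist in one sorted KV store by prefixing every key with the container ID, so each container's keys form a contiguous range:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: one shared, sorted KV store per volume, with each
// container's metadata kept apart by a container-ID key prefix.
// A TreeMap stands in for RocksDB.
public class SharedDbSketch {
  private final TreeMap<String, String> db = new TreeMap<>();

  private static String prefixed(long containerId, String key) {
    return containerId + "|" + key;
  }

  public void put(long containerId, String key, String value) {
    db.put(prefixed(containerId, key), value);
  }

  public String get(long containerId, String key) {
    return db.get(prefixed(containerId, key));
  }

  // All keys of one container form a contiguous, sorted range.
  // '}' sorts just after '|' in ASCII, so the range covers exactly
  // the keys that start with "<containerId>|".
  public Map<String, String> containerRange(long containerId) {
    return db.subMap(containerId + "|", containerId + "}");
  }
}
```

With this layout, per-container reads and scans stay cheap even though all containers on a volume share one store.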
For more, please check out the full documentation. The doc includes a feature overview, setup guide, CLI guide and access control guide (best viewed locally, rendered using hugo serve
under ./hadoop-hdds/docs/
, as it is not published to the website yet).
To enable the feature, the following configs need to be added to the datanode's ozone-site.xml
.
<property>
  <name>hdds.datanode.container.schema.v3.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hdds.datanode.container.db.dir</name>
  <description>Determines where the per-disk RocksDB instances will be stored. This setting is optional. If unspecified, the RocksDB instances are stored on the same disk as HDDS data. The directories should be tagged with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for storage policies. The default storage type will be DISK if the directory does not have a storage type tagged explicitly. Ideally, this should be mapped to a fast disk like an SSD.</description>
</property>
<property>
  <name>hdds.datanode.failed.db.volumes.tolerated</name>
  <value>-1</value>
  <description>The number of db volumes that are allowed to fail before a datanode stops offering service. The default of -1 means unlimited, but at least one good volume must remain.</description>
</property>
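The tolerance rule described by hdds.datanode.failed.db.volumes.tolerated can be sketched as follows (a hypothetical helper for illustration, not the actual Ozone implementation; the class and method names are assumptions):

```java
// Hypothetical sketch of the check implied by
// hdds.datanode.failed.db.volumes.tolerated: -1 tolerates any number of
// failures, but in all cases at least one healthy db volume must remain.
public class DbVolumeChecker {
  /**
   * @param tolerated value of hdds.datanode.failed.db.volumes.tolerated
   *                  (-1 means unlimited failures are tolerated)
   * @param failed    number of db volumes that have failed
   * @param healthy   number of db volumes still healthy
   * @return true if the datanode may keep offering service
   */
  public static boolean hasEnoughDbVolumes(int tolerated, int failed, int healthy) {
    // Regardless of the tolerated count, at least one good volume must remain.
    if (healthy < 1) {
      return false;
    }
    // -1: any number of failures is tolerated as long as one volume is left.
    if (tolerated == -1) {
      return true;
    }
    return failed <= tolerated;
  }
}
```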
1. Builds/intermittent test failures
No additional flaky tests have been introduced by the feature branch.
2. Documentation
Documentation has been added since
and is under constant revision. The docs (S3-Multi-Tenancy.md, S3-Tenant-Commands.md and so on) can be found under https://github.com/apache/ozone/tree/HDDS-4944/hadoop-hdds/docs/content/feature
3. Design docs attached
The design docs can be found under the Attachments section in the umbrella jira:
4. Compatibility
The merge RocksDB in datanode feature does not change any existing datanode API. All container data in an existing Ozone cluster will remain in its current format and will always be accessible after the datanode upgrade.
5. Docker-compose / acceptance tests
No addition.
6. Support of containers / Kubernetes:
No addition.
7. Coverage/code quality:
Current feature branch coverage is 85.0% (vs. 82.3% on the master branch).
8. Build time
No significant build time difference has been observed.
master branch succeeded 3 days ago in 9m 9s: https://github.com/apache/ozone/runs/6528138840?check_suite_focus=true
Feature branch succeeded 7 days ago in
9. Possible incompatible changes/used feature flag:
There should not be any incompatible changes introduced with this feature.
A global enable/disable switch for this feature is added in
.
10. Third party dependencies/license changes:
No third-party dependencies are introduced by this feature.
11. Performance
We have tested the major datanode activities that require RocksDB operations, including container create, close and delete, and block put and get. Container delete performance drops because the container's metadata KV pairs need to be deleted from RocksDB; the other four major activities all show improved performance.
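The container delete regression can be illustrated with a small sketch (hypothetical code, not Ozone's implementation; a `TreeMap` again stands in for RocksDB, and the "<id>|" key prefix is an assumption): with a shared per-volume store, deleting a container can no longer drop a whole per-container DB; instead every key in the container's range must be removed from the shared store.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: deleting a container from a shared per-volume store
// means removing its entire key range, entry by entry, rather than simply
// dropping a dedicated per-container RocksDB instance.
public class ContainerDeleteSketch {
  public static int deleteContainer(TreeMap<String, String> db, long containerId) {
    // All of this container's keys share the "<id>|" prefix and are
    // contiguous in the sorted store ('}' sorts just after '|' in ASCII).
    SortedMap<String, String> range =
        db.subMap(containerId + "|", containerId + "}");
    int removed = range.size();
    range.clear();  // each entry must be deleted individually
    return removed;
  }
}
```

This is why container delete is the one activity whose performance drops while the others improve: its cost now scales with the number of metadata entries the container holds.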