...

Most of the configuration parameters and files are common to all instances of the same OODT component. Therefore, the following ZNode structure is adopted, where the configuration related to each component type is stored in a separate ZNode subtree, as shown below.

[Image: ZNode structure for per-component configuration]


...

Idea

At a high level, multiple projects can run different sets of file managers, workflow managers, and so on. To store the configuration of different projects separately in ZooKeeper, the concept of projects is introduced and the root ZNode is divided into subtrees, one per project. Under each project, configuration is stored for the different OODT components (file manager, resource manager, ...). There are essentially two types of configuration files to be stored: properties files and other configuration files (such as XML files). Properties within the properties files are loaded only once (normally at initialization), so they are treated as a special case in contrast to the other configuration files. To achieve this, two subtrees are created, as shown above, for the properties files and other configuration files of each component. For example, the file manager (the file-mgr ZNode) has two child nodes storing properties files and other configuration files respectively.
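The structure described above can be sketched as follows. Only the file-mgr paths appear in this document; the resource-mgr and workflow-mgr nodes, and the use of oodt as the project-level node, are illustrative assumptions:

```
/oodt                           <- one subtree per project
 └── components
      ├── file-mgr
      │    ├── properties       <- properties files (loaded once at initialization)
      │    └── configuration    <- other configuration files (XML files etc.)
      ├── resource-mgr
      │    ├── properties
      │    └── configuration
      └── workflow-mgr
           ├── properties
           └── configuration
```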

...

In order to make sure that all the configuration files are available within a predefined directory at runtime, we store each configuration file in a ZNode whose path matches the path where that file should reside at runtime. Take the mime-types.xml file: it should be available (locally) within the ${FILEMGR_HOME}/etc/ directory. Therefore, to determine where the corresponding configuration file should be stored locally relative to ${FILEMGR_HOME}, we take its ZNode path relative to the ZNode file-mgr/configuration/ as the storage location. So, when downloaded, the content of the ZNode oodt/components/file-mgr/configuration/etc/mime-types.xml will be stored in the file ${FILEMGR_HOME}/etc/mime-types.xml. That is the basic idea of how configuration is published, and how it is downloaded and stored.
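The path mapping above can be sketched as plain string manipulation. This is an illustrative sketch, not the actual OODT implementation; the class and method names here are hypothetical:

```java
// Sketch: map a configuration ZNode path to the local path where the file
// should be stored at runtime, by taking the part of the ZNode path below
// the component's configuration subtree relative to the component's home.
public class ZNodeToLocalPath {

    // configRoot: the component's configuration subtree, e.g.
    //   /oodt/components/file-mgr/configuration
    // componentHome: the local home directory, e.g. the value of ${FILEMGR_HOME}
    static String localPath(String znodePath, String configRoot, String componentHome) {
        if (!znodePath.startsWith(configRoot)) {
            throw new IllegalArgumentException("ZNode is not under " + configRoot);
        }
        // Everything below the configuration subtree is the path relative to home.
        String relative = znodePath.substring(configRoot.length());
        return componentHome + relative;
    }

    public static void main(String[] args) {
        String znode = "/oodt/components/file-mgr/configuration/etc/mime-types.xml";
        String root  = "/oodt/components/file-mgr/configuration";
        System.out.println(localPath(znode, root, "/opt/oodt/filemgr"));
        // prints /opt/oodt/filemgr/etc/mime-types.xml
    }
}
```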

DistributedConfigurationPublisher is responsible for initially publishing configuration to ZooKeeper. Once configuration has been published, any OODT component running on any cluster node can fetch it through the DistributedConfigurationManager class. A CLI tool is available to publish/verify/clear configuration in ZooKeeper. To learn more about configuration publishing, please read the documentation on Distributed Configuration Management.

Future Developments

Extending distributed configuration management to a distributed command framework

At the moment, even with distributed configuration enabled, we have to:
  1. Log in to a remote server.
  2. Install/unpack the corresponding OODT component.
  3. Start it (with no manual configuration, since configuration is downloaded on the fly); the ZK_CONNECT_STRING environment variable needs to be set prior to that.
  4. Log in to the relevant server again whenever a component needs to be restarted.

If we can extend our ZooKeeper-based configuration management into a command framework, we can restart or refresh an entire component, or just its configuration, as required with a simple terminal command from a local machine.

Introducing distributed configuration management to crawler and pcs packages

At the moment, distributed configuration management only supports three main components of OODT: the file manager, the resource manager, and the workflow manager. It would be great if this feature were introduced to the above-mentioned packages as well.

Allow file manager clients to query multiple file managers as one

Currently, file storage and data archiving require shared storage such as an NFS mount. Once file managers are configured, they are not aware of the other file managers operating in the cluster. If we allow the file managers to know about each other, we can extend that so that clients can query a range of file managers as if they were one.