Design

Most configuration parameters and files are common to all instances of the same OODT component. Therefore, the following ZNode structure is adopted, in which the configuration for each component type is stored in a separate ZNode sub tree, as shown below.

The Idea

At a high level, there can be multiple projects running different sets of file managers, workflow managers, and so on. To store configuration for different projects separately in ZooKeeper, the concept of a project is introduced, and the root ZNode is divided into sub trees based on the project. Under each project, configuration is stored for the different OODT components (file manager, resource manager, ...). There are two types of configuration files that need to be stored: properties files and other configuration files (such as XML files). Of those, the properties within the properties files are loaded only once (normally at initialization), so they are treated as a special case in contrast to other configuration files. To achieve this, two sub trees are created for each component, as shown above: one for properties files and one for other configuration files. For example, the file manager (file-mgr ZNode) has 2 child nodes, one storing properties files and the other storing other configuration files.
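The per-component layout described above can be sketched as a nested dictionary, purely for illustration. The sub tree name `properties`, the placement of `filemgr.properties` under it, and the helper `leaf_paths` are assumptions made for this sketch; only `file-mgr`, `configuration`, `etc`, `policy` and the file names come from this page. The project level is left out because its exact position in the tree is not spelled out here.

```python
# Illustrative model of one component's ZNode sub tree.
# "properties" as a node name is an assumption; "configuration" is from the text.
FILE_MGR_SUBTREE = {
    "file-mgr": {
        "properties": {
            "etc": {"filemgr.properties": b""},
        },
        "configuration": {
            "etc": {"mime-types.xml": b""},
            "policy": {"cmd-line-actions.xml": b""},
        },
    }
}


def leaf_paths(tree, prefix=""):
    """Return the path of every leaf (file) node, relative to the tree root."""
    paths = []
    for name, child in sorted(tree.items()):
        path = f"{prefix}/{name}"
        if isinstance(child, dict):
            # Intermediate ZNode: descend into its children.
            paths.extend(leaf_paths(child, path))
        else:
            # Leaf ZNode: holds the content of one file.
            paths.append(path)
    return paths


for p in leaf_paths(FILE_MGR_SUBTREE):
    print(p)
```

Each printed path corresponds to one ZNode that would hold the content of a single file, grouped by whether it is a properties file or another kind of configuration file.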

What are the other child nodes (etc, policy) in the ZNode structure?

This is where the design comes into play. Suppose we have a mime-types.xml file for our file manager. If you have configured a file manager manually, you may have seen that there are several directories within an OODT distribution (file-mgr, res-mgr, workflow, etc.). Within each of these directories (say, within the file-mgr directory) there is another set of directories such as bin, lib, etc and policy. As you might expect, bin and lib contain the executables and libraries. The etc directory contains the major configuration files (filemgr.properties), and policy contains several other files required for configuration purposes (e.g. cmd-line-actions.xml). Therefore, when we do distributed configuration management, we have to make sure that every instance downloading configuration from ZooKeeper gets all of these properties files and configuration files, and stores them in the correct directories so that the corresponding components can pick them up at runtime.

To make sure that all the configuration files are available within a predefined directory at runtime, we store each configuration file in a ZNode whose path mirrors the path where that file should be at runtime. Take the mime-types.xml file: it should be available locally within the ${FILEMGR_HOME}/etc/ directory. To identify where a configuration file should be stored locally relative to ${FILEMGR_HOME}, we take its ZNode path relative to the file-mgr/configuration/ ZNode as the storage location. Therefore, when downloaded, the content of the ZNode oodt/components/file-mgr/configuration/etc/mime-types.xml is stored in the ${FILEMGR_HOME}/etc/mime-types.xml file. That is the basic idea of how configuration is published, downloaded and stored.
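The mapping from a ZNode path to a local file path can be illustrated with a minimal sketch. This is not the code used by OODT; the function name `local_path_for`, the constant `CONFIG_ROOT`, and the example home directory `/opt/oodt/file-mgr` are hypothetical, and the ZNode path is written with a leading slash as ZooKeeper paths usually are.

```python
from pathlib import PurePosixPath

# Assumed ZNode under which the file manager's non-properties files are stored.
CONFIG_ROOT = PurePosixPath("/oodt/components/file-mgr/configuration")


def local_path_for(znode_path, component_home, config_root=CONFIG_ROOT):
    """Map a configuration ZNode path to the local file it should be written to.

    The part of the ZNode path below `config_root` is reused unchanged as the
    path relative to the component's home directory.
    """
    # Raises ValueError if the ZNode is not located under config_root.
    relative = PurePosixPath(znode_path).relative_to(config_root)
    return str(PurePosixPath(component_home) / relative)


# Hypothetical value standing in for ${FILEMGR_HOME}.
print(local_path_for(
    "/oodt/components/file-mgr/configuration/etc/mime-types.xml",
    "/opt/oodt/file-mgr",
))
```

Because the relative part of the path is reused as-is, no separate lookup table from ZNodes to local directories is needed; the tree structure itself carries that information.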

DistributedConfigurationPublisher is responsible for initially publishing configuration to ZooKeeper. Once configuration has been published, any OODT component running on any cluster node can fetch it through the DistributedConfigurationManager class. A CLI tool is available to publish, verify and clear configuration in ZooKeeper. To learn more about configuration publishing, please read the documentation on Distributed Configuration Management.

Future Developments

Extending distributed configuration management to a distributed command framework

At the moment, even with distributed configuration enabled:
  1. We have to log in to a remote server
  2. Install/unpack the corresponding OODT component
  3. Start it (with no manual configuration, since configuration is downloaded on the fly). We need to set the ZK_CONNECT_STRING environment variable before that.
  4. If we need to restart a component, we have to log in to that server again.

If we can extend our ZooKeeper-based configuration management into a command framework, we can restart or refresh an entire component, or just its configuration, as required with a single terminal command from a local machine.

Introducing distributed configuration management to crawler and pcs packages

At the moment, distributed configuration management only supports the 3 main components of OODT: file manager, resource manager and workflow manager. It would be great if this feature were introduced to the crawler and pcs packages as well.

Allow file manager clients to query multiple file managers as one

Currently, file storage and data archiving across nodes require something like a shared NFS mount. Once file managers are configured, they are not aware of the other file managers operating in the cluster. If we can make the file managers aware of each other, we can extend that so clients can query a range of file managers as if they were one.
