Geode Modularization Proposal

Background

Geode currently falls into the category of a monolithic application. As a result, it is becoming increasingly difficult to add functionality or features without inadvertently causing a domino effect on other parts of the product. Over time, features and functionality have become tightly coupled, which complicates both refactoring the system and adding new functionality.

Without clearly decoupled extension points, developers are left to modify the core of Geode to add new services. Without clear and consistent bootstrapping, startup dependencies are not explicit. New services cannot depend on implementations of common SPIs, or versions of common third-party libraries, that differ from or conflict with those already used by Geode. Without a single source of configuration management, extending configuration to a new model requires modifying or extending multiple services in Geode. Together, these constraints make extending Geode cumbersome.

End users deploying applications into Geode have clearly defined extension points, but they lack the isolation from the core that would allow their applications to depend on competing SPI implementations and third-party library versions. As a result, the Geode platform owner and the application owner must coordinate to make sure such conflicts don’t exist, and when multiple applications are deployed, all application owners must coordinate with one another.

While Geode is not an application server in the J2EE sense, it does share many of the traits that define an application server. At its very core is the concept of controlled execution of user-provided logic. Although Geode blurs this line by also being a storage engine, that should not exempt it from many of the design patterns of an application server. Those patterns solve many of the issues we face when extending and deploying into Geode.

Goals

The main goal of modularization is a loosely coupled system with high cohesion. Loose coupling means that modules are not tightly bound to other modules, so one module can be exchanged for another (interchangeability). High cohesion means that all code inside a single module is there specifically to complete that module’s task.

Modularization of Geode needs to attain the following goals:


Terminology

Module

A module is a component that performs a distinct function. Each module will define or implement a public interface that allows it to be interchanged with another component. Modules can be assembled into new modules of differing size, complexity, and functionality.

Application

An end-user-deployable collection of classes and configuration specific to the end user’s data storage and manipulation goals. This includes region configuration, listeners, functions, and modules.

Class Loader Isolation

A method to isolate the collections of classes related to a specific deployable unit, either an application or a module. It guarantees that different versions of third-party dependencies, or different SPI implementations, can be loaded into one deployable unit without conflicting with another deployable unit.

Dependency management

Dependency management is the management of logical relationships between different activities or modules. In the simplest case, it is the management of the dependencies that a module’s implementation has on the libraries it uses. It is also the management of the relationships between different modules. Relationships are usually defined as "depends-on" or "has-dependants-of". Note: a module must never be both a dependency and a dependent of the same module, whether directly or through a chain of other modules. Such a relationship is called a cyclic dependency and must be avoided.
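
To make the cyclic-dependency rule concrete, the following Java sketch checks a module dependency graph for cycles with a depth-first search. The `DependencyGraph` class, its methods, and the module names are hypothetical illustrations, not existing Geode APIs.

```java
import java.util.*;

/**
 * Hypothetical sketch: detects cyclic "depends-on" relationships between modules
 * using a depth-first search over a simple adjacency map.
 */
final class DependencyGraph {
    // module name -> names of the modules it depends on
    private final Map<String, Set<String>> dependsOn = new HashMap<>();

    void addDependency(String module, String dependency) {
        dependsOn.computeIfAbsent(module, k -> new LinkedHashSet<>()).add(dependency);
        dependsOn.computeIfAbsent(dependency, k -> new LinkedHashSet<>());
    }

    /** Returns true if any module depends, directly or transitively, on itself. */
    boolean hasCycle() {
        Set<String> done = new HashSet<>();   // modules whose dependencies are fully explored
        Set<String> onPath = new HashSet<>(); // modules on the current search path
        for (String module : dependsOn.keySet()) {
            if (visit(module, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    private boolean visit(String module, Set<String> done, Set<String> onPath) {
        if (onPath.contains(module)) {
            return true;  // reached a module already on the path: a cycle
        }
        if (done.contains(module)) {
            return false; // already explored, known to be cycle-free
        }
        onPath.add(module);
        for (String dependency : dependsOn.getOrDefault(module, Set.of())) {
            if (visit(dependency, done, onPath)) {
                return true;
            }
        }
        onPath.remove(module);
        done.add(module);
        return false;
    }
}
```

For example, if A depends on B and B depends on C, then B is both a dependency (of A) and a dependent (of C), which is allowed; if C then also depends on A, `hasCycle()` returns true.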

Dependency injection

Dependency injection is a form of Inversion of Control. It aids the decoupling of modules by supplying the services a module depends on at runtime, rather than having the module construct them itself. This allows different modules that satisfy the same interface to be interchanged, i.e., a module is not tightly coupled to specific service implementations.
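
A minimal Java sketch of constructor injection follows. `StatisticsSink`, its two implementations, and `EvictionModule` are hypothetical names chosen for illustration; the point is that the module depends only on the interface and is handed a concrete implementation at runtime.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical service interface that a module depends on.
interface StatisticsSink {
    void record(String name, long value);
}

// One implementation: writes each statistic to standard output.
final class ConsoleStatisticsSink implements StatisticsSink {
    @Override
    public void record(String name, long value) {
        System.out.println(name + "=" + value);
    }
}

// An interchangeable implementation: keeps statistics in memory (handy in tests).
final class InMemoryStatisticsSink implements StatisticsSink {
    final List<String> recorded = new ArrayList<>();

    @Override
    public void record(String name, long value) {
        recorded.add(name + "=" + value);
    }
}

// The dependent module knows only the interface; the concrete sink is
// injected through the constructor when the module is instantiated.
final class EvictionModule {
    private final StatisticsSink sink;

    EvictionModule(StatisticsSink sink) {
        this.sink = sink;
    }

    void evict(int entries) {
        // Real eviction work would happen here; the sketch only reports it.
        sink.record("evictions", entries);
    }
}
```

Because `EvictionModule` never names a concrete sink, a test can inject `InMemoryStatisticsSink` while production wiring injects another implementation, without changing the module.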

Bootstrapping

Bootstrapping refers to a self-starting process that loads the basic software without external input. This process usually involves a chain of stages in which, at each stage, a smaller, simpler program loads and then executes the larger, more complicated program of the next stage.


Proposal

To facilitate extensibility within the Geode platform (referred to as the platform), we propose first defining the isolation, dependency, configuration, bootstrapping, lifecycle, and deployment artifacts for new modules. The long-term goal is to break up parts of the existing monolithic application into smaller modules defined exactly the same way as new modules.

Modules

A module is a self-contained unit or item. Each module will be composed of the following:

  1. Module application code

  2. Public module services and interfaces

  3. Dependent libraries (incl. version)

  4. Dependent module references (incl. version)
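
One way to picture the four parts listed above is as a plain, immutable data type. The following Java record is a sketch under that assumption; `ModuleDescriptor`, its field names, and the use of strings for versions and service names are hypothetical, not a defined Geode format.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a module's four parts as an immutable record.
 * Versions and service names are plain strings for simplicity.
 */
record ModuleDescriptor(
        String name,
        String version,
        // 1. Module application code: the jar(s) holding the implementation
        List<String> codeArtifacts,
        // 2. Public services and interfaces other modules may use
        List<String> exportedServices,
        // 3. Dependent libraries: library name -> version or version range
        Map<String, String> libraryDependencies,
        // 4. Dependent modules: module name -> version or version range
        Map<String, String> moduleDependencies) {

    ModuleDescriptor {
        // Defensive, unmodifiable copies keep the descriptor immutable once built.
        codeArtifacts = List.copyOf(codeArtifacts);
        exportedServices = List.copyOf(exportedServices);
        libraryDependencies = Map.copyOf(libraryDependencies);
        moduleDependencies = Map.copyOf(moduleDependencies);
    }
}
```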

The benefit of modularity is that each module can be interchanged with another implementing module, provided it adheres to the publicly exposed interface. In addition, each module can be tested as a black box, which allows for better, more precise, and more targeted testing.

Isolation

Each module will be isolated from every other module by a lightweight container that provides class loader isolation. A module may contain all the third-party libraries it depends on, or it may declare dependencies on libraries commonly shipped with the platform, such as Log4J or Commons Collections. Such dependencies will include a version or range of versions that must be met to satisfy the external dependency, to better avoid version incompatibilities. Two or more versions of the same third-party library may be deployed at the same time. A module will also contain the classes that make up the core of the module’s functionality.
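
The requirement that a dependency carry a version or range of versions can be sketched as a small range check. The following Java example assumes the bracket notation used by Maven and OSGi (`[` and `]` inclusive, `(` and `)` exclusive) and dot-separated numeric versions; the proposal does not specify a syntax, and `VersionRange` is a hypothetical class. Open-ended ranges such as `[1.0,)` are not handled in this sketch.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: checks whether a library version satisfies a declared
 * range such as "[3.2,4.0)". The bracket syntax is an assumption borrowed
 * from Maven/OSGi; both bounds must be present.
 */
final class VersionRange {
    private final int[] low;
    private final boolean lowInclusive;
    private final int[] high;
    private final boolean highInclusive;

    private VersionRange(int[] low, boolean lowInclusive, int[] high, boolean highInclusive) {
        this.low = low;
        this.lowInclusive = lowInclusive;
        this.high = high;
        this.highInclusive = highInclusive;
    }

    static VersionRange parse(String spec) {
        String s = spec.trim();
        if (!s.startsWith("[") && !s.startsWith("(")) {
            // A bare version such as "2.17.1" matches exactly that version.
            int[] exact = parseVersion(s);
            return new VersionRange(exact, true, exact, true);
        }
        String[] bounds = s.substring(1, s.length() - 1).split(",", -1);
        if (bounds.length != 2) {
            throw new IllegalArgumentException("expected [low,high]: " + spec);
        }
        return new VersionRange(
                parseVersion(bounds[0].trim()), s.charAt(0) == '[',
                parseVersion(bounds[1].trim()), s.charAt(s.length() - 1) == ']');
    }

    // "3.2.2" -> {3, 2, 2}
    static int[] parseVersion(String version) {
        return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    // Component-wise comparison; missing components count as 0, so "3.2" equals "3.2.0".
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < Math.max(a.length, b.length); i++) {
            int x = i < a.length ? a[i] : 0;
            int y = i < b.length ? b[i] : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    boolean contains(String version) {
        int[] v = parseVersion(version);
        int vsLow = compare(v, low);
        int vsHigh = compare(v, high);
        boolean aboveLow = lowInclusive ? vsLow >= 0 : vsLow > 0;
        boolean belowHigh = highInclusive ? vsHigh <= 0 : vsHigh < 0;
        return aboveLow && belowHigh;
    }
}
```

For instance, `VersionRange.parse("[3.2,4.0)")` accepts `3.2.2` but rejects `4.0.0`, and a bare version such as `2.17.1` matches only itself.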

[Diagram: ModuleA, ModuleB, and ModuleC, each with its own library dependencies]

As can be seen in the diagram, ModuleA, ModuleB, and ModuleC each have different library dependencies and different library version dependencies. Due to the isolation, the different libraries or versions do not affect one another.

Dependencies

In addition to the implementation dependencies described under Isolation, a module will also list its instantiation dependencies. If a module depends on an instance of another module, this dependency will be described so that the bootstrapper can instantiate the dependency and inject it at the appropriate time. Dependencies will be injected into the instance of the module.

Configuration

Each module shall expose a single authoritative view of its configuration. All configuration services within the platform shall manipulate the module’s exposed configuration, so that adding a module does not require modifying the configuration services themselves. This means that if module A is deployed, then the XML config, cluster config, REST admin, JMX, and GFSH interfaces should all expose the configuration of A without explicit modification to any of those services. The configuration should also communicate instance dependencies within the global and application domains.
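
The following Java sketch illustrates a single authoritative configuration view. A module exposes its configuration through one interface, and a generic front end (standing in for gfsh, JMX, REST, and so on) reads it without any module-specific code. `ModuleConfiguration`, `ExampleModuleConfiguration`, `GenericConfigFrontEnd`, and the `max-connections` attribute are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical interface through which every module exposes its configuration.
interface ModuleConfiguration {
    String moduleName();

    Map<String, String> attributes();   // read-only snapshot of the current values

    void set(String key, String value); // validated by the module itself
}

// Example module: owns and validates its own configuration.
final class ExampleModuleConfiguration implements ModuleConfiguration {
    private final Map<String, String> values = new TreeMap<>(Map.of("max-connections", "100"));

    @Override
    public String moduleName() {
        return "example";
    }

    @Override
    public Map<String, String> attributes() {
        return Map.copyOf(values);
    }

    @Override
    public void set(String key, String value) {
        if (!values.containsKey(key)) {
            throw new IllegalArgumentException("unknown attribute: " + key);
        }
        values.put(key, value);
    }
}

// Generic front end: works with any module's configuration without knowing the module.
final class GenericConfigFrontEnd {
    static String describe(ModuleConfiguration config) {
        StringBuilder sb = new StringBuilder(config.moduleName()).append(':');
        // Sort keys so the rendering is stable regardless of map ordering.
        new TreeMap<>(config.attributes())
                .forEach((k, v) -> sb.append(' ').append(k).append('=').append(v));
        return sb.toString();
    }
}
```

Validation stays inside the module (`set` rejects unknown attributes), so each front end only forwards changes and needs no modification when a new module is deployed.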

Bootstrapping

Bootstrapping will be handled by an external component and will not be tightly coupled to any single service. The bootstrapper only knows that a module exists and which configuration it shall be given; it knows nothing of the details of that module or its configuration. The bootstrapper also knows the module’s dependencies insofar as it must instantiate the modules this module depends on before the module itself. To avoid conflicts and concurrency issues, bootstrapping will be done by a single thread.
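
The bootstrapping and injection rules can be sketched in Java as a single-threaded bootstrapper that orders modules with Kahn's topological sort and passes each module factory the instances of its dependencies. `Bootstrapper`, `register`, and the map-based factory signature are assumptions for illustration only.

```java
import java.util.*;
import java.util.function.Function;

/**
 * Hypothetical single-threaded bootstrapper. It knows only module names,
 * their declared dependencies, and one factory per module.
 */
final class Bootstrapper {
    private final Map<String, List<String>> dependsOn = new LinkedHashMap<>();
    // Each factory receives the already-created instances of its dependencies.
    private final Map<String, Function<Map<String, Object>, Object>> factories = new HashMap<>();

    void register(String name, List<String> dependencies, Function<Map<String, Object>, Object> factory) {
        dependsOn.put(name, List.copyOf(dependencies));
        factories.put(name, factory);
    }

    /** Orders modules so every dependency precedes its dependents (Kahn's algorithm). */
    List<String> startOrder() {
        Map<String, Integer> unmet = new HashMap<>();
        Map<String, List<String>> dependents = new HashMap<>();
        for (var e : dependsOn.entrySet()) {
            unmet.put(e.getKey(), e.getValue().size());
            for (String dep : e.getValue()) {
                if (!dependsOn.containsKey(dep)) {
                    throw new IllegalStateException("missing module: " + dep);
                }
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (String name : dependsOn.keySet()) {
            if (unmet.get(name) == 0) {
                ready.add(name);
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String next = ready.poll();
            order.add(next);
            for (String dependent : dependents.getOrDefault(next, List.of())) {
                if (unmet.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }
        if (order.size() != dependsOn.size()) {
            throw new IllegalStateException("cyclic dependency detected");
        }
        return order;
    }

    /** Instantiates every module on the calling thread, dependencies first. */
    Map<String, Object> start() {
        Map<String, Object> instances = new LinkedHashMap<>();
        for (String name : startOrder()) {
            Map<String, Object> deps = new HashMap<>();
            for (String dep : dependsOn.get(name)) {
                deps.put(dep, instances.get(dep)); // inject already-created instances
            }
            instances.put(name, factories.get(name).apply(deps));
        }
        return instances;
    }
}
```

Because `start()` runs entirely on the calling thread, no locking is needed, and a missing or cyclic dependency is reported before any module is created.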

Lifecycle

As seen from any other module, a module is in one of two lifecycle states: stopped or started. A module in the started state is guaranteed that all of its dependencies are also in the started state. If any dependency is transitioning to the stopped state, the dependent module will be transitioned to the stopped state first. The lifecycle manager shall maintain the state of each module and its dependencies. A module must not signal that its transition to the started state is complete until its internal state is consistent, such that a call to any method on the service produces an expected and deterministic outcome.
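
A minimal Java sketch of the two lifecycle rules follows: starting a module first starts its dependencies, and stopping a module first stops every module that depends on it. `LifecycleManager` and its methods are hypothetical, the transition log exists only to make the ordering visible, and the sketch assumes the dependency graph has no cycles.

```java
import java.util.*;

/**
 * Hypothetical lifecycle manager: tracks each module's state and enforces
 * start-dependencies-first / stop-dependents-first ordering. Assumes no cycles.
 */
final class LifecycleManager {
    enum State { STOPPED, STARTED }

    private final Map<String, List<String>> dependsOn = new LinkedHashMap<>();
    private final Map<String, State> states = new HashMap<>();
    private final List<String> log = new ArrayList<>(); // transitions, in order

    void register(String name, String... dependencies) {
        dependsOn.put(name, List.of(dependencies));
        states.put(name, State.STOPPED);
    }

    void start(String name) {
        if (states.get(name) == State.STARTED) {
            return;
        }
        for (String dep : dependsOn.get(name)) {
            start(dep); // dependencies must be started first
        }
        states.put(name, State.STARTED);
        log.add("start " + name);
    }

    void stop(String name) {
        if (states.get(name) == State.STOPPED) {
            return;
        }
        for (var e : dependsOn.entrySet()) {
            if (e.getValue().contains(name)) {
                stop(e.getKey()); // dependents must be stopped first
            }
        }
        states.put(name, State.STOPPED);
        log.add("stop " + name);
    }

    State state(String name) {
        return states.get(name);
    }

    List<String> log() {
        return List.copyOf(log);
    }
}
```

For example, with `functions` depending on `cache` and `cache` depending on `logging`, calling `stop("logging")` stops `functions`, then `cache`, then `logging`.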

Deployment Artifact

It is reasonable to assume that some differences will exist between deployment artifacts for Geode-provided server modules and end-user application artifacts, but many similarities will exist. A deployed artifact must contain a description of all other artifacts it depends on. These may be other deployment artifacts, core modules, jars, files, etc. This dependency descriptor will be used to manage class loader isolation and bootstrapping. An artifact may also contain a configuration descriptor describing how its components are to be instantiated. This descriptor will facilitate bootstrapping and lifecycle management and may affect class loader isolation.
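
The dependency descriptor could take many forms. The following Java sketch assumes, purely for illustration, a properties file with hypothetical keys `depends.modules` (entries of the form `name:version`) and `depends.files`, with entries separated by `;` so that version ranges may contain commas. Neither the keys nor the `DeploymentDescriptor` record are part of the proposal.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.*;

/**
 * Hypothetical sketch of reading a deployment artifact's dependency descriptor
 * from java.util.Properties text. Keys and entry syntax are assumptions.
 */
record DeploymentDescriptor(Map<String, String> moduleDependencies, List<String> files) {

    static DeploymentDescriptor parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        Map<String, String> modules = new LinkedHashMap<>();
        for (String entry : split(props.getProperty("depends.modules", ""))) {
            int colon = entry.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("expected name:version, got " + entry);
            }
            modules.put(entry.substring(0, colon), entry.substring(colon + 1));
        }
        return new DeploymentDescriptor(modules, split(props.getProperty("depends.files", "")));
    }

    // Entries are separated by ';' so that version ranges may contain commas.
    private static List<String> split(String value) {
        List<String> out = new ArrayList<>();
        for (String part : value.split(";")) {
            if (!part.isBlank()) {
                out.add(part.trim());
            }
        }
        return out;
    }
}
```

For example, the line `depends.modules=geode-core:1.2.0; my-functions:[1.0,2.0)` yields two module dependencies: `geode-core` at version `1.2.0` and `my-functions` in the range `[1.0,2.0)`.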


