Karaf clustering requirements

Below are the requirements as they have been expressed so far in the mailing-list discussions. To keep them as close as possible to the actual user requirements, they are written as user stories.

  • As a user, I want multiple Karaf instances running on different hosts to cooperate with each other, with minimal configuration effort.
    • Minimal or zero configuration implies autodiscovery; however, the discovery mechanism should not be limited to multicast, as multicast is not allowed in cloud environments (EC2, etc.).
    • To avoid management overhead, configuration and provisioning done on one node need to be applicable to other nodes, without the nodes becoming exact "clones" of the same instance.
    • To organize the cluster, groups of nodes could share a "profile" that encapsulates the clustering configuration and through which the role of each node can be specified.
    • It would be nice to have provisioning tools for profiles, so that profiles can be created, edited, and assigned to nodes.
  • I want to be able to change the configuration on all or just some of the nodes, without having to do so on every single node.
    • We need to implement a mechanism through which nodes of the same "profile" can share the same configuration.
  • I want to be able to add a feature repository on all or just a group of nodes, without having to do so on every single node.
    • We need to implement a mechanism through which nodes of the same "profile" can share the same feature repositories.
    • We need to implement a mechanism through which nodes of the same "profile" can share the same feature states.
  • I want to be able to install/start/stop a bundle on all or just a group of nodes, without having to do so on every single node.
    • This partly overlaps with feature sharing; however, since the use of features is optional, it would be nice if the same functionality were also provided for plain bundles.
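To make the "profile" idea from the user stories above more concrete, here is a minimal sketch of one possible data model. Everything in it is hypothetical: `Profile`, `Node`, `setProperty`, `effectiveConfiguration` and `effective` are illustrative names, not Karaf or OSGi APIs, and configuration is modelled as plain string maps keyed by PID. The sketch assumes shared state lives in profiles, while each node keeps its own local overrides, so that nodes sharing a profile are not exact clones.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical model (not a Karaf API): a profile holds the state shared by a group of nodes.
final class Profile {
    final String name;
    // Configuration keyed by PID, then by property key (assumed representation).
    final Map<String, Map<String, String>> configurations = new HashMap<>();
    final Set<String> featureRepositories = new LinkedHashSet<>();
    final Set<String> features = new LinkedHashSet<>();
    final Set<String> bundles = new LinkedHashSet<>();

    Profile(String name) { this.name = name; }

    // Changing a profile implicitly changes every node the profile is assigned to.
    void setProperty(String pid, String key, String value) {
        configurations.computeIfAbsent(pid, p -> new HashMap<>()).put(key, value);
    }
}

// Hypothetical cluster member: assigned profiles plus node-local overrides.
final class Node {
    final String id;
    final List<Profile> profiles = new ArrayList<>();
    final Map<String, Map<String, String>> localOverrides = new HashMap<>();

    Node(String id) { this.id = id; }

    // Profiles are applied in assignment order; local overrides win last.
    Map<String, String> effectiveConfiguration(String pid) {
        Map<String, String> result = new HashMap<>();
        for (Profile p : profiles) {
            result.putAll(p.configurations.getOrDefault(pid, Map.of()));
        }
        result.putAll(localOverrides.getOrDefault(pid, Map.of()));
        return result;
    }

    // Feature repositories, features and bundles are merged as a union over all profiles.
    Set<String> effective(Function<Profile, Set<String>> selector) {
        Set<String> result = new LinkedHashSet<>();
        for (Profile p : profiles) {
            result.addAll(selector.apply(p));
        }
        return result;
    }
}
```

With this shape, assigning the same `Profile` to several `Node` objects gives them the same configuration, feature repositories, feature states and bundles, and a later `setProperty` on the profile is visible on all of them. A node can still deviate through `localOverrides`. Whether overrides should be allowed at all, and how conflicts between several profiles are resolved, are open design questions that the sketch simply settles as "last one wins".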

Scenarios

Realistic scenarios that serve as a basis for the requirements and against which the design can be checked.

Simple fault tolerance and scalability

To achieve fault tolerance and load balancing, there may be one or more slaves of a server. The slaves should reflect the configuration state of the main server, so changes only need to be made on the main server and are then replicated to the slaves. The load needs to be distributed over the existing servers; the distribution may be part of the Karaf solution or may be done by external hardware. The running applications can be split into stateless and stateful ones. In the stateless case, each request may be processed by any server. In the stateful case, either all requests from one source need to go to the same node, or the state needs to be replicated across the nodes. A key question here is whether stateful processing is supported by the clustering solution.
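The stateless/stateful distinction above can be illustrated with a small routing sketch. `AffinityRouter` is a hypothetical class, not part of Karaf. It assumes requests carry a source identifier (for example a client or session id). Stateless requests rotate round-robin over all nodes, while stateful requests from the same source are always hashed to the same node, which is the "same node again" option when state is not replicated.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical router illustrating the two cases described in the scenario.
final class AffinityRouter {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    AffinityRouter(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = List.copyOf(nodes);
    }

    // Stateless: any node may process the request, so simply rotate round-robin.
    String routeStateless() {
        int index = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }

    // Stateful without replication: the same source must always reach the same node.
    String routeStateful(String sourceId) {
        int index = Math.floorMod(sourceId.hashCode(), nodes.size());
        return nodes.get(index);
    }
}
```

Note that plain modulo hashing reassigns most sources whenever a node is added or removed, which would lose their state. A real solution would need consistent hashing, sticky sessions in an external load balancer, or state replication; which of these Karaf should support is exactly the open question raised above.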

Management and Provisioning for a Network of servers

Management of several applications that are each deployed to several servers. Each server may host one or more Karaf instances and fulfills a certain role.
For example, a typical application is deployed to three types of servers: web server, application server and database server. Each kind of server may exist several times to achieve load balancing and fault tolerance. To roll out a new version of an application, the whole set of servers that support this application needs to be deployed in one step. If something goes wrong, the admin wants to roll back to the previous state.

For testing, the whole environment will exist in several stages. The number of servers per role may differ between stages; for example, the system test environment may not be fault tolerant, while the pre-production environment may be.

The admins need a good overview of each environment: which applications are deployed there, in which versions, and what each server currently hosts. Admins will not want to visit each server for a deployment, so remote management is important. The deployment should be defined in a plan that can be executed against the pre-production environment and, on success, against the production environment.
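A plan of the kind described in this scenario could be sketched as follows. All names (`Environment`, `DeploymentPlan`, `execute`, `promote`) are hypothetical and only illustrate the requirements. The plan maps roles to artifacts rather than listing servers, so the same plan works for stages with different numbers of servers per role. A run is all-or-nothing: on the first failure the environment is restored from a snapshot. `promote` only touches production if pre-production succeeded. Deployment failures are simulated with a set of failing servers.

```java
import java.util.*;

// Hypothetical stage (system test, pre-production, production): servers grouped by role.
final class Environment {
    final String name;
    final Map<String, List<String>> serversByRole = new LinkedHashMap<>();
    final Map<String, String> deployed = new HashMap<>(); // server -> deployed artifact
    final Set<String> failingServers = new HashSet<>();   // simulates failures for illustration

    Environment(String name) { this.name = name; }

    void deploy(String server, String artifact) {
        if (failingServers.contains(server)) {
            throw new IllegalStateException("deployment failed on " + server);
        }
        deployed.put(server, artifact);
    }
}

// Hypothetical plan: which artifact each role should run, independent of server counts.
final class DeploymentPlan {
    final Map<String, String> artifactByRole;

    DeploymentPlan(Map<String, String> artifactByRole) {
        this.artifactByRole = Map.copyOf(artifactByRole);
    }

    // Deploys to every server of every role as one step; on failure, restores the prior state.
    boolean execute(Environment env) {
        Map<String, String> snapshot = new HashMap<>(env.deployed);
        try {
            for (Map.Entry<String, String> entry : artifactByRole.entrySet()) {
                for (String server : env.serversByRole.getOrDefault(entry.getKey(), List.of())) {
                    env.deploy(server, entry.getValue());
                }
            }
            return true;
        } catch (IllegalStateException e) {
            env.deployed.clear();
            env.deployed.putAll(snapshot); // roll back to the state before this run
            return false;
        }
    }

    // Runs against pre-production first; production is only touched if that succeeds.
    boolean promote(Environment preProduction, Environment production) {
        return execute(preProduction) && execute(production);
    }
}
```

A snapshot-and-restore rollback keeps the sketch short. In a real cluster, rolling back would itself mean redeploying the previous versions on each server, which can also fail, so the plan format would probably need to record the previous state explicitly.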

High Performance computing

Please add more

Existing solutions