...

The Tomcat Grid manages one or more local or remote Tomcat instances from a centralized location. This manager application shows the status of each Tomcat instance, and provides a simple interface to trigger operations on them, individually or as a group.

...

The primary location runs a Tomcat Web Grid Manager, a web application that runs on a separate Tomcat instance. This manager application shows the state of all the Tomcat instances on the grid, and allows users to trigger operations on them.

...

The secondary locations run on dedicated Tomcat instances. They run a copy of the Web Grid Manager, but they stay shut down unless an operator explicitly starts them (when the primary one fails). No two managers should be active at the same time. Once activated, a secondary location can start producing changes to its local grid configuration, and these changes are replicated to all other managers (if/when possible).

...

The Grid Agents are Java processes installed and running on each machine that listen (on a configured port) for commands from a Grid Manager application (Web or CLI) and act upon them by interacting with the local Tomcat instances. No encryption is envisioned on the network channels, since all these machines are considered to be installed on a secured network segment (at least in prod). [Maybe we should reconsider this]

All the Tomcat instances (including the Web Grid Managers), as well as the Grid Agents, are installed manually. The primary grid manager and all the agents are started manually too.

Once the Web Grid Manager is started up, machines and instances can be registered in it, so they can become manageable. No centralized (comprehensive) provisioning is envisioned until later versions of Tomcat Grid (see below).

The Grid Agents are processes that can also be managed. In particular, status can be obtained (and shown) from them, and basic operations (start/stop/kill) can be triggered on them. These operations are, however, heavily dependent on the OS and its capabilities (configuration, installed tools, etc.) and on the infrastructure architecture (firewalled machines, network VLANs, etc.).

Collectively, Tomcat instances and Grid Agents are "services", since both can be managed.

Later versions of the Grid include "collection" management. This allows subsets of services (Tomcat instances and Grid Agents) to be grouped, so they can be operated as a whole. Each collection can include plain services, or other collections (recursively).
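The recursive nature of collections can be illustrated with a short Python sketch. It is only an illustration under assumptions: the `Service` and `Collection` classes and the `expand` function are hypothetical names, not part of any existing Tomcat Grid code, and expanding each collection only once (so that cycles are harmless) is one possible choice rather than a stated requirement.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Service:
    """A manageable service: a Tomcat instance or a Grid Agent (hypothetical model)."""
    name: str  # grid-wide unique service name
    kind: str  # "tomcat" or "agent"


@dataclass
class Collection:
    """A named group of services and/or other collections (hypothetical model)."""
    name: str
    members: List[Union["Service", "Collection"]] = field(default_factory=list)


def expand(item, _seen=None):
    """Flatten a service or collection into a list of unique services.

    Nested collections are expanded recursively; a collection that is
    reached twice (including through a cycle) is only expanded once.
    """
    seen = set() if _seen is None else _seen
    if isinstance(item, Service):
        return [item]
    if id(item) in seen:
        return []
    seen.add(id(item))
    result = []
    for member in item.members:
        for service in expand(member, seen):
            if service not in result:
                result.append(service)
    return result


web1 = Service("web1", "tomcat")
web2 = Service("web2", "tomcat")
agent_a = Service("agent-a", "agent")
frontends = Collection("frontends", [web1, web2])
everything = Collection("everything", [frontends, agent_a, web1])

names = [s.name for s in expand(everything)]
print(names)  # ['web1', 'web2', 'agent-a']
```

An operation on a collection would then simply be applied to every service returned by `expand`, so a service that appears in several nested collections is operated on only once.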

Considering all the above, the following phases could be used as a baseline for the road map of the Tomcat Grid development.

Phase 1 - Core Grid Operation

...

Included features are:

  1. The Web Grid Manager presents a Web interface that shows information about the whole Grid and presents simple buttons to operate the Tomcat instances.
  2. The managing logic must be clearly separated from the Web interface logic, since later on, a Command-Line Grid Manager will be included, and will use the same managing logic.
  3. The available commands for each instance are:
    • status: retrieves the status of a Tomcat instance through the corresponding Grid Agent
    • trigger-start: sends a start request to the Tomcat instance using the corresponding Grid Agent
    • trigger-stop: sends a stop request to the Tomcat instance using the corresponding Grid Agent
    • trigger-kill: sends a kill request to the Tomcat instance using the corresponding Grid Agent
  4. A simple configuration file lists all the machines and their instances so the Grid knows where each instance resides. [This configuration file is probably in XML format]
  5. Grid Agents are installed on each machine and manage all instances in that machine pertaining to the Grid. Grid Agents receive commands from any manager and act accordingly. To manage the instances the Agents use:
    • Shell calls to start an instance or kill an instance.
    • JMX calls to retrieve instance live information.
    • JMX calls to change instance configuration and live values, and to request instance shutdown.
    • OS calls for any OS related need.
  6. It's assumed that a port will be accessible from each Grid Manager to each machine where the Grid Agents are serving. The firewall, if present, must allow active server-type sockets on that port.
  7. Multiple Grids (and Grid Agents) can be running on the same set (or subset) of machines. If that's the case, Tomcat instances and Grid Agents run on different ports for each grid. When multiple grids use the same machines they don't interfere with each other and can be operated simultaneously.
  8. The status command shows the following information for each instance:
    • Machine
    • Service (a unique grid-wide name for each instance)
    • State
  9. The state of an instance can be:
    • *Active*: the instance OS process exists, the instance is serving requests, and it looks healthy [enough].
    • *Zoetic* [for lack of a better word]: the instance OS process exists, but the instance is unresponsive and it doesn't respond to requests for state. It's probably not serving any HTTP requests, does not look healthy, it may be starting, it may be shutting down, it may be overwhelmed. Who knows.
    • Stopped: the instance OS process does not exist, and therefore the instance is not operating at all.
    • If possible, it would be great to discern different sub-cases of the Zoetic state, to help the user determine what's going on and tackle the case accordingly:
      • Starting: The Tomcat instance process exists, and the instance is starting. It's not yet serving HTTP requests.
      • Stopping: The Tomcat instance process exists, and the instance is stopping. It's no longer serving HTTP requests.
      • Unresponsive: The Tomcat instance process exists, but the instance health isn't good: it's not responding to HTTP requests, or it's overwhelmed. It's not even responding to requests for status.
  10. Grid Agents communicate over unsecured TCP sockets, and assume communication security is enforced by the network architecture (segregated segments/VLANs).
  11. The "trigger"-type commands just deliver the corresponding signal to the instance's Grid Agent and return right away, without waiting for the full operation to complete. It's kind of "fire and forget". The web user can keep refreshing the web interface to follow the status of the Tomcat instances.
  12. Simple user name/password authentication is implemented to secure the Web interface. [Maybe we'll need to provide more options]
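The state rules of item 9 can be sketched as a small decision function. This is a simplified illustration, not a definitive design: it assumes a Grid Agent can make two observations (whether the OS process exists, and whether the instance answered a status request in time), and it guesses the Zoetic sub-case from the last trigger command the agent delivered. The names `classify`, `sub_case` and `status_row`, and that heuristic, are assumptions introduced here.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    ACTIVE = "Active"
    ZOETIC = "Zoetic"
    STOPPED = "Stopped"


def classify(process_exists: bool, responds_to_status: bool) -> State:
    """Map two observations made by a Grid Agent to an instance state."""
    if not process_exists:
        return State.STOPPED
    if responds_to_status:
        return State.ACTIVE
    return State.ZOETIC


def sub_case(last_trigger: Optional[str]) -> str:
    """Guess a Zoetic sub-case from the last delivered trigger (assumed heuristic)."""
    if last_trigger == "trigger-start":
        return "Starting"
    if last_trigger == "trigger-stop":
        return "Stopping"
    return "Unresponsive"


def status_row(machine, service, process_exists, responds_to_status,
               last_trigger=None):
    """Build one (Machine, Service, State) row as shown by the status command."""
    state = classify(process_exists, responds_to_status)
    label = state.value
    if state is State.ZOETIC:
        label += " (" + sub_case(last_trigger) + ")"
    return (machine, service, label)


print(status_row("host1", "web1", True, False, "trigger-start"))
# ('host1', 'web1', 'Zoetic (Starting)')
```

Because the trigger commands return right away, a row like this would be recomputed on every refresh of the web interface rather than cached from the moment the command was sent.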

...

  1. Like the instances, the agents can become unresponsive, or even crash. To address cases like these, commands are implemented to manage the Grid Agents as well. The Grid Agents now become manageable services.
  2. All Grid Agents are now also registered in the configuration file under unique service names. Grid Agent names share the same namespace as Tomcat instance names: i.e. the Tomcat instance and Grid Agent names are grid-wide unique. This way commands (such as trigger-start, for example) can distinguish which type of service they need to act on, and will choose a different logic (program or script) to execute.
  3. All four previously defined commands are now available for the Grid Agents:
    • status
    • trigger-start
    • trigger-stop
    • trigger-kill
  4. The status command now adds an extra column "Type" after the service column that indicates the type of service: Tomcat instance, or Grid Agent.
  5. The mechanism to manage the Grid Agents is necessarily OS dependent. For example, in Linux it can be implemented using Bash commands through SSH. Suitable mechanisms must be studied for each OS.
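The shared namespace and the per-type command logic described above could look roughly like the following Python sketch. All names here (`ServiceRegistry`, `dispatch`, the handler functions and the `"tomcat"`/`"agent"` type labels) are hypothetical, and the handlers only return a description of what they would do instead of performing any real operation.

```python
class ServiceRegistry:
    """Keeps grid-wide unique service names and their types (hypothetical)."""

    TYPES = ("tomcat", "agent")

    def __init__(self):
        self._types = {}  # service name -> service type

    def register(self, name, service_type):
        if service_type not in self.TYPES:
            raise ValueError("unknown service type: " + service_type)
        if name in self._types:
            # Tomcat instances and Grid Agents share one namespace.
            raise ValueError("service name already in use: " + name)
        self._types[name] = service_type

    def type_of(self, name):
        return self._types[name]


def start_tomcat(name):
    return "start Tomcat instance " + name + " through its Grid Agent"


def start_agent(name):
    return "start Grid Agent " + name + " through an OS-specific mechanism"


# One handler per (command, service type) pair; only trigger-start is shown.
HANDLERS = {
    ("trigger-start", "tomcat"): start_tomcat,
    ("trigger-start", "agent"): start_agent,
}


def dispatch(registry, command, name):
    """Pick the logic to run from the type of the named service."""
    handler = HANDLERS[(command, registry.type_of(name))]
    return handler(name)


registry = ServiceRegistry()
registry.register("web1", "tomcat")
registry.register("agent-host1", "agent")

print(dispatch(registry, "trigger-start", "web1"))
print(dispatch(registry, "trigger-start", "agent-host1"))
```

Since a name maps to exactly one type, the caller never has to say whether it is addressing a Tomcat instance or a Grid Agent; the registry lookup decides which logic runs.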

...

  1. Secondary managers are registered in the grid's configuration file.
  2. Every time a change is produced or detected in the configuration of the primary manager, the change is distributed to all secondary managers.
  3. If the primary manager is down, secondary managers can be started, and can start producing changes (based on the local configuration copy). They can also start distributing the new configuration changes.
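The "if/when possible" distribution of configuration changes can be sketched as a best-effort push that remembers which managers could not be reached. This Python sketch rests on assumptions not stated in the text: the `Manager` class, the integer `version` counter used to ignore stale copies, and the simulated `reachable` flag are all hypothetical.

```python
class Manager:
    """A Grid Manager holding a local copy of the configuration (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.reachable = True  # simulated network availability
        self.version = 0       # assumed change counter
        self.config = {}

    def change(self, key, value):
        """Produce a local configuration change."""
        self.config[key] = value
        self.version += 1

    def receive(self, version, config):
        """Accept a pushed configuration if it is newer than the local copy."""
        if not self.reachable:
            raise ConnectionError(self.name)
        if version > self.version:
            self.version = version
            self.config = dict(config)


def distribute(source, others):
    """Push the source configuration; return the names that were unreachable."""
    pending = []
    for manager in others:
        try:
            manager.receive(source.version, source.config)
        except ConnectionError:
            pending.append(manager.name)
    return pending


primary = Manager("primary")
sec_a, sec_b = Manager("secondary-a"), Manager("secondary-b")
sec_b.reachable = False

primary.change("web1", "host1")
first_try = distribute(primary, [sec_a, sec_b])
print(first_try)  # ['secondary-b']

sec_b.reachable = True
second_try = distribute(primary, [sec_a, sec_b])
print(second_try)  # []
```

The same `distribute` call would be used by an activated secondary manager, with itself as the source, once it starts producing changes from its local copy.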

...