This phase revamps the Grid Agents so they become manageable.

Included features are:

  1. Like the instances, the agents can become unresponsive or even crash. To handle such cases, commands are also implemented to manage the agents. The grid agents now become manageable services.

  2. All grid agents are now also registered in the configuration file, each under a unique service name. Because every manageable service has a unique name, a command (a trigger-start, for example) can tell which type of service it must act on and choose the corresponding logic (program or script) to execute.

  3. All four previously defined commands are now available for the Grid Agents:
    • status
    • trigger-start
    • trigger-stop
    • trigger-kill

  4. The status command now adds an extra column, "Type", after the service column, indicating the type of service: Tomcat instance or Grid Agent.

  5. The mechanism to manage the grid agents is necessarily OS-dependent. For example, on Linux it can be implemented with Bash commands run through SSH. The most suitable mechanism must be studied for each OS.
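
One way to picture unique service names and type-based dispatch is the Python sketch below. It is a minimal illustration under stated assumptions, not the planned implementation. The `Service` and `Registry` classes, the host names, and the `/opt/grid/bin/...` script paths are all hypothetical. The sketch only builds the SSH command line that a Linux trigger-start might run; it does not execute it. The same registry also produces status rows with the new "Type" column.

```python
import shlex
from dataclasses import dataclass
from enum import Enum


class ServiceType(Enum):
    TOMCAT_INSTANCE = "Tomcat instance"
    GRID_AGENT = "Grid Agent"


@dataclass(frozen=True)
class Service:
    name: str           # must be unique across the whole grid
    type: ServiceType
    host: str


class Registry:
    """Hypothetical in-memory view of the services declared in the grid configuration file."""

    def __init__(self, services):
        self._by_name = {}
        for svc in services:
            if svc.name in self._by_name:
                raise ValueError(f"duplicate service name: {svc.name}")
            self._by_name[svc.name] = svc

    def get(self, name):
        return self._by_name[name]

    def all(self):
        return list(self._by_name.values())


# Hypothetical per-type start scripts: the script chosen depends only on the service type.
START_SCRIPTS = {
    ServiceType.TOMCAT_INSTANCE: "/opt/grid/bin/start-instance.sh",
    ServiceType.GRID_AGENT: "/opt/grid/bin/start-agent.sh",
}


def trigger_start_command(registry, name):
    """Build (without running) the SSH command a Linux trigger-start could use for `name`."""
    svc = registry.get(name)
    script = START_SCRIPTS[svc.type]
    return ["ssh", svc.host, f"{script} {shlex.quote(svc.name)}"]


def status_rows(registry, state_of):
    """Status table rows: (Service, Type, State), with Type right after the service column."""
    return [(svc.name, svc.type.value, state_of(svc.name)) for svc in registry.all()]
```

For example, with an agent named `agent-a` registered on `host-a`, `trigger_start_command(reg, "agent-a")` returns `["ssh", "host-a", "/opt/grid/bin/start-agent.sh agent-a"]`.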

Phase 3 - Secondary Managers

This phase adds configuration replication.

Included features are:

  1. Secondary managers are registered in the grid's configuration file.

  2. Every time a change is made to, or detected in, the primary manager's configuration, the change is distributed to all secondary managers.

  3. If the primary manager is down, secondary managers can distribute new configuration changes.
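
Below is a minimal in-memory sketch of these replication rules. It assumes a simple failover order (the primary first, then the secondaries in their configured order) and detects changes by comparing SHA-256 digests of the configuration text. The `Manager` and `Replicator` classes are hypothetical. A real grid would push changes over the network. It would also have to resynchronise a primary that comes back after missing updates, which this sketch does not cover.

```python
import hashlib


def digest(config_text):
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


class Manager:
    """Hypothetical manager node holding its own copy of the grid configuration."""

    def __init__(self, name, config_text="", up=True):
        self.name = name
        self.config_text = config_text
        self.up = up


class Replicator:
    def __init__(self, primary, secondaries):
        self.managers = [primary, *secondaries]  # failover order: primary first
        self._last_digest = digest(primary.config_text)

    def source(self):
        """The primary distributes changes; if it is down, the first live secondary does."""
        for mgr in self.managers:
            if mgr.up:
                return mgr
        raise RuntimeError("no manager is available")

    def publish(self, new_config):
        """A change produced at the current source is stored there and pushed to the others."""
        src = self.source()
        src.config_text = new_config
        return self._distribute(src)

    def poll(self):
        """Detect a change made directly to the source's configuration and push it."""
        src = self.source()
        if digest(src.config_text) != self._last_digest:
            return self._distribute(src)
        return []

    def _distribute(self, src):
        self._last_digest = digest(src.config_text)
        reached = []
        for mgr in self.managers:
            if mgr is not src and mgr.up:
                mgr.config_text = src.config_text  # stands in for a network push
                reached.append(mgr.name)
        return reached
```

Both paths in the list are covered. `publish()` handles a change produced through the manager. `poll()` handles a change detected by comparing digests, for example after someone edits the file by hand.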

Phase 4 - Extended Service & Machine Information

This phase extends the status information of the whole grid beyond the basic data.

Included features are:

  1. The status command now shows more information for each service (Tomcat instances and Grid Agents):
    • CPU usage (if possible)
    • CPU load (if possible)
    • Heap usage (if possible)
    • Threads (if possible)
    • Started on (if possible)
    • Any other information deemed useful for management purposes.

  2. [Optional] Machine information (on the same page, or perhaps in an extra tab) shows, per machine:
    • CPU usage
    • CPU load (1 min, 5 min, 15 min)
    • Memory usage
    • File system drives/mounts space usage (maybe only specific mounts)
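
For the optional machine tab, the Python standard library already covers part of the list on Unix-like systems. `os.getloadavg()` returns the 1, 5 and 15 minute load averages, and `shutil.disk_usage()` reports total, used and free bytes for a mount. There is no portable standard-library call for memory usage, so this sketch parses Linux's `/proc/meminfo` (MemTotal and MemAvailable, in kB). The function names and the output layout are hypothetical. Per-service JVM figures (heap, threads, start time) are not attempted here. One possible source for them is the JVM's platform MXBeans over JMX (`java.lang:type=Memory`, `java.lang:type=Threading`, `java.lang:type=Runtime`).

```python
import os
import shutil


def parse_meminfo(text):
    """Parse /proc/meminfo content (Linux) into total and used memory, in kB."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    total = values["MemTotal"]
    used = total - values["MemAvailable"]  # MemAvailable requires Linux 3.14 or later
    return {"total_kb": total, "used_kb": used}


def machine_info(mounts=("/",), meminfo_path="/proc/meminfo"):
    """Collect per-machine figures; Unix-only because of os.getloadavg()."""
    load1, load5, load15 = os.getloadavg()
    info = {
        "cpu_count": os.cpu_count(),
        "load_avg": {"1min": load1, "5min": load5, "15min": load15},
        "memory": None,
        "filesystems": {},
    }
    if os.path.exists(meminfo_path):
        with open(meminfo_path) as fh:
            info["memory"] = parse_meminfo(fh.read())
    for mount in mounts:  # e.g. only the mounts an operator cares about
        usage = shutil.disk_usage(mount)
        info["filesystems"][mount] = {
            "total_bytes": usage.total,
            "used_bytes": usage.used,
            "free_bytes": usage.free,
            "used_pct": round(100 * usage.used / usage.total, 1) if usage.total else 0.0,
        }
    return info
```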

Phase 5 - Command-Line Interface (CLI)

This phase provides a CLI manager interface for environments that cannot use the web interface.

Included features are:

  1. In addition to the Web Grid Manager interface, a Command-Line Grid Manager interface is provided for cases where the web interface cannot be used. Typical cases are when no web port is available on the servers (probably because they are firewalled) or when security policies do not allow remote server operations. This may be the case in some secured/firewalled production environments where only text sessions are accessible.

  2. The Command-Line Grid Manager is also suitable for automation (e.g. the weekly full/partial site restart), when unattended operations are scheduled with cron or an equivalent utility.

  3. The Command-Line Grid Manager always leaves one log file per command execution in a directory created for this purpose. Each log file's name includes the timestamp, the command name, and (if possible) the arguments.

  4. The implemented commands are:
    • status
    • trigger-start
    • trigger-stop
    • trigger-kill

  5. The trigger commands are only executed when necessary. If an instance is already running, a trigger-start command is ignored. Conversely, trigger-stop and trigger-kill commands are ignored when the instance is already stopped.

  6. Return codes must be defined carefully to allow automation. Well-defined return codes give the calling program/process useful information, so it can clearly identify the problem and act accordingly.
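
This sketch covers three CLI details from the list above: the per-execution log file name, skipping unnecessary triggers, and distinct return codes. The `ExitCode` values, the file-name pattern and the function names are hypothetical placeholders, because the actual codes are still to be defined. Giving "ignored" its own non-zero code lets a cron job tell whether anything happened. The trade-off is that scripts treating any non-zero status as failure would have to accept that code explicitly.

```python
import re
from datetime import datetime
from enum import IntEnum
from pathlib import Path


class ExitCode(IntEnum):
    """Hypothetical return codes; the real values are still to be defined."""
    OK = 0
    IGNORED = 1           # trigger not needed: service already in the requested state
    USAGE_ERROR = 2
    UNKNOWN_SERVICE = 3
    UNREACHABLE = 4       # machine or agent could not be contacted
    COMMAND_FAILED = 5


def log_file_name(command, args, when):
    """<timestamp>_<command>[_<args>].log, with args reduced to file-name-safe characters."""
    parts = [when.strftime("%Y%m%d-%H%M%S"), command]
    if args:
        parts.append(re.sub(r"[^A-Za-z0-9.-]+", "-", "-".join(args)))
    return "_".join(parts) + ".log"


def needs_trigger(command, running):
    """trigger-start is skipped if already running; trigger-stop/kill if already stopped."""
    if command == "trigger-start":
        return not running
    if command in ("trigger-stop", "trigger-kill"):
        return running
    raise ValueError(f"unknown trigger command: {command}")


def run_trigger(command, service, running, log_dir, now=None):
    """Log the execution to its own file and return (exit code, log path)."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / log_file_name(command, [service], now or datetime.now())
    with log_path.open("a") as log:
        if not needs_trigger(command, running):
            log.write(f"{command} {service}: ignored, already in the requested state\n")
            return ExitCode.IGNORED, log_path
        log.write(f"{command} {service}: issued\n")
        # The OS-specific trigger (e.g. over SSH) would be executed here.
        return ExitCode.OK, log_path
```

A CLI entry point would then finish with `sys.exit(code)`, which makes the value visible to cron or a wrapping shell script as `$?`.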

Phase 6 - Hooks

This phase extends the core operation of the grid services with custom logic.

Included features are:

  1. Hooks are extra activities to be performed when certain events occur on an instance. A hook is implemented as a shell script (or another executable) and is linked to one of the following events:
    • pre-trigger-start
    • post-trigger-start
    • pre-trigger-stop
    • post-trigger-stop
    • pre-trigger-kill
    • post-trigger-kill

  2. Hooks are only executed when the corresponding trigger command is not ignored. For example, if a trigger-start is issued and the instance is stopped, the corresponding pre-trigger-start and post-trigger-start hooks are executed. If the instance was already running, the command would be ignored and its hooks skipped as well.

  3. Hooks can be useful for many purposes. Typical uses are:
    • Prepare an instance configuration.
    • Record instance events.
    • Send emails or other notifications upon restarts.
    • Clear caches & temp dirs before starting an instance.
    • Delay the start of an instance to allow the OS to reclaim resources.
    • Generate a thread dump on specific events.

  4. Hook scripts run on the machine where the affected instance runs. Therefore, the hook scripts are copied to, and ready to be executed on, all machines of the grid.

  5. When hooks are registered (perhaps uploaded) on the Grid, they are automatically distributed behind the scenes to all instances/machines before they are ready to use.
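
The rule that hooks follow their trigger can be sketched as a small wrapper. This illustration rests on several assumptions. Hooks are modelled as callables keyed by event name. `script_hook()` shows how a script already distributed to the machine could be wrapped with `subprocess.run`. `trigger` stands for whatever actually issues the trigger command. In this sketch, a failing pre- hook raises an exception and so prevents the trigger from running; the plan itself does not specify that behaviour.

```python
import subprocess


def script_hook(path):
    """Wrap a hook script (already copied to this machine) as a callable taking the service name."""
    def run(service):
        subprocess.run([path, service], check=True)  # raises CalledProcessError on non-zero exit
    return run


def run_with_hooks(command, service, running, hooks, trigger):
    """Run pre-<command> hooks, the trigger, then post-<command> hooks.

    If the trigger would be ignored (service already in the requested state),
    neither the trigger nor its hooks run. Returns True if the trigger was issued.
    """
    if command not in ("trigger-start", "trigger-stop", "trigger-kill"):
        raise ValueError(f"unknown trigger command: {command}")
    wants_running = command == "trigger-start"
    if running == wants_running:
        return False  # ignored: skip the hooks as well
    for hook in hooks.get(f"pre-{command}", []):
        hook(service)
    trigger(command, service)
    for hook in hooks.get(f"post-{command}", []):
        hook(service)
    return True
```

A registration might then look like `{"pre-trigger-start": [script_hook("/opt/grid/hooks/clear-temp.sh")]}`, where the path is hypothetical.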

Phase 7 - Enhanced Grid Operation

Beyond the basic trigger operations, there is usually a need for more complex ones that cover very common needs but are seldom formally implemented.

Included features are:

  1. Non-trigger commands are added to both the Command-Line and Web interfaces:
    • start: waits until the operation succeeds or fails
    • stop: waits until the operation succeeds or fails
    • kill: waits until the operation succeeds or fails
    • restart: waits until the operation succeeds or fails; with a configurable restart delay
    • killstart: waits until the operation succeeds or fails; with a configurable restart delay

  2. The new commands operate on both types of services (instances and agents).

  3. New hooks are added for the new commands:
    • pre-start
    • post-start
    • pre-stop
    • post-stop
    • pre-kill
    • post-kill
    • pre-restart
    • post-restart
    • pre-killstart
    • post-killstart

  4. All these new commands use the trigger commands behind the scenes.

  5. The hooks for the non-trigger events are never ignored, so they are executed even if the related trigger commands are ignored.

  6. If configured, a trigger-kill is now issued automatically for stop and restart operations when a trigger-stop does not succeed within the configured time limit. The time limit is optionally specified in the configuration file on a per-service basis.

  7. A restart delay (optionally specified in the configuration file on a per-service basis) is used when restarting services; it applies to the restart and killstart commands.

  8. The non-trigger commands periodically show an update of the service state (every 10 s by default, configurable in the configuration file), and they keep working until the full operation completes.
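
The blocking behaviour of the non-trigger commands can be sketched as a polling loop on top of the trigger commands. Everything here is an assumption for illustration. The `ServiceSettings` field names stand in for whatever per-service keys the configuration file ends up using. `grid` is a hypothetical object offering `trigger()`, `is_running()` and `run_hooks()`. The sketch escalates a timed-out stop to trigger-kill only when `kill_on_timeout` is set. It waits `restart_delay_s` between the two halves of restart and killstart, and it always runs the pre-/post- hooks of the non-trigger command.

```python
import time
from dataclasses import dataclass


@dataclass
class ServiceSettings:
    """Per-service values assumed to come from the configuration file (hypothetical keys)."""
    start_timeout_s: float = 120.0
    stop_timeout_s: float = 60.0    # time allowed for trigger-stop before escalating
    kill_on_timeout: bool = True    # escalate to trigger-kill when a stop times out
    kill_timeout_s: float = 30.0
    restart_delay_s: float = 5.0    # pause between the stop and start halves of restart/killstart
    poll_interval_s: float = 10.0   # how often the service state is reported


def wait_for(is_running, want_running, timeout_s, settings, report):
    """Poll until the service reaches the wanted state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        running = is_running()
        report("running" if running else "stopped")
        if running == want_running:
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(settings.poll_interval_s, remaining))


def _kill(svc, settings, grid, report):
    grid.trigger("trigger-kill", svc)
    return wait_for(lambda: grid.is_running(svc), False, settings.kill_timeout_s, settings, report)


def _stop(svc, settings, grid, report):
    grid.trigger("trigger-stop", svc)  # the grid ignores it if svc is already stopped
    if wait_for(lambda: grid.is_running(svc), False, settings.stop_timeout_s, settings, report):
        return True
    return _kill(svc, settings, grid, report) if settings.kill_on_timeout else False


def _start(svc, settings, grid, report):
    grid.trigger("trigger-start", svc)
    return wait_for(lambda: grid.is_running(svc), True, settings.start_timeout_s, settings, report)


def run_command(command, svc, settings, grid, report=print):
    """Blocking non-trigger command; its pre-/post- hooks always run."""
    grid.run_hooks(f"pre-{command}", svc)
    if command == "start":
        ok = _start(svc, settings, grid, report)
    elif command == "stop":
        ok = _stop(svc, settings, grid, report)
    elif command == "kill":
        ok = _kill(svc, settings, grid, report)
    elif command in ("restart", "killstart"):
        down = _stop if command == "restart" else _kill
        ok = down(svc, settings, grid, report)
        if ok:
            time.sleep(settings.restart_delay_s)
            ok = _start(svc, settings, grid, report)
    else:
        raise ValueError(f"unknown command: {command}")
    grid.run_hooks(f"post-{command}", svc)
    return ok
```

In this sketch the post- hook runs even when the operation fails, so notification hooks still fire. That is a design choice of the sketch, not something the list above prescribes.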

Phase 8 - Simple Deployment

...