...

  1. Using the control.{sh|bat} script or other APIs, the user requests creation of an MR to defragment either all native persistence on the node or particular caches. The MR is created and saved to disk.
  2. The user restarts the node. It enters Maintenance Mode, finds the defragmentation MR and starts working on the task.
  3. When defragmentation is done, the MR is deleted automatically. On the next restart, the node with defragmented PDS resumes normal operation.
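
The lifecycle of the MR in the steps above can be sketched as a small file-backed store. This is only an illustration under stated assumptions: the class `DefragmentationRecordStore`, its method names and the file name `maintenance_record.txt` are hypothetical and are not taken from this proposal or from an existing API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical file-backed store for a single defragmentation MR. */
class DefragmentationRecordStore {
    private final Path file;

    DefragmentationRecordStore(Path workDir) {
        // The file name is an assumption made for this sketch.
        this.file = workDir.resolve("maintenance_record.txt");
    }

    /** Convenience factory that keeps the record in a fresh temporary directory. */
    static DefragmentationRecordStore inTempDir() {
        try {
            return new DefragmentationRecordStore(Files.createTempDirectory("mm-sketch"));
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Step 1: the user's request is written to disk so that it survives a restart. */
    void register(String cacheNames) {
        try {
            Files.write(file, cacheNames.getBytes(StandardCharsets.UTF_8));
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Step 2: on startup the node looks for a record; a present value means "enter MM". */
    Optional<String> readOnStartup() {
        try {
            if (!Files.exists(file))
                return Optional.empty();

            return Optional.of(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Step 3: once defragmentation is done, the record is deleted. */
    void complete() {
        try {
            Files.deleteIfExists(file);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After `complete()` the next call to `readOnStartup()` returns an empty value, which corresponds to the node starting in normal mode on the following restart.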

Maintenance Action and Maintenance Workflow

Although MM assumes manual user intervention to fix the cause of maintenance, the component that requested MM may itself know how to fix the issue and be able to execute the necessary actions automatically.
The only thing it may need is a user command to trigger these actions.

In the case of PDS defragmentation, which is also covered by MM, all actions are executed automatically from the very beginning.

To cover both cases, an additional entity is suggested: MaintenanceAction. It is simply an interface that the Maintenance component calls when the user requests its execution or when the Maintenance component decides it is time to start automatic actions.
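
A minimal sketch of what such an interface might look like is given below. The method names `name()`, `description()` and `execute()` and the example class `DefragmentationAction` are assumptions made for illustration; the text above only says that MaintenanceAction is an interface callable by the Maintenance component.

```java
import java.util.List;

/** Hypothetical shape of MaintenanceAction; all method names are assumptions. */
interface MaintenanceAction<T> {
    /** Short name that a user could pass to a CLI or JMX command. */
    String name();

    /** Human-readable explanation of what the action does. */
    String description();

    /** Runs the action and returns its result. */
    T execute();
}

/** Illustrative action; it does not touch any real persistence files. */
class DefragmentationAction implements MaintenanceAction<Integer> {
    private final List<String> cacheNames;

    DefragmentationAction(List<String> cacheNames) {
        this.cacheNames = cacheNames;
    }

    @Override public String name() {
        return "defragment";
    }

    @Override public String description() {
        return "Defragments native persistence of " + cacheNames.size() + " cache(s)";
    }

    @Override public Integer execute() {
        // Placeholder: a real action would rewrite the stored data.
        // Here we only report how many caches would be processed.
        return cacheNames.size();
    }
}
```

With a single interface like this, the same object can serve both cases: the Maintenance component may call `execute()` itself for automatic actions, or look the action up by `name()` when a user command arrives.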

Workflow with MaintenanceAction may look like this:

  1. Maintenance Registry starts among the first components and reads from disk the MaintenanceRecords registered earlier.
  2. Other components start after Maintenance and check whether the node is in MM and whether they should function differently in this mode. If the node is in MM, they register a special callback with Maintenance Registry that provides Maintenance Actions to the registry.
  3. After all components have started, Maintenance Registry prepares maintenance: it checks whether the user has already fixed the issues manually while the node was shut down, logs this information and modifies or deletes Maintenance Records if needed.
  4. When maintenance is prepared and there are still unresolved MaintenanceRecords, the Maintenance component either starts automatic actions such as PDS defragmentation, or exposes a list of user-triggered actions through CLI/JMX APIs and waits for user commands.
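
The four steps above can be sketched roughly as follows. This is an in-memory illustration only: `WorkflowCallback`, `SketchMaintenanceRegistry` and all of their methods are hypothetical names, records are reduced to string IDs, and actions are reduced to `Runnable`. The sketch also assumes that a completed automatic action resolves its record, as in the defragmentation scenario.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical callback that a component registers for its MaintenanceRecord. */
interface WorkflowCallback {
    /** False if the user already fixed the issue while the node was shut down. */
    default boolean stillNeeded() { return true; }

    /** Action to start without user input, if the component has one. */
    default Optional<Runnable> automaticAction() { return Optional.empty(); }

    /** Named actions to expose to the user, for example through CLI/JMX. */
    default Map<String, Runnable> userActions() { return Collections.emptyMap(); }
}

/** Hypothetical in-memory registry; reading and writing records on disk is omitted. */
class SketchMaintenanceRegistry {
    private final List<String> records;
    private final Map<String, WorkflowCallback> callbacks = new LinkedHashMap<>();
    private final Map<String, Runnable> exposedActions = new LinkedHashMap<>();

    /** Step 1: record IDs as they would be read from disk on startup. */
    SketchMaintenanceRegistry(List<String> recordsFromDisk) {
        this.records = new ArrayList<>(recordsFromDisk);
    }

    boolean isMaintenanceMode() { return !records.isEmpty(); }

    /** Step 2: a component registers a callback for a record that exists. */
    void registerCallback(String recordId, WorkflowCallback callback) {
        if (records.contains(recordId))
            callbacks.put(recordId, callback);
    }

    /** Steps 3 and 4: prepare maintenance, then run or expose actions. */
    void prepareAndProceed() {
        // Step 3: drop records whose issue was already fixed manually.
        records.removeIf(id -> callbacks.containsKey(id) && !callbacks.get(id).stillNeeded());

        // Step 4: start automatic actions, expose the rest to the user.
        for (String id : new ArrayList<>(records)) {
            WorkflowCallback callback = callbacks.get(id);

            if (callback == null)
                continue; // No component claimed this record; it stays unresolved.

            Optional<Runnable> automatic = callback.automaticAction();

            if (automatic.isPresent()) {
                automatic.get().run();
                records.remove(id); // Assumption: a finished automatic action resolves its record.
            }
            else
                exposedActions.putAll(callback.userActions());
        }
    }

    List<String> unresolvedRecords() { return new ArrayList<>(records); }

    Set<String> exposedActionNames() { return exposedActions.keySet(); }

    /** Runs an action previously exposed to the user; the record itself is left untouched. */
    void executeUserAction(String name) {
        Runnable action = exposedActions.get(name);

        if (action == null)
            throw new IllegalArgumentException("Unknown action: " + name);

        action.run();
    }
}
```

Whether a user-triggered action should also delete its record is left open here, since the workflow above does not specify it.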

Risks and Assumptions

  1. It is assumed that no major changes are needed in CommandHandler to enable connecting to a particular node (the one in MM).

...