...

Code Block (language: java)
interface FailureHandler {
   FailureAction onFailure(FailureContext failureCtx);
}

class FailureContext {
   FailureType type;
   Throwable error;
}

enum FailureAction {
   RESTART_JVM, // JVM process must be started from ignite.(sh|bat) script. All nodes in the process will be restarted.
   TERMINATE_JVM, // JVM process will be terminated. All nodes in the process will be stopped.
   STOP_NODE, // This particular node will be stopped. The process will be terminated if exactly one node was started from this process.
   NO_OP; // Nothing to do. WARNING: Node behavior will be undefined, especially in case of system worker termination.
}

enum FailureType {
   SEGMENTATION,
   SYSTEM_WORKER_TERMINATION,
   CRITICAL_ERROR
}

A FailureHandler implementation handles each registered failure (see FailureContext) by choosing an action for it (see FailureAction).

DefaultFailureHandler must be initialized by default unless the user provides a specific implementation. DefaultFailureHandler must return the STOP_NODE action for any failure type. Users can use inheritance or composition to reuse the default failure handler behavior.
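The composition option could be sketched as follows. This is a minimal illustration, not the proposed implementation: it assumes onFailure returns a FailureAction as the text describes, restates the types from the snippet above so that it compiles on its own, and introduces SegmentationAwareFailureHandler as a purely hypothetical user-defined class.

```java
// Types restated from the snippet above so that the sketch is self-contained.
enum FailureType { SEGMENTATION, SYSTEM_WORKER_TERMINATION, CRITICAL_ERROR }

enum FailureAction { RESTART_JVM, TERMINATE_JVM, STOP_NODE, NO_OP }

class FailureContext {
    final FailureType type;
    final Throwable error;

    FailureContext(FailureType type, Throwable error) {
        this.type = type;
        this.error = error;
    }
}

interface FailureHandler {
    FailureAction onFailure(FailureContext failureCtx);
}

// Default behavior as described in the text: STOP_NODE for any failure type.
class DefaultFailureHandler implements FailureHandler {
    @Override public FailureAction onFailure(FailureContext failureCtx) {
        return FailureAction.STOP_NODE;
    }
}

// Hypothetical user handler (an assumption for illustration only):
// overrides the action for one failure type and delegates the rest.
class SegmentationAwareFailureHandler implements FailureHandler {
    private final FailureHandler dflt = new DefaultFailureHandler();

    @Override public FailureAction onFailure(FailureContext failureCtx) {
        if (failureCtx.type == FailureType.SEGMENTATION)
            return FailureAction.RESTART_JVM;

        // Everything else keeps the default behavior.
        return dflt.onFailure(failureCtx);
    }
}
```

Delegation keeps the user handler independent of how the default handler is implemented; inheritance would achieve the same by overriding onFailure and calling super.onFailure for the remaining failure types.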

Ignite must handle critical failures according to the strategy provided by the user.

The following implementations should be provided out of the box:

  • NoOpFailureHandler - Ignores any failure. Useful for tests and debugging.
  • RestartProcessFailureHandler - Can be used only with ignite.(sh|bat). The process must be terminated using an Ignition.restart(true) call.
  • StopNodeFailureHandler - Stops the Ignite node in case of a critical error using an Ignition.stop(true) or Ignition.stop(nodeName, true) call.
  • StopNodeOrHaltFailureHandler(boolean tryStop, long timeout) - Tries to stop the node if tryStop is true. If the node can't be stopped within the provided timeout, or tryStop is false, the JVM process is terminated forcibly ( Runtime.halt() ).
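The stop-or-halt behavior of the last item can be illustrated with a small sketch. It is an assumption-laden outline rather than the actual handler: the class name StopNodeOrHaltSketch and the handle() method are hypothetical, and the node stop and JVM halt operations are injected as Runnable stand-ins so the example runs without Ignite and without killing the JVM.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the tryStop/timeout logic; not the proposed class itself.
class StopNodeOrHaltSketch {
    private final boolean tryStop;
    private final long timeoutMs;
    private final Runnable stopNode; // stand-in for Ignition.stop(nodeName, true)
    private final Runnable haltJvm;  // stand-in for Runtime.getRuntime().halt(status)

    StopNodeOrHaltSketch(boolean tryStop, long timeoutMs, Runnable stopNode, Runnable haltJvm) {
        this.tryStop = tryStop;
        this.timeoutMs = timeoutMs;
        this.stopNode = stopNode;
        this.haltJvm = haltJvm;
    }

    /** Returns true if the node stopped in time, false if the halt path was taken. */
    boolean handle() {
        if (tryStop) {
            CountDownLatch stopped = new CountDownLatch(1);

            // Stop the node in a separate thread so a hanging stop cannot block the handler.
            Thread stopper = new Thread(() -> {
                stopNode.run();
                stopped.countDown();
            }, "node-stopper");

            stopper.setDaemon(true);
            stopper.start();

            try {
                if (stopped.await(timeoutMs, TimeUnit.MILLISECONDS))
                    return true;
            }
            catch (InterruptedException e) {
                // Preserve the interrupt flag and fall through to the halt path.
                Thread.currentThread().interrupt();
            }
        }

        // Either tryStop is false or the stop did not finish within the timeout.
        haltJvm.run();

        return false;
    }
}
```

Running the stop in a daemon thread means a stop call that never returns cannot prevent the halt path from being reached.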

Default failure handler is StopNodeOrHaltFailureHandler where tryStop value is false.

FailureProcessor is responsible for processing the failure action according to the value returned by the FailureHandler implementation.
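How FailureProcessor might act on the value returned by a handler can be sketched as below. The sketch assumes onFailure returns a FailureAction; the NodeControl interface and the process method are hypothetical names introduced only for illustration, standing in for the real restart, terminate, and stop operations.

```java
// Types restated from the snippet above so that the sketch is self-contained.
enum FailureType { SEGMENTATION, SYSTEM_WORKER_TERMINATION, CRITICAL_ERROR }

enum FailureAction { RESTART_JVM, TERMINATE_JVM, STOP_NODE, NO_OP }

class FailureContext {
    final FailureType type;
    final Throwable error;

    FailureContext(FailureType type, Throwable error) {
        this.type = type;
        this.error = error;
    }
}

interface FailureHandler {
    FailureAction onFailure(FailureContext failureCtx);
}

// Hypothetical abstraction over the real process and node operations.
interface NodeControl {
    void restartJvm();
    void terminateJvm();
    void stopNode();
}

class FailureProcessor {
    private final FailureHandler hnd;
    private final NodeControl ctl;

    FailureProcessor(FailureHandler hnd, NodeControl ctl) {
        this.hnd = hnd;
        this.ctl = ctl;
    }

    /** Asks the handler for an action, performs it, and returns it to the caller. */
    FailureAction process(FailureContext failureCtx) {
        FailureAction action = hnd.onFailure(failureCtx);

        switch (action) {
            case RESTART_JVM:
                ctl.restartJvm();
                break;
            case TERMINATE_JVM:
                ctl.terminateJvm();
                break;
            case STOP_NODE:
                ctl.stopNode();
                break;
            case NO_OP:
                // Nothing to do; node behavior is undefined from here on.
                break;
        }

        return action;
    }
}
```

Keeping the dispatch in one place lets handlers stay pure decision functions, which also makes them easy to test without touching a running node.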

Risks and Assumptions

It's possible that a node won't be stopped correctly in case of FailureAction.STOP_NODE due to bugs, which can lead to the process hanging. These bugs should be discovered and fixed in the future.

Discussion Links

...

  1. Internal problems requiring graceful node shutdown, reboot, etc.
  2. IEP-14: Ignite failures handling (Discussion)

Reference Links

Apache Ignite documentation: Ignite life cycle

...