...
ID | IEP-14 | ||||||||
Author | |||||||||
Sponsor | DmitryAndrey Gura | ||||||||
Created | Feb 20 2018 | ||||||||
Status |
|
Table of Contents |
---|
...
disco-event-worker
tcp-disco-sock-reader
tcp-disco-srvr
tcp-disco-msg-worker
tcp-comm-worker
grid-nio-worker-tcp-comm
exchange-worker
sys-stripe
grid-timeout-worker
db-checkpoint-thread
wal-file-archiver
wal-write-worker
wal-file-decompressor
ttl-cleanup-worker
nio-acceptor
...
Code Block | ||
---|---|---|
| ||
interface FailureHandler {
    FailureAction onFailure(Ignite ignite, FailureContext failureCtx);
}

class FailureContext {
    FailureType type;
    Throwable error;
}

enum FailureAction {
    RESTART_JVM,   // JVM process must be started from ignite.(sh|bat) script. All nodes in the process will be restarted.
    TERMINATE_JVM, // JVM process will be terminated. All nodes in the process will be stopped.
    STOP_NODE,     // This particular node will be stopped. The process will be terminated if exactly one node was started from this process.
    NO_OP;         // Nothing to do. WARNING: Node behavior will be undefined, especially in case of system worker termination.
}

enum FailureType {
    SEGMENTATION,
    SYSTEM_WORKER_TERMINATION,
    CRITICAL_ERROR
} |
A FailureHandler implementation will be able to handle each registered failure (see FailureContext) by returning an action (see FailureAction).
DefaultFailureHandler must be initialized by default unless the user provides a specific implementation. DefaultFailureHandler must return the STOP_NODE action for any failure type. Users can use inheritance or composition to reuse the default failure handler behavior.
FailureProcessor is responsible for processing the failure action according to the value returned by the FailureHandler implementation.
It's possible that a node won't be stopped correctly in case of FailureAction.STOP_NODE due to bugs, which can lead to the process hanging. These bugs should be discovered and fixed in the future.
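The interplay between handler, action and processor described above can be sketched as follows. This is a minimal, self-contained illustration and not the Ignite implementation: every `Sketch*` type and `IgnoreSegmentationHandler` are hypothetical stand-ins for the pseudocode types, the `Ignite` argument is omitted for brevity, and the processor only records the action it would take instead of stopping anything.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types mirroring the pseudocode; these are NOT the Ignite classes.
enum SketchFailureType { SEGMENTATION, SYSTEM_WORKER_TERMINATION, CRITICAL_ERROR }

enum SketchFailureAction { RESTART_JVM, TERMINATE_JVM, STOP_NODE, NO_OP }

final class SketchFailureContext {
    final SketchFailureType type;
    final Throwable error;

    SketchFailureContext(SketchFailureType type, Throwable error) {
        this.type = type;
        this.error = error;
    }
}

interface SketchFailureHandler {
    SketchFailureAction onFailure(SketchFailureContext failureCtx);
}

// Default behavior from the text: STOP_NODE for any failure type.
class SketchDefaultFailureHandler implements SketchFailureHandler {
    @Override public SketchFailureAction onFailure(SketchFailureContext failureCtx) {
        return SketchFailureAction.STOP_NODE;
    }
}

// Composition: override one case and delegate the rest to the default handler.
class IgnoreSegmentationHandler implements SketchFailureHandler {
    private final SketchFailureHandler dflt = new SketchDefaultFailureHandler();

    @Override public SketchFailureAction onFailure(SketchFailureContext failureCtx) {
        if (failureCtx.type == SketchFailureType.SEGMENTATION)
            return SketchFailureAction.NO_OP;

        return dflt.onFailure(failureCtx);
    }
}

// Hypothetical processor: asks the handler for an action and dispatches on it.
class SketchFailureProcessor {
    private final SketchFailureHandler hnd;

    // Records what a real processor would have done.
    final List<String> log = new ArrayList<>();

    SketchFailureProcessor(SketchFailureHandler hnd) {
        this.hnd = hnd;
    }

    SketchFailureAction process(SketchFailureContext failureCtx) {
        SketchFailureAction act = hnd.onFailure(failureCtx);

        switch (act) {
            case RESTART_JVM: log.add("restart JVM"); break;
            case TERMINATE_JVM: log.add("terminate JVM"); break;
            case STOP_NODE: log.add("stop node"); break;
            case NO_OP: log.add("no-op"); break;
        }

        return act;
    }
}
```

With this shape, a user-provided handler only decides which action to take, while the processor owns the (potentially dangerous) side effects.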
...
SYSTEM_WORKER_TERMINATION,
CRITICAL_ERROR
} |
A FailureHandler implementation will be able to handle Ignite critical failures according to the strategy provided by the user.
The following implementations should be provided out of the box:
NoOpFailureHandler - Just ignores any failure. It's useful for tests and debugging.
RestartProcessFailureHandler - Specific implementation that can be used only with ignite.(sh|bat). The process must be terminated using an Ignition.restart(true) call.
StopNodeFailureHandler - Stops the Ignite node in case of a critical error using an Ignition.stop(true) or Ignition.stop(nodeName, true) call.
StopNodeOrHaltFailureHandler(boolean tryStop, long timeout) - Tries to stop the node if the tryStop value is true. If the node can't be stopped within the provided timeout, or the tryStop value is false, the JVM process will be terminated forcibly (Runtime.halt()).
The default failure handler is StopNodeOrHaltFailureHandler with the tryStop value set to false.
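The stop-or-halt strategy can be sketched as below. This is an assumption-laden illustration rather than the proposed implementation: `StopOrHaltSketch` is a hypothetical class, the timeout is assumed to be in milliseconds, and the node stop and JVM halt are injected as `Runnable` stand-ins so that the example does not actually call `Ignition.stop(...)` or `Runtime.halt()`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the stop-or-halt strategy; NOT the Ignite implementation.
class StopOrHaltSketch {
    private final boolean tryStop;
    private final long timeoutMs;   // Assumed unit: milliseconds.
    private final Runnable stopNode; // Stand-in for Ignition.stop(nodeName, true).
    private final Runnable halt;     // Stand-in for Runtime.getRuntime().halt(...).

    StopOrHaltSketch(boolean tryStop, long timeoutMs, Runnable stopNode, Runnable halt) {
        this.tryStop = tryStop;
        this.timeoutMs = timeoutMs;
        this.stopNode = stopNode;
        this.halt = halt;
    }

    /** @return true if the node stopped within the timeout, false if halt was invoked. */
    boolean onFailure() {
        if (tryStop) {
            CountDownLatch stopped = new CountDownLatch(1);

            // Stop in a separate daemon thread so a hanging stop cannot block the handler.
            Thread stopper = new Thread(() -> {
                stopNode.run();
                stopped.countDown();
            }, "node-stopper");

            stopper.setDaemon(true);
            stopper.start();

            try {
                if (stopped.await(timeoutMs, TimeUnit.MILLISECONDS))
                    return true;
            }
            catch (InterruptedException e) {
                // Restore the flag and fall through to the forcible halt.
                Thread.currentThread().interrupt();
            }
        }

        // Either tryStop is false or the node did not stop in time.
        halt.run();

        return false;
    }
}
```

Running the stop attempt in its own thread is one way to bound its duration; if the stop hangs or fails, the latch is never released and the handler falls back to the halt path.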
A critical system worker must catch all exceptions (Throwable and derived classes) in a top-level try-catch block and take into account that the thread could be terminated due to a programming mistake that leads to unintentional worker termination. So the basic template should look like the following code snippet:
Code Block | ||
---|---|---|
| ||
@Override
public void run() {
    Throwable err = null;

    try {
        // Critical worker's code.
    }
    catch (Throwable e) {
        err = e;
    }
    finally {
        // Call failure handler. Reached on any worker exit, with or without an error.
        FailureContext failureCtx = new FailureContext(FailureType.SYSTEM_WORKER_TERMINATION, err);

        ctx.failure().process(failureCtx); // Handle failure, where ctx is the kernal context.
    }
} |
Example of using FailureHandler in IgniteConfiguration via Spring XML:
Code Block | ||
---|---|---|
| ||
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="failureHandler">
        <bean class="org.apache.ignite.failure.StopNodeFailureHandler"/>
    </property>
</bean> |