ID | IEP-14 |
Author | Anton Vinogradov |
Sponsor | |
Created | Feb 20 2018 |
Status | DRAFT |
Apache Ignite should have a general approach to handling critical failures.
Failures such as OutOfMemoryError should be treated as critical.
Users should be able to define node behavior in case of these failures.
A system critical error is an error that leads to the system's inoperability.
The following system critical errors should be handled with the proposed approach:
IOException thrown by read/write operations on the file system, in the subsystems considered critical
IgniteOutOfMemoryException
OutOfMemoryError (some memory should be reserved for this case at node startup to increase the chances of handling OOM)
AssertionError (assertions should be handled as failures if the -ea flag is set)
These errors should also be covered by a Throwable catch in every system worker.
The following system workers are critical, and an Ignite node will become inoperative if any of them terminates:
disco-event-worker
tcp-disco-sock-reader
tcp-disco-srvr
tcp-disco-msg-worker
tcp-comm-worker
grid-nio-worker-tcp-comm
exchange-worker
sys-stripe
grid-timeout-worker
db-checkpoint-thread
wal-file-archiver
wal-write-worker
ttl-cleanup-worker
nio-acceptor
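The Throwable catch and the startup memory reserve mentioned above might look roughly like the following sketch. The names SystemWorkerSketch, runGuarded and FailureListener, as well as the 1 MB reserve size, are hypothetical and are not part of the proposal; they only illustrate one way a worker body could be wrapped.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical callback; the proposal routes failures to a FailureHandler instead. */
interface FailureListener {
    void onFailure(String workerName, Throwable error);
}

class SystemWorkerSketch {
    /** Memory set aside at startup (the size is an arbitrary assumption). */
    private static volatile byte[] oomReserve = new byte[1024 * 1024];

    /** Runs a worker body and reports any Throwable, including Errors. */
    static void runGuarded(String workerName, Runnable body, FailureListener lsnr) {
        try {
            body.run();
        }
        catch (Throwable t) {
            if (t instanceof OutOfMemoryError)
                oomReserve = null; // Release the reserve so reporting has room to run.

            lsnr.onFailure(workerName, t);
        }
    }

    public static void main(String[] args) {
        AtomicReference<Throwable> seen = new AtomicReference<>();

        // Simulate a critical worker whose body fails unexpectedly.
        runGuarded("exchange-worker",
            () -> { throw new IllegalStateException("boom"); },
            (name, err) -> seen.set(err));

        System.out.println("Reported: " + seen.get());
    }
}
```

Catching Throwable rather than Exception is what lets AssertionError and OutOfMemoryError reach the listener.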
IgniteConfiguration should be extended with methods:
public IgniteConfiguration setFailureHandler(FailureHandler hnd);
public FailureHandler getFailureHandler();
Where:
interface FailureHandler {
    FailureAction onFailure(FailureContext failureCtx);
}

class FailureContext {
    FailureType type;
    Throwable error;
}

enum FailureAction {
    RESTART_JVM,
    STOP_NODE,
    NO_OP;
}

enum FailureType {
    SEGMENTATION,
    SYSTEM_WORKER_TERMINATION,
    CRITICAL_ERROR
}
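As an illustration of how a user-supplied handler could be written against these types, here is a minimal sketch. The types are copied from the listing above so that the example compiles on its own; the FailureContext constructor and the OomAwareFailureHandler class are assumptions made for the example, not part of the proposal.

```java
// Types copied from the proposal; the FailureContext constructor is an assumption.
enum FailureAction { RESTART_JVM, STOP_NODE, NO_OP }

enum FailureType { SEGMENTATION, SYSTEM_WORKER_TERMINATION, CRITICAL_ERROR }

class FailureContext {
    final FailureType type;
    final Throwable error;

    FailureContext(FailureType type, Throwable error) {
        this.type = type;
        this.error = error;
    }
}

interface FailureHandler {
    FailureAction onFailure(FailureContext failureCtx);
}

/** Hypothetical handler: restarts the JVM on out-of-memory, stops the node otherwise. */
class OomAwareFailureHandler implements FailureHandler {
    @Override public FailureAction onFailure(FailureContext failureCtx) {
        if (failureCtx.error instanceof OutOfMemoryError)
            return FailureAction.RESTART_JVM;

        return FailureAction.STOP_NODE;
    }
}

class FailureHandlerExample {
    public static void main(String[] args) {
        FailureHandler hnd = new OomAwareFailureHandler();

        // A simulated failure; no real memory exhaustion takes place.
        FailureContext ctx = new FailureContext(
            FailureType.CRITICAL_ERROR, new OutOfMemoryError("simulated"));

        System.out.println(hnd.onFailure(ctx));
    }
}
```

Such a handler would then be passed to IgniteConfiguration through the proposed setFailureHandler method.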
A FailureHandler implementation will be able to handle (see FailureAction) each registered failure (see FailureContext).
DefaultFailureHandler must be initialized by default unless the user provides a specific implementation. DefaultFailureHandler must return the STOP_NODE action for the SEGMENTATION failure type and TERMINATE_JVM for all other failures. Users can use inheritance or composition to reuse the default failure handler behavior.
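The default behavior and its reuse through composition could be sketched as follows. Note that the text names TERMINATE_JVM, which the FailureAction listing does not declare; the sketch assumes it exists as an additional constant. The FailureContext constructor and the LenientFailureHandler class are likewise hypothetical.

```java
// TERMINATE_JVM is assumed here: the text names it, but the enum listing does not declare it.
enum FailureAction { RESTART_JVM, TERMINATE_JVM, STOP_NODE, NO_OP }

enum FailureType { SEGMENTATION, SYSTEM_WORKER_TERMINATION, CRITICAL_ERROR }

class FailureContext {
    final FailureType type;
    final Throwable error;

    // Constructor added for the example only.
    FailureContext(FailureType type, Throwable error) {
        this.type = type;
        this.error = error;
    }
}

interface FailureHandler {
    FailureAction onFailure(FailureContext failureCtx);
}

/** Default behavior as described: STOP_NODE on segmentation, TERMINATE_JVM otherwise. */
class DefaultFailureHandler implements FailureHandler {
    @Override public FailureAction onFailure(FailureContext failureCtx) {
        return failureCtx.type == FailureType.SEGMENTATION
            ? FailureAction.STOP_NODE
            : FailureAction.TERMINATE_JVM;
    }
}

/** Composition: a hypothetical handler that ignores worker termination and delegates the rest. */
class LenientFailureHandler implements FailureHandler {
    private final FailureHandler dflt = new DefaultFailureHandler();

    @Override public FailureAction onFailure(FailureContext failureCtx) {
        if (failureCtx.type == FailureType.SYSTEM_WORKER_TERMINATION)
            return FailureAction.NO_OP;

        return dflt.onFailure(failureCtx);
    }
}

class DefaultHandlerExample {
    public static void main(String[] args) {
        FailureHandler hnd = new LenientFailureHandler();
        Throwable err = new RuntimeException("simulated");

        // Print the action chosen for each failure type.
        for (FailureType type : FailureType.values())
            System.out.println(type + " -> " + hnd.onFailure(new FailureContext(type, err)));
    }
}
```

Composition keeps the override independent of how the default handler is implemented; inheritance would work similarly by calling super.onFailure for the cases that are not overridden.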
FailureProcessor is responsible for processing each failure action according to the value returned by the FailureHandler implementation.
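A minimal sketch of such a dispatch is shown below. The NodeLifecycle interface, the FailureProcessor constructor and the process method are assumptions made for illustration; the proposal does not define how the node is stopped or the JVM restarted, so those operations are represented by injectable hooks.

```java
import java.util.ArrayList;
import java.util.List;

enum FailureAction { RESTART_JVM, STOP_NODE, NO_OP }

enum FailureType { SEGMENTATION, SYSTEM_WORKER_TERMINATION, CRITICAL_ERROR }

class FailureContext {
    final FailureType type;
    final Throwable error;

    // Constructor added for the example only.
    FailureContext(FailureType type, Throwable error) {
        this.type = type;
        this.error = error;
    }
}

interface FailureHandler {
    FailureAction onFailure(FailureContext failureCtx);
}

/** Hypothetical hooks standing in for real node and JVM lifecycle operations. */
interface NodeLifecycle {
    void stopNode();

    void restartJvm();
}

class FailureProcessor {
    private final FailureHandler hnd;
    private final NodeLifecycle lifecycle;

    FailureProcessor(FailureHandler hnd, NodeLifecycle lifecycle) {
        this.hnd = hnd;
        this.lifecycle = lifecycle;
    }

    /** Asks the handler for an action and carries it out. */
    void process(FailureContext failureCtx) {
        FailureAction act = hnd.onFailure(failureCtx);

        switch (act) {
            case RESTART_JVM:
                lifecycle.restartJvm();
                break;

            case STOP_NODE:
                lifecycle.stopNode();
                break;

            case NO_OP:
                break; // Nothing to do.
        }
    }
}

class FailureProcessorExample {
    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();

        // Records the requested operations instead of performing them.
        NodeLifecycle recording = new NodeLifecycle() {
            @Override public void stopNode() { calls.add("stopNode"); }
            @Override public void restartJvm() { calls.add("restartJvm"); }
        };

        FailureProcessor proc = new FailureProcessor(ctx -> FailureAction.STOP_NODE, recording);

        proc.process(new FailureContext(FailureType.SEGMENTATION, new RuntimeException("simulated")));

        System.out.println(calls);
    }
}
```

Keeping the lifecycle operations behind an interface lets the dispatch logic be exercised without actually stopping a node.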