ID | IEP-14 |
Author | Anton Vinogradov |
Sponsor | |
Created | Feb 20 2018 |
Status | DRAFT |
Apache Ignite should have a general approach to handling critical failures.
List of failures that should be covered by this engine:
List of system workers that should be covered by this engine:
disco-event-worker
tcp-disco-sock-reader
tcp-disco-srvr
tcp-disco-msg-worker
tcp-comm-worker
grid-nio-worker-tcp-comm
exchange-worker
sys-stripe
grid-timeout-worker
db-checkpoint-thread
wal-file-archiver
wal-write-worker
ttl-cleanup-worker
nio-acceptor
List of errors to be handled
IgniteConfiguration has to be extended with the following methods:
    public IgniteConfiguration setFailureHandler(FailureHandler hnd);
    public FailureHandler getFailureHandler();
Where
    interface FailureHandler {
        FailureAction onFailure(FailureContext failureCtx);
    }

    class FailureContext {
        FailureType type;
        Throwable cause;
    }

    enum FailureAction {
        RESTART_JVM,
        STOP,
        NOOP;
    }

    enum FailureType {
        SEGMENTATION,
        SYSTEM_WORKER_CRASHED,
        CRITICAL_ERROR
    }
A user-provided implementation of FailureHandler can thus decide what to do (see FailureAction) for each registered failure (see FailureContext).
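As an illustration only, a user-supplied handler under the proposed API might look like the sketch below. The four types are redeclared locally from the definitions above so that the example compiles without Ignite on the classpath. The FailureContext constructor, the class names FailureHandlerSketch and ExamplePolicyHandler, and the mapping from failure types to actions are assumptions made for this example; they are not part of the proposal.

```java
import java.util.Objects;

public class FailureHandlerSketch {
    // Local copies of the proposed types; a draft API, not the released Ignite one.
    enum FailureAction { RESTART_JVM, STOP, NOOP }

    enum FailureType { SEGMENTATION, SYSTEM_WORKER_CRASHED, CRITICAL_ERROR }

    static final class FailureContext {
        final FailureType type;
        final Throwable cause;

        // Assumption: the proposal lists only the fields; this constructor is added for the example.
        FailureContext(FailureType type, Throwable cause) {
            this.type = Objects.requireNonNull(type, "type");
            this.cause = cause;
        }
    }

    interface FailureHandler {
        FailureAction onFailure(FailureContext failureCtx);
    }

    /** Hypothetical policy: restart on segmentation, stop the node on any other failure. */
    static final class ExamplePolicyHandler implements FailureHandler {
        @Override
        public FailureAction onFailure(FailureContext failureCtx) {
            switch (failureCtx.type) {
                case SEGMENTATION:
                    return FailureAction.RESTART_JVM;
                case SYSTEM_WORKER_CRASHED:
                case CRITICAL_ERROR:
                    return FailureAction.STOP;
                default:
                    // Unreachable with the three types above; kept as a safe fallback.
                    return FailureAction.NOOP;
            }
        }
    }

    public static void main(String[] args) {
        FailureHandler hnd = new ExamplePolicyHandler();

        // Simulate a crash of one of the system workers listed earlier.
        FailureContext ctx = new FailureContext(
            FailureType.SYSTEM_WORKER_CRASHED,
            new IllegalStateException("exchange-worker terminated"));

        System.out.println(ctx.type + " -> " + hnd.onFailure(ctx));
    }
}
```

With the proposed setter, such a handler would presumably be registered through setFailureHandler on IgniteConfiguration before the node starts. Returning an action rather than performing it inside the handler leaves the actual restart or stop to the engine, which is how the signature above reads.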