Motivation
Apache Ignite needs a general engine for handling critical failures.
Description
This engine should cover the following failures:
- Critical errors
- Crashes of critical system workers
- Segmentation
This engine should cover the following system workers:
- disco-event-worker
- tcp-disco-sock-reader
- tcp-disco-srvr
- tcp-disco-msg-worker
- tcp-comm-worker
- grid-nio-worker-tcp-comm
- exchange-worker
- sys-stripe
- grid-timeout-worker
- db-checkpoint-thread
- wal-file-archiver
- ttl-cleanup-worker
- nio-acceptor
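One way the engine could decide whether a thread belongs to the covered set is to match its name against the list above. The sketch below is illustrative only: `CriticalWorkers` and `isCritical` are hypothetical names, not Ignite API, and it assumes that worker thread names start with the listed names (possibly followed by an index or other suffix), which this page does not specify.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of the Ignite API. */
final class CriticalWorkers {
    /** Worker names copied from the list above. */
    private static final List<String> PREFIXES = Arrays.asList(
        "disco-event-worker",
        "tcp-disco-sock-reader",
        "tcp-disco-srvr",
        "tcp-disco-msg-worker",
        "tcp-comm-worker",
        "grid-nio-worker-tcp-comm",
        "exchange-worker",
        "sys-stripe",
        "grid-timeout-worker",
        "db-checkpoint-thread",
        "wal-file-archiver",
        "ttl-cleanup-worker",
        "nio-acceptor");

    private CriticalWorkers() {
        // Static utility; no instances.
    }

    /** Returns true if the thread name starts with one of the covered worker names. */
    static boolean isCritical(String threadName) {
        if (threadName == null)
            return false;

        for (String prefix : PREFIXES) {
            if (threadName.startsWith(prefix))
                return true;
        }

        return false;
    }
}
```

Prefix matching is an assumption chosen for brevity; an actual implementation could just as well register workers explicitly when they start.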
Errors to be handled:
- Persistence errors
- IOOM errors (part of persistence errors?)
- I/O errors (list to be provided)
- OOM (some memory should be reserved at node startup to improve the chances of handling OOM)
- Assertion errors (assertions should be handled as failures when the -ea flag is set; they should also be covered by the Throwable catch in every system worker)
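The last two items can be illustrated with a minimal sketch. A block of memory is allocated up front and its reference is dropped when an OutOfMemoryError is caught, and the worker body is wrapped in a catch of Throwable so that an AssertionError (thrown only when the JVM runs with -ea) is reported like any other failure. All names here (`FailureHandler`, `GuardedWorker`) and the 1 MB reserve size are assumptions made for illustration, not the engine's actual design.

```java
/** Hypothetical callback invoked when a worker fails. */
interface FailureHandler {
    void onFailure(String workerName, Throwable err);
}

/** Hypothetical wrapper that runs a worker body and reports any Throwable. */
final class GuardedWorker implements Runnable {
    /** Memory reserved at construction time; the size is an arbitrary assumption. */
    private volatile byte[] reserve = new byte[1024 * 1024];

    private final String name;
    private final Runnable body;
    private final FailureHandler hnd;

    GuardedWorker(String name, Runnable body, FailureHandler hnd) {
        this.name = name;
        this.body = body;
        this.hnd = hnd;
    }

    @Override public void run() {
        try {
            body.run();
        }
        catch (Throwable t) {
            // Catching Throwable also covers AssertionError and OutOfMemoryError.
            if (t instanceof OutOfMemoryError)
                reserve = null; // Drop the reference so the GC can reclaim the reserve.

            hnd.onFailure(name, t);
        }
    }
}
```

A caller would wrap each system worker, for example `new Thread(new GuardedWorker("sys-stripe-0", body, handler))`, where `body` and `handler` are supplied by the caller. What the handler then does (stop the node, restart it, and so on) is outside the scope of this sketch.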
Risks and Assumptions
// Describe project risks, such as API or binary compatibility issues, major protocol changes, etc.
Discussion Links
http://apache-ignite-developers.2346864.n4.nabble.com/Internal-problems-requiring-graceful-node-shutdown-reboot-etc-td24856.html
Reference Links
// Links to various reference documents, if applicable.
Tickets
key | summary | type | created | updated | due | assignee | reporter | priority | status | resolution |