Table of Contents

Motivation

// Define the problem to be solved.

Internal problems may cause unexpected cluster behaviour.
We should define how the cluster behaves when any such internal problem occurs.

Description

Internal problems can be split into the following categories:

1) OOM or any other cause of a node crash

2) Situations that require graceful node shutdown with a custom notification (now covered by IEP-14 Ignite failures handling)
- IgniteOutOfMemoryException
- Persistence errors
- ExchangeWorker exits with error

3) Performance issues that should be covered by metrics
- GC STW duration
- Timed out tasks and jobs
- TX deadlock
- Hung TX (waiting for some service)
- Java deadlocks
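
Two of the metrics listed above can be sampled with the standard JDK management beans. The following is a minimal sketch, not the proposed Ignite implementation: `NodeHealthProbe` and its method names are hypothetical, and the GC figure is the accumulated collection time reported by the JVM, which only approximates STW duration for concurrent collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper (not part of Ignite) that samples two of the metrics listed above. */
final class NodeHealthProbe {
    private NodeHealthProbe() {
        // Static helpers only.
    }

    /** Returns the number of threads currently involved in a Java-level deadlock. */
    static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        // Prefer the check that also covers java.util.concurrent locks when the JVM supports it.
        long[] ids = threads.isSynchronizerUsageSupported()
            ? threads.findDeadlockedThreads()
            : threads.findMonitorDeadlockedThreads();

        // Both calls return null when no deadlock is found.
        return ids == null ? 0 : ids.length;
    }

    /** Returns the approximate accumulated GC time in milliseconds, summed over all collectors. */
    static long totalGcTimeMillis() {
        long total = 0;

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 when the collector does not report it

            if (time > 0)
                total += time;
        }

        return total;
    }
}
```

A metrics exporter could call both methods periodically and publish the difference between consecutive `totalGcTimeMillis()` samples as the GC time spent in that interval.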

4) Situations that require an external monitoring implementation
- GC STW duration exceeds the maximum allowed length (the node should be stopped before the STW pause finishes)
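
A node cannot stop itself while all of its threads are paused, so this case needs a separate process. One possible shape for such a monitor is a heartbeat file that the node touches periodically and an external watchdog that kills the node once the heartbeat is too old. This is a sketch under assumptions: `StwWatchdog`, the heartbeat file, and the `maxPauseMillis` parameter are hypothetical, it requires Java 9+ for `ProcessHandle`, and file timestamp granularity depends on the file system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

/** Hypothetical external watchdog sketch (not part of Ignite). */
final class StwWatchdog {
    private StwWatchdog() {
        // Static helpers only.
    }

    /** Node side: called periodically by a node thread to refresh the heartbeat file. */
    static void beat(Path heartbeatFile) throws IOException {
        if (Files.exists(heartbeatFile))
            Files.setLastModifiedTime(heartbeatFile, FileTime.fromMillis(System.currentTimeMillis()));
        else
            Files.createFile(heartbeatFile);
    }

    /** Returns true when the last heartbeat is older than the allowed pause. */
    static boolean isStale(long lastBeatMillis, long nowMillis, long maxPauseMillis) {
        return nowMillis - lastBeatMillis > maxPauseMillis;
    }

    /**
     * Watchdog side: runs in a separate process.
     * Returns true if the heartbeat was stale and a forced kill of the node process was requested.
     */
    static boolean checkAndKill(Path heartbeatFile, long nodePid, long maxPauseMillis) throws IOException {
        long lastBeat = Files.getLastModifiedTime(heartbeatFile).toMillis();

        if (!isStale(lastBeat, System.currentTimeMillis(), maxPauseMillis))
            return false;

        // Empty Optional means the process has already exited.
        return ProcessHandle.of(nodePid).map(ProcessHandle::destroyForcibly).orElse(false);
    }
}
```

The heartbeat interval on the node side should be noticeably shorter than `maxPauseMillis`, otherwise normal scheduling jitter could be mistaken for a long pause.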

Risks and Assumptions

// Describe project risks, such as API or binary compatibility issues, major protocol changes, etc.

...