...
This situation can occur if a user explicitly demarcates a transaction (especially Pessimistic Repeatable Read) and, for example, calls a remote service (which may be unresponsive) after acquiring some locks. All other transactions that depend on the same keys will hang.
...
Also, there should be a screen in Web Console that lists all ongoing transactions in the cluster, including the information above.
This situation occurs if user code or Ignite itself runs into a Java-level deadlock because of a bug in the code: synchronized(mux1) {synchronized (mux2) {}} sections entered in reverse order, reentrant locks acquired in reverse order, etc.
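A Java-level deadlock of this kind can be detected from inside the JVM with the standard `ThreadMXBean`. The sketch below is only an illustration under assumptions: `DeadlockProbe`, `provokeAndCount` and the thread names are hypothetical and not part of Ignite. It provokes a reverse-order `synchronized` deadlock on two daemon threads and then asks the JVM which threads are deadlocked.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper, not an Ignite class: finds Java-level deadlocks via JMX.
class DeadlockProbe {
    /** Returns one ThreadInfo per deadlocked thread, or an empty array if there is no deadlock. */
    static ThreadInfo[] findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Covers synchronized monitors and java.util.concurrent locks; null means "no deadlock".
        long[] ids = mx.findDeadlockedThreads();
        return ids == null ? new ThreadInfo[0] : mx.getThreadInfo(ids, true, true);
    }

    /** Takes two monitors in the given order, waiting until the peer thread holds its first one. */
    private static void lockInOrder(Object first, Object second, CountDownLatch bothHoldFirst) {
        synchronized (first) {
            bothHoldFirst.countDown();
            try {
                bothHoldFirst.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: the peer thread holds 'second' and waits for 'first'.
            }
        }
    }

    /** Provokes a reverse-order deadlock and returns how many threads the probe reports. */
    static int provokeAndCount() {
        Object mux1 = new Object();
        Object mux2 = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread a = new Thread(() -> lockInOrder(mux1, mux2, bothHoldFirst), "deadlock-demo-a");
        Thread b = new Thread(() -> lockInOrder(mux2, mux1, bothHoldFirst), "deadlock-demo-b");
        a.setDaemon(true); // daemon threads, so the stuck demo does not keep the JVM alive
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            for (int i = 0; i < 100; i++) { // poll for up to roughly 5 seconds
                ThreadInfo[] infos = findDeadlocks();
                if (infos.length > 0)
                    return infos.length;
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + provokeAndCount());
        for (ThreadInfo info : findDeadlocks())
            System.out.print(info); // ThreadInfo.toString() includes the lock and its owner
    }
}
```

A periodic call to something like `findDeadlocks()` could feed a log record or a monitoring screen. Note that the JVM can only report such a deadlock, not resolve it.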
...
This situation can occur if a user submits tasks that recursively submit more tasks and synchronously wait for their results. Jobs arrive at worker nodes and stay queued forever because the public pool has no free threads: all of them are waiting for job results.
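The starvation pattern can be reproduced with a plain `java.util.concurrent` fixed thread pool standing in for the public pool. This is a minimal sketch under that assumption, not Ignite code: `PoolStarvationDemo` and `anyParentTimedOut` are hypothetical names. Each parent task submits a child to the same pool and waits for it. A bounded wait is used so that the demo terminates, whereas an unbounded `get()` would hang forever once every thread is occupied by a waiting parent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class: a fixed pool stands in for the public pool.
class PoolStarvationDemo {
    /**
     * Runs 'parents' tasks on a pool of 'poolSize' threads. Each parent submits a child
     * to the same pool and waits up to 'waitMillis' for it.
     * Returns true if at least one parent gave up waiting, i.e. its child was starved.
     */
    static boolean anyParentTimedOut(int poolSize, int parents, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        // Lets the first parents start together so they occupy as many threads as possible.
        CountDownLatch started = new CountDownLatch(Math.min(poolSize, parents));
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < parents; i++) {
                results.add(pool.submit(() -> {
                    started.countDown();
                    started.await();
                    Future<Integer> child = pool.submit(() -> 42); // queued behind the parents
                    try {
                        child.get(waitMillis, TimeUnit.MILLISECONDS);
                        return false; // child ran: a free thread was available
                    } catch (TimeoutException e) {
                        child.cancel(false);
                        return true; // child never got a thread within the timeout
                    }
                }));
            }
            boolean timedOut = false;
            for (Future<Boolean> r : results)
                timedOut |= r.get();
            return timedOut;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Two parents on two threads: no thread is left for the children.
        System.out.println("starved: " + anyParentTimedOut(2, 2, 200));
        // One parent on two threads: the child runs on the spare thread.
        System.out.println("starved: " + anyParentTimedOut(2, 1, 5000));
    }
}
```

With as many waiting parents as pool threads, the first parent can only finish by timing out, which is why a timeout (or a separate pool for nested work) is what breaks the cycle.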
...
Web Console should provide the ability to cancel any task or job from the UI.
Timed-out tasks and jobs should be reported in Web Console and in the logs. We need to introduce a new configuration property that sets the timeout after which jobs are reported.
Log record and Web Console should include:
When an Ignite node suffers from GC pauses, it is completely unresponsive to every other node in the topology.
A very good solution with two native threads is described here
Native threads should report the GC pause to stdout and, if possible, to a logger instance. Of course, if the policy is set to "kill the node", output via the log is not possible, as the native thread will be stuck at a safepoint and neither killing nor logging will occur until the safepoint is released.
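For comparison, a much simpler pure-Java approximation is sketched below. It is not the native-thread approach described above: a Java thread is itself stopped during a stop-the-world pause, so it can only report the pause after the fact and cannot act while the pause is in progress. The class name `PauseDetector`, its methods and the interval and threshold values are hypothetical. The idea is to sleep for a fixed interval and treat a large overrun as a probable pause.

```java
import java.util.function.LongConsumer;

// Hypothetical Java-only detector; reports a pause only after the JVM resumes.
class PauseDetector {
    /**
     * Pure helper: given the timestamps around one sleep of 'intervalMs',
     * returns the overrun in milliseconds if it reaches 'thresholdMs', else 0.
     */
    static long detectedPauseMs(long startNanos, long endNanos, long intervalMs, long thresholdMs) {
        long overrunMs = (endNanos - startNanos) / 1_000_000L - intervalMs;
        return overrunMs >= thresholdMs ? overrunMs : 0L;
    }

    /** Starts a daemon thread that calls 'onPause' with the estimated pause length. */
    static Thread start(long intervalMs, long thresholdMs, LongConsumer onPause) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                long start = System.nanoTime();
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    return; // interrupted: stop the detector
                }
                long pause = detectedPauseMs(start, System.nanoTime(), intervalMs, thresholdMs);
                if (pause > 0)
                    onPause.accept(pause);
            }
        }, "pause-detector");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = start(50, 500, ms -> System.out.println("Possible long JVM pause: " + ms + " ms"));
        Thread.sleep(1000); // let the detector run briefly
        t.interrupt();
    }
}
```

The overrun also includes ordinary scheduling delay, so the threshold should sit well above the sleep interval to avoid false reports.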