...

Node failure detection speed-up

Adjustable timeouts

Some constants used in failure detection have already been found to be hardcoded and large.

Simplification

Also, the code responsible for this feature performs many re-checks and re-waits, so the detection time may end up close to 2x or even 3x failureDetectionTimeout.
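The effect of repeated waits can be illustrated with simple arithmetic. The sketch below is only an illustration: `DetectionTimeEstimate` and its `waitRounds` parameter are hypothetical names, and it assumes each re-check waits for up to one full failureDetectionTimeout, which may not match the real code exactly.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not part of any real API.
final class DetectionTimeEstimate {
    /**
     * Upper bound on detection time, assuming (as a simplification) that every
     * wait round may last up to one full failureDetectionTimeout.
     */
    static Duration worstCase(Duration failureDetectionTimeout, int waitRounds) {
        if (waitRounds < 1)
            throw new IllegalArgumentException("waitRounds must be >= 1");

        return failureDetectionTimeout.multipliedBy(waitRounds);
    }
}
```

Under this assumption, a 10-second timeout with three sequential wait rounds gives an upper bound of 30 seconds, which is the "x3" case described above.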

Hunt for the Zombies

Another problem is GC: it may increase failure detection time dramatically, so work on a watchdog has been started.

While a node is in a stop-the-world (STW) pause, it cannot let the cluster know that it has exceeded the allowed STW duration.

Also, the node may resume operating after an excessive STW pause, which may cause additional performance degradation.

A good approach is to detect the GC pause duration locally (another JVM or native code can help here) and kill the node if the limit is exceeded.
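A minimal sketch of such an external watchdog, meant to run in a separate JVM, might look like the following. It is not the actual implementation: the class name `StwWatchdog`, the heartbeat timestamp and the `maxPauseMillis` limit are all assumptions made for illustration, and how the node publishes its heartbeat (a file, a socket, shared memory) is deliberately left out. The only real API used is `ProcessHandle` from Java 9+.

```java
import java.util.Optional;

// Hypothetical watchdog sketch; intended to run outside the monitored node's JVM.
final class StwWatchdog {
    /** Returns true if the node has been silent for longer than the allowed pause. */
    static boolean pauseExceeded(long lastHeartbeatMillis, long nowMillis, long maxPauseMillis) {
        return nowMillis - lastHeartbeatMillis > maxPauseMillis;
    }

    /**
     * Forcibly terminates the node process if its pause limit is exceeded.
     * Returns true only if termination was successfully requested.
     */
    static boolean killIfExceeded(long pid, long lastHeartbeatMillis, long nowMillis, long maxPauseMillis) {
        if (!pauseExceeded(lastHeartbeatMillis, nowMillis, maxPauseMillis))
            return false;

        // ProcessHandle.of returns an empty Optional if no such process exists.
        Optional<ProcessHandle> node = ProcessHandle.of(pid);

        return node.map(ProcessHandle::destroyForcibly).orElse(false);
    }
}
```

Keeping the decision (`pauseExceeded`) separate from the action (`killIfExceeded`) lets the threshold logic be checked without touching real processes. Because the watchdog lives in another process, it keeps running while the monitored node is frozen in STW.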

Discovery messaging speed-up

...