...

Node failure detection speed-up

Adjustable timeouts

Some constants used in failure detection have already been found to be hardcoded and large.
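
As an illustration of what "adjustable" could mean here, the sketch below replaces a hardcoded constant with a value read from a system property, falling back to a default. The class name, the property name and the default value are all hypothetical and are not taken from the Ignite code base.

```java
/** Hypothetical holder for failure-detection tunables; names are illustrative only. */
final class DetectionTimeouts {
    /** Hypothetical system property name (an assumption, not an existing option). */
    static final String CONN_CHECK_PROP = "example.discovery.connCheckIntervalMs";

    /** Default used when the property is absent or invalid (value assumed for illustration). */
    static final long DFLT_CONN_CHECK_MS = 500L;

    private DetectionTimeouts() {}

    /** Reads the interval from the system property, falling back to the default. */
    static long connCheckIntervalMs() {
        // Long.getLong returns the default when the property is missing or not a number.
        long v = Long.getLong(CONN_CHECK_PROP, DFLT_CONN_CHECK_MS);

        // Non-positive values make no sense for an interval, so they are rejected too.
        return v > 0 ? v : DFLT_CONN_CHECK_MS;
    }
}
```

With this shape, a smaller interval can be tried by starting the JVM with `-Dexample.discovery.connCheckIntervalMs=100`, without rebuilding the code.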

Simplification

Also, the code responsible for this feature performs a lot of re-checks and re-waits, so the actual detection time may be close to 2x or even 3x failureDetectionTimeout.
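
The effect of the extra rounds can be shown with simple arithmetic. The sketch below assumes, purely for illustration, that every additional re-check or re-wait round may cost up to one full failureDetectionTimeout; the class and method names are hypothetical.

```java
/** Illustrative worst-case estimate; not a model of the actual discovery code. */
final class DetectionTimeEstimate {
    private DetectionTimeEstimate() {}

    /**
     * Worst-case detection time under the simplifying assumption that each
     * extra re-check or re-wait round costs up to one full timeout.
     */
    static long worstCaseMs(long failureDetectionTimeoutMs, int extraRounds) {
        if (failureDetectionTimeoutMs <= 0 || extraRounds < 0)
            throw new IllegalArgumentException("timeout must be positive, rounds non-negative");

        // One base wait plus one full timeout per extra round; fails loudly on overflow.
        return Math.multiplyExact(failureDetectionTimeoutMs, 1L + extraRounds);
    }
}
```

Under this assumption, a 10 000 ms timeout with two extra rounds gives `worstCaseMs(10_000, 2) == 30_000`, which is the x3 case mentioned above.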

Hunt for the Zombies

Another problem is GC: it may increase failure detection time dramatically, so work on a watchdog has started.

While a node is in a stop-the-world (STW) pause, it can't let the cluster know that it has exceeded the allowed STW duration.

Also, the node may resume operating after exceeding the allowed STW duration, which may cause additional performance degradation.

A good approach is to detect the GC pause duration locally (another JVM or native code can help here) and kill the node if the limit is exceeded.
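
A minimal sketch of the "another JVM" variant follows. It assumes a heartbeat-file scheme that is not described in the proposal: the node is expected to update the modification time of a file from an ordinary thread, so updates stop during an STW pause, and a separate watchdog JVM checks how stale that file is. The class name, the method names and the heartbeat file itself are hypothetical; `ProcessHandle` requires Java 9 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch of an external watchdog; meant to run in a JVM separate from the monitored node. */
final class StwWatchdog {
    private StwWatchdog() {}

    /** Returns true when the last heartbeat is older than the allowed pause. */
    static boolean pauseExceeded(long lastBeatMs, long nowMs, long maxPauseMs) {
        return nowMs - lastBeatMs > maxPauseMs;
    }

    /**
     * Checks the heartbeat file once and forcibly kills the node process if the
     * heartbeat is stale. Returns true if termination was successfully requested.
     */
    static boolean checkOnce(Path heartbeat, long nodePid, long maxPauseMs) throws IOException {
        // The node is assumed to touch this file periodically while it is running normally.
        long lastBeat = Files.getLastModifiedTime(heartbeat).toMillis();

        if (!pauseExceeded(lastBeat, System.currentTimeMillis(), maxPauseMs))
            return false;

        // A node that is absent from the process table is treated as already gone.
        return ProcessHandle.of(nodePid).map(ProcessHandle::destroyForcibly).orElse(false);
    }
}
```

The watchdog would call `checkOnce` on a fixed schedule. Because it lives in its own process, its own timing is not affected by the node's GC, and killing the node outright also prevents it from resuming work after an overly long pause.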

Discovery messaging speed-up

...
