ID | IEP-6 |
Author | Anton Vinogradov |
Sponsor | Anton Vinogradov |
Created | Nov 10, 2017 |
Status | DRAFT |
To monitor the grid more effectively, some new JMX metrics need to be implemented.
For example, there are currently no metrics for monitoring cluster topology or partition allocation.
The following additional JMX metrics would be useful to monitor:
Topology
- Current topology version
- Total server nodes count
- Total client nodes count
- Method to count nodes filtered by some node attribute
- Method to count nodes grouped by some node attribute
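The topology metrics above could be exposed through a management interface along the following lines. This is a minimal sketch using only the JDK: `TopologyMXBean`, `TopologyMetrics`, `Node`, and all method names are hypothetical and are not part of the existing Ignite API. The metrics are computed from a plain list of nodes rather than from a real discovery snapshot.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: every type and method name here is hypothetical, not Ignite API. */
class TopologyMetricsSketch {
    /** Simplified stand-in for a cluster node: a client flag plus node attributes. */
    static final class Node {
        final boolean client;
        final Map<String, String> attrs;

        Node(boolean client, Map<String, String> attrs) {
            this.client = client;
            this.attrs = attrs;
        }
    }

    /** Hypothetical management interface covering the topology metrics listed above. */
    public interface TopologyMXBean {
        long getTopologyVersion();

        int getTotalServerNodes();

        int getTotalClientNodes();

        /** Counts nodes whose attribute {@code attrName} equals {@code attrValue}. */
        int countNodes(String attrName, String attrValue);

        /** Counts nodes per distinct value of attribute {@code attrName}. */
        Map<String, Integer> groupNodes(String attrName);
    }

    /** Computes every metric from a fixed copy of the node list. */
    static final class TopologyMetrics implements TopologyMXBean {
        private final long topVer;
        private final List<Node> nodes;

        TopologyMetrics(long topVer, List<Node> nodes) {
            this.topVer = topVer;
            this.nodes = Collections.unmodifiableList(new ArrayList<>(nodes));
        }

        @Override public long getTopologyVersion() {
            return topVer;
        }

        @Override public int getTotalServerNodes() {
            return (int) nodes.stream().filter(n -> !n.client).count();
        }

        @Override public int getTotalClientNodes() {
            return (int) nodes.stream().filter(n -> n.client).count();
        }

        @Override public int countNodes(String attrName, String attrValue) {
            return (int) nodes.stream()
                .filter(n -> attrValue.equals(n.attrs.get(attrName)))
                .count();
        }

        @Override public Map<String, Integer> groupNodes(String attrName) {
            Map<String, Integer> res = new TreeMap<>();

            for (Node n : nodes) {
                String val = n.attrs.get(attrName);

                // Nodes that do not define the attribute are skipped.
                if (val != null)
                    res.merge(val, 1, Integer::sum);
            }

            return res;
        }
    }
}
```

Grouping skips nodes that lack the attribute; whether such nodes should instead be reported under a separate key is a design decision this sketch leaves open.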
Communication SPI
- Received messages count grouped by message type
- Received messages count grouped by sender node
- Sent messages count grouped by message type
- Sent messages count grouped by receiver node
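One possible way to keep the grouped message counters is a concurrent map of `LongAdder` instances per grouping key. The sketch below assumes the message type is identified by a string and the remote node by a `UUID`; the class `MessageCountersSketch` and its callback methods are hypothetical and only illustrate the bookkeeping, not how the Communication SPI would invoke it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Sketch only: hypothetical counters, not part of the Communication SPI. */
class MessageCountersSketch {
    private final ConcurrentHashMap<String, LongAdder> rcvdByType = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<UUID, LongAdder> rcvdByNode = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, LongAdder> sentByType = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<UUID, LongAdder> sentByNode = new ConcurrentHashMap<>();

    /** Hypothetical callback invoked once per received message. */
    void onMessageReceived(String msgType, UUID senderId) {
        rcvdByType.computeIfAbsent(msgType, k -> new LongAdder()).increment();
        rcvdByNode.computeIfAbsent(senderId, k -> new LongAdder()).increment();
    }

    /** Hypothetical callback invoked once per sent message. */
    void onMessageSent(String msgType, UUID receiverId) {
        sentByType.computeIfAbsent(msgType, k -> new LongAdder()).increment();
        sentByNode.computeIfAbsent(receiverId, k -> new LongAdder()).increment();
    }

    Map<String, Long> receivedByType() {
        return snapshot(rcvdByType);
    }

    Map<UUID, Long> receivedByNode() {
        return snapshot(rcvdByNode);
    }

    Map<String, Long> sentByType() {
        return snapshot(sentByType);
    }

    Map<UUID, Long> sentByNode() {
        return snapshot(sentByNode);
    }

    /** Copies the current counter values into a plain map suitable for returning from a JMX method. */
    private static <K> Map<K, Long> snapshot(Map<K, LongAdder> src) {
        Map<K, Long> res = new HashMap<>();

        src.forEach((k, v) -> res.put(k, v.sum()));

        return res;
    }
}
```

`LongAdder` is chosen here because the counters are written far more often than they are read; the snapshot is not atomic across keys, which is usually acceptable for monitoring.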
Partition allocation (for cache groups)
- Total primary partitions count located on the current node
- Total backup partitions count located on the current node
- Minimum/maximum number of partition backups left in the cluster for the cache group
- Possibly, some methods to show the partition map or partition distribution statistics in the cluster
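The partition counts above can be derived from a partition-to-owners assignment. The sketch below assumes the assignment is given as a list indexed by partition number, where each element lists the owning node IDs with the primary first; that representation, the string node IDs, and the class `PartitionMetricsSketch` are assumptions made for illustration only.

```java
import java.util.List;

/** Sketch only: hypothetical helper, not Ignite API. */
class PartitionMetricsSketch {
    /** Number of partitions for which {@code nodeId} is the primary (first) owner. */
    static int primaryCount(List<List<String>> assignment, String nodeId) {
        int cnt = 0;

        for (List<String> owners : assignment) {
            if (!owners.isEmpty() && owners.get(0).equals(nodeId))
                cnt++;
        }

        return cnt;
    }

    /** Number of partitions for which {@code nodeId} is an owner but not the primary. */
    static int backupCount(List<List<String>> assignment, String nodeId) {
        int cnt = 0;

        for (List<String> owners : assignment) {
            if (owners.indexOf(nodeId) > 0)
                cnt++;
        }

        return cnt;
    }

    /** Smallest number of backup copies over all partitions (0 for an empty assignment). */
    static int minBackups(List<List<String>> assignment) {
        return assignment.stream()
            .mapToInt(PartitionMetricsSketch::backups)
            .min()
            .orElse(0);
    }

    /** Largest number of backup copies over all partitions (0 for an empty assignment). */
    static int maxBackups(List<List<String>> assignment) {
        return assignment.stream()
            .mapToInt(PartitionMetricsSketch::backups)
            .max()
            .orElse(0);
    }

    /** Backups of one partition: all owners except the primary; a partition with no owners has none. */
    private static int backups(List<String> owners) {
        return Math.max(0, owners.size() - 1);
    }
}
```

A minimum of zero backups would indicate that at least one partition currently has no redundancy, which is the condition an operator would most likely alert on.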
Jobs execution
- Total jobs execution time (job execution statistics are currently gathered since node start and cannot be used to calculate the average execution time between probes; this metric would solve that problem)
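To illustrate why a cumulative total helps, a monitoring tool could derive the average execution time for the interval between two probes from the deltas of two monotonically growing values. The sketch below assumes each probe reads a total executed jobs count together with the proposed total execution time in milliseconds; the `Probe` class and its field names are hypothetical.

```java
/** Sketch only: hypothetical probe arithmetic, not Ignite API. */
class JobProbeSketch {
    /** Values read from the node at one point in time; both grow monotonically. */
    static final class Probe {
        final long totalJobs;
        final long totalExecTimeMs;

        Probe(long totalJobs, long totalExecTimeMs) {
            this.totalJobs = totalJobs;
            this.totalExecTimeMs = totalExecTimeMs;
        }
    }

    /**
     * Average job execution time (ms) for jobs finished between two probes.
     * Returns 0 if no jobs finished in the interval.
     */
    static double averageBetween(Probe prev, Probe cur) {
        long jobs = cur.totalJobs - prev.totalJobs;

        if (jobs <= 0)
            return 0.0;

        return (double) (cur.totalExecTimeMs - prev.totalExecTimeMs) / jobs;
    }
}
```

For example, if one probe reads 100 jobs and 5000 ms and the next reads 150 jobs and 8000 ms, the 50 jobs finished in between took 3000 ms in total, or 60 ms on average.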
Cache
- Topology validation status for cache