...
Bean Category | Bean Name | Property | Description | Metric Name |
Memory | java.lang:type=Memory | NonHeapMemoryUsage - used | | hadoop.memory.nonheapmemoryusage.used |
Memory | java.lang:type=Memory | HeapMemoryUsage - used | | hadoop.memory.heapmemoryusage.used |
Java Direct Memory | java.nio:type=BufferPool,name=direct | MemoryUsed | Java Direct Memory Used | hadoop.bufferpool.direct.memoryused |
JVM Metrics | Hadoop:service=HBase,name=JvmMetrics | GcCount | | hadoop.hbase.jvm.gccount |
JVM Metrics | Hadoop:service=HBase,name=JvmMetrics | GcTimeMillis | | hadoop.hbase.jvm.gctimemillis |
IPC | Hadoop:service=HBase,name=RegionServer,sub=IPC | queueSize | | hadoop.hbase.regionserver.ipc.queuesize |
IPC | Hadoop:service=HBase,name=RegionServer,sub=IPC | NumCallsInGeneralQueue | | hadoop.hbase.regionserver.ipc.numcallsingeneralqueue |
IPC | Hadoop:service=HBase,name=RegionServer,sub=IPC | NumActiveHandler | | hadoop.hbase.regionserver.ipc.numactivehandler |
IPC | Hadoop:service=HBase,name=RegionServer,sub=IPC | QueueCallTime_99th_percentile | IPC Queue Time (99th) | hadoop.hbase.regionserver.ipc.queuecalltime_99th_percentile |
IPC | Hadoop:service=HBase,name=RegionServer,sub=IPC | ProcessCallTime_99th_percentile | IPC Process Time (99th) | hadoop.hbase.regionserver.ipc.processcalltime_99th_percentile |
IPC | Hadoop:service=HBase,name=RegionServer,sub=IPC | QueueCallTime_num_ops | | hadoop.hbase.regionserver.ipc.queuecalltime_num_ops |
IPC | Hadoop:service=HBase,name=RegionServer,sub=IPC | ProcessCallTime_num_ops | | hadoop.hbase.regionserver.ipc.processcalltime_num_ops |
Regions | Hadoop:service=HBase,name=RegionServer,sub=Server | regionCount | | hadoop.hbase.regionserver.server.regioncount |
Regions | Hadoop:service=HBase,name=RegionServer,sub=Server | storeCount | | hadoop.hbase.regionserver.server.storecount |
Regions | Hadoop:service=HBase,name=RegionServer,sub=Server | memStoreSize | | hadoop.hbase.regionserver.server.memstoresize |
Regions | Hadoop:service=HBase,name=RegionServer,sub=Server | storeFileSize | | hadoop.hbase.regionserver.server.storefilesize |
Regions | Hadoop:service=HBase,name=RegionServer,sub=Server | totalRequestCount | | hadoop.hbase.regionserver.server.totalrequestcount |
Regions | Hadoop:service=HBase,name=RegionServer,sub=Server | ReadRequestCount | | hadoop.hbase.regionserver.server.readrequestcount |
Regions | Hadoop:service=HBase,name=RegionServer,sub=Server | WriteRequestCount | | hadoop.hbase.regionserver.server.writerequestcount |
Regions | Hadoop:service=HBase,name=RegionServer,sub=Server | splitQueueLength | | hadoop.hbase.regionserver.server.splitqueuelength |
Regions | Hadoop:service=HBase,name=RegionServer,sub=Server | compactionQueueLength | | hadoop.hbase.regionserver.server.compactionqueuelength |
Regions | Hadoop:service=HBase,name=RegionServer,sub=Server | flushQueueLength | | hadoop.hbase.regionserver.server.flushqueuelength |
Regions | Hadoop:service=HBase,name=RegionServer,sub=Server | blockCacheSize | | hadoop.hbase.regionserver.server.blockcachesize |
Regions | Hadoop:service=HBase,name=RegionServer,sub=Server | blockCacheHitCount | | hadoop.hbase.regionserver.server.blockcachehitcount |
Regions | Hadoop:service=HBase,name=RegionServer,sub=Server | blockCacheCountHitPercent | | hadoop.hbase.regionserver.server.blockcachecounthitpercent |
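The naming convention in the table (a per-bean prefix plus the lowercased property name) can be sketched in a few lines of Python. This is only an illustration, not the collector's actual implementation. It assumes the input is the parsed JSON of a Hadoop-style `/jmx` response, shaped as `{"beans": [{"name": ..., <attribute>: <value>}]}`, with composite attributes such as `HeapMemoryUsage` rendered as nested objects. The names `BEAN_PREFIXES` and `to_metrics` are hypothetical. For brevity the sketch emits every numeric attribute of the listed beans rather than only the properties in the table.

```python
# Hypothetical lookup: JMX bean name -> metric-name prefix, copied from the table above.
BEAN_PREFIXES = {
    "java.lang:type=Memory": "hadoop.memory",
    "java.nio:type=BufferPool,name=direct": "hadoop.bufferpool.direct",
    "Hadoop:service=HBase,name=JvmMetrics": "hadoop.hbase.jvm",
    "Hadoop:service=HBase,name=RegionServer,sub=IPC": "hadoop.hbase.regionserver.ipc",
    "Hadoop:service=HBase,name=RegionServer,sub=Server": "hadoop.hbase.regionserver.server",
}


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def to_metrics(jmx_payload):
    """Flatten an assumed /jmx-style payload into {metric_name: numeric_value}."""
    metrics = {}
    for bean in jmx_payload.get("beans", []):
        prefix = BEAN_PREFIXES.get(bean.get("name"))
        if prefix is None:
            continue  # bean is not in the table; skip it
        for prop, value in bean.items():
            if prop in ("name", "modelerType"):
                continue  # bookkeeping fields, not metrics
            if isinstance(value, dict):
                # Composite attribute, e.g. HeapMemoryUsage -> heapmemoryusage.used
                for key, sub in value.items():
                    if _is_number(sub):
                        metrics[f"{prefix}.{prop}.{key}".lower()] = sub
            elif _is_number(value):
                metrics[f"{prefix}.{prop}".lower()] = value
    return metrics
```

Under these assumptions, a bean named `java.lang:type=Memory` with `HeapMemoryUsage.used` set to 100 would yield the entry `hadoop.memory.heapmemoryusage.used` with value 100, and string-valued attributes are dropped.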
Data Retention
Metrics should be collected at least once per minute (Hadoop emits the metrics at a 10-second interval). Data older than 30 days should be aggregated to 5-minute granularity and kept for half a year.
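The retention policy above can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed implementation: "half year" is taken as 182 days, aggregation uses a plain average (counter-style metrics such as `gccount` may be better served by the last or maximum value), and the names `rollup_5min` and `retention_tier` are hypothetical.

```python
from collections import defaultdict

RAW_RETENTION_DAYS = 30     # from the text: data older than this is aggregated
TOTAL_RETENTION_DAYS = 182  # assumption: "half year" interpreted as 182 days
ROLLUP_SECONDS = 5 * 60     # 5-minute aggregation level


def rollup_5min(points):
    """Average (timestamp_seconds, value) points into 5-minute buckets.

    Returns a list of (bucket_start_seconds, mean_value), sorted by time.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its 5-minute bucket.
        buckets[ts - ts % ROLLUP_SECONDS].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


def retention_tier(age_days):
    """Decide how a data point of the given age should be stored."""
    if age_days > TOTAL_RETENTION_DAYS:
        return "drop"
    if age_days > RAW_RETENTION_DAYS:
        return "5min"
    return "raw"
```

A periodic job could call `retention_tier` on each stored series segment and apply `rollup_5min` to segments that move into the `"5min"` tier.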
Monitoring Dashboard & Alerting
Metrics Dashboard Overview
Dashboard Chart
Generally, we will follow the Ambari UI layout. Within that layout, the service health check application will also be included in the service status and summary information.
Metrics Query Pattern:
- Flexibly change the time range from 1 hour to