
...

The goal of this work is to perform basic performance analysis and correctness testing in a distributed environment.

Required Metrics

Client-Side Measurements

Throughput
Response Time/Latency
Timeouts
Consumer lag
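One way to derive the client-side metrics above from raw measurements is sketched below. It assumes each test client records a hypothetical RequestSample per request (send time, completion time, a timeout flag and payload size). The field names, the nearest-rank percentile definition, and computing consumer lag as the partition's log end offset minus the consumer's current offset are illustrative choices, not an existing tool.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RequestSample:
    # Hypothetical per-request record written by a test client.
    start_s: float    # time the request was sent, in seconds
    end_s: float      # time the response arrived or the client gave up
    timed_out: bool   # True if no response arrived within the client timeout
    size_bytes: int   # payload size of the request


def percentile(sorted_ms: List[float], p: float) -> Optional[float]:
    """Nearest-rank percentile of an ascending list; None if the list is empty."""
    if not sorted_ms:
        return None
    rank = max(1, math.ceil(p / 100.0 * len(sorted_ms)))
    return sorted_ms[rank - 1]


def summarize(samples: List[RequestSample]) -> dict:
    """Throughput, latency percentiles and timeout count for one run."""
    duration_s = max(s.end_s for s in samples) - min(s.start_s for s in samples)
    if duration_s <= 0:
        raise ValueError("samples must span a positive time interval")
    completed = [s for s in samples if not s.timed_out]
    latencies_ms = sorted((s.end_s - s.start_s) * 1000.0 for s in completed)
    return {
        "throughput_msgs_per_s": len(completed) / duration_s,
        "throughput_bytes_per_s": sum(s.size_bytes for s in completed) / duration_s,
        "latency_p50_ms": percentile(latencies_ms, 50),
        "latency_p99_ms": percentile(latencies_ms, 99),
        "timeouts": len(samples) - len(completed),
    }


def consumer_lag(log_end_offset: int, consumer_offset: int) -> int:
    """Messages the consumer still has to read in one partition."""
    return max(0, log_end_offset - consumer_offset)
```

Throughput here counts only completed requests over the wall-clock span of the run; whether timed-out requests should also contribute to latency is a policy choice the test plan would need to fix.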

Common Stats

vmstat - Context switches, user CPU utilization %, system CPU utilization %, total CPU utilization %
iostat - Reads/sec, writes/sec, kilobytes read/sec, kilobytes written/sec, average number of transactions waiting, average number of active transactions, average response time of transactions, percent of time waiting for service, percent of time the disk is busy
prstat - Virtual memory size of each Java process, resident set size (RSS) of each process, total CPU utilization of each process
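A minimal sketch of turning one vmstat sample into the CPU metrics listed above, assuming the Linux procps column layout (cs for context switches, us and sy for user and system CPU). Total CPU utilization is taken here as us + sy. Other platforms, for example Solaris, lay out the vmstat columns differently, so the header should be checked before reusing this. The function names are hypothetical.

```python
from typing import Dict


def parse_vmstat_row(header: str, row: str) -> Dict[str, int]:
    """Map column names to integer values for one vmstat sample line."""
    names = header.split()
    values = [int(v) for v in row.split()]
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    return dict(zip(names, values))


def cpu_summary(sample: Dict[str, int]) -> Dict[str, int]:
    """Context switches and CPU percentages from a parsed vmstat sample."""
    return {
        "context_switches_per_s": sample["cs"],
        "user_cpu_pct": sample["us"],
        "system_cpu_pct": sample["sy"],
        # Total CPU is approximated as user + system; idle and I/O wait excluded.
        "total_cpu_pct": sample["us"] + sample["sy"],
    }
```

The first line vmstat prints reports averages since the last reboot, so it is usually discarded when sampling during a test.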

GC Log Analysis

Footprint (Maximal amount of memory allocated)
Freed Memory (Total amount of memory that has been freed)
Freed Memory/min (Amount of memory that has been freed per minute)
Total Time (Duration over which data was collected)
Acc Pauses (Sum of all GC pause times)
Throughput (Percentage of time the application was NOT busy with GC)
Full GC Performance (Performance of full collections, which are marked as such in the GC logs)
GC Performance (Performance of minor collections, i.e. all collections not marked as full)
CMS counts and frequency (Number of CMS collections and their frequency)
CMS failure count and frequency (Number of CMS failures, such as concurrent mode failures, and their frequency)
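The GC metrics above can be computed once the GC log has been parsed into individual collection events. The sketch below assumes a hypothetical GcEvent record (pause time, heap occupancy before and after, committed heap size, and whether the log marks the collection as full); parsing the log itself depends on the JVM version and GC flags and is not shown. Footprint is approximated as the largest committed heap seen, and full and minor GC performance are summarized as a count and an average pause. CMS cycles and CMS failures could be counted the same way if the parser tags those events.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GcEvent:
    # Hypothetical record produced by a GC log parser (not shown).
    pause_s: float       # stop-the-world pause duration in seconds
    heap_before_kb: int  # heap occupancy before the collection
    heap_after_kb: int   # heap occupancy after the collection
    heap_total_kb: int   # committed heap size at the time of the event
    is_full: bool        # True if the GC log marks this collection as full


def _avg_pause(events: List[GcEvent]) -> float:
    return sum(e.pause_s for e in events) / len(events) if events else 0.0


def gc_summary(events: List[GcEvent], total_time_s: float) -> dict:
    """Summarize parsed GC events collected over total_time_s seconds."""
    acc_pauses_s = sum(e.pause_s for e in events)
    freed_kb = sum(e.heap_before_kb - e.heap_after_kb for e in events)
    full = [e for e in events if e.is_full]
    minor = [e for e in events if not e.is_full]
    minutes = total_time_s / 60.0
    return {
        "footprint_kb": max(e.heap_total_kb for e in events),
        "freed_kb": freed_kb,
        "freed_kb_per_min": freed_kb / minutes,
        "acc_pauses_s": acc_pauses_s,
        # Percentage of wall-clock time the application was NOT paused for GC.
        "throughput_pct": 100.0 * (1.0 - acc_pauses_s / total_time_s),
        "full_gc_count": len(full),
        "full_gc_per_min": len(full) / minutes,
        "full_gc_avg_pause_s": _avg_pause(full),
        "minor_gc_count": len(minor),
        "minor_gc_avg_pause_s": _avg_pause(minor),
    }
```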

Server-Side Metrics

Throughput and response time for each request type, broken down at the LogManager and RequestPurgatory levels
ISR membership churn, aggregate and per partition
Number of expirations in the request purgatory
Leader election rate, aggregate and per partition
Leader election latency, aggregate and per partition
High watermark changes, aggregate and per partition
Replica lag (time and bytes), aggregate and per partition
Replica fetch throughput and response time, aggregate and broken down at the LogManager and RequestPurgatory levels
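Several of the per-partition metrics above (ISR membership churn, leader election rate) reduce to counting changes between successive observations of partition state. The sketch below assumes the test harness periodically records (partition, leader, ISR) snapshots, for example by polling topic metadata; the snapshot format and the function name are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def partition_churn(
    snapshots: Iterable[Tuple[str, int, Iterable[int]]],
) -> Tuple[Counter, Counter]:
    """Count ISR membership changes and leader changes per partition.

    Each snapshot is (partition, leader broker id, in-sync broker ids),
    and snapshots must be supplied in time order.
    """
    last = {}
    isr_changes: Counter = Counter()
    leader_changes: Counter = Counter()
    for partition, leader, isr in snapshots:
        isr = frozenset(isr)
        if partition in last:
            prev_leader, prev_isr = last[partition]
            # Symmetric difference: brokers that joined plus brokers that left.
            isr_changes[partition] += len(prev_isr ^ isr)
            if leader != prev_leader:
                leader_changes[partition] += 1
        last[partition] = (leader, isr)
    return isr_changes, leader_changes
```

Summing the per-partition counters gives the aggregate, and dividing by the test duration gives a rate. Because only snapshots are compared, changes that happen and revert between two polls are missed or merged, so the polling interval bounds the accuracy.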

Log Analysis

Exceptions in the logs: their types and frequency
Warnings in the logs: their types and frequency
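A minimal sketch of the log analysis step, counting exceptions by class name and warnings by the logger that emitted them. It assumes a log4j layout such as [%d] %p %m (%c)%n, so that each WARN line ends with the logger name in parentheses; treating the logger as the warning "type" and the regular expressions themselves are illustrative choices. Dividing each count by the run duration gives its frequency.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Fully qualified class names ending in "Exception" or "Error",
# e.g. "java.io.IOException" in an ERROR line or a stack trace.
EXCEPTION_RE = re.compile(
    r"\b((?:[A-Za-z_$][\w$]*\.)+[A-Z][\w$]*(?:Exception|Error))\b"
)
# Assumes a layout like "[%d] %p %m (%c)%n": a WARN line ends with the
# logger name in parentheses, which is used here as the warning type.
WARN_RE = re.compile(r"\]\s+WARN\s+.*\(([\w.$]+)\)\s*$")


def classify_log(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count exception classes and WARN-emitting loggers in log lines."""
    exceptions: Counter = Counter()
    warnings: Counter = Counter()
    for line in lines:
        exceptions.update(EXCEPTION_RE.findall(line))
        match = WARN_RE.search(line)
        if match:
            warnings[match.group(1)] += 1
    return exceptions, warnings
```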

Miscellaneous

Capture the profile of every server machine before the tests are executed (e.g., disk space, number of CPUs)
Capture the full configuration used for each run
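Capturing the machine profile and configuration can be scripted so that every run produces a record alongside its results. The sketch below uses only the Python standard library; the record layout, the function names, and the simplified properties parser (which ignores ':' separators and line continuations) are assumptions for illustration. It would be run on each server before a test, passing the broker's data directories and its configuration files, such as server.properties.

```python
import json
import os
import platform
import shutil
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def machine_profile(data_dirs: List[str]) -> Dict[str, object]:
    """Basic hardware/OS profile of the machine this runs on."""
    disks = {}
    for path in data_dirs:
        usage = shutil.disk_usage(path)  # total, used, free in bytes
        disks[path] = {"total_bytes": usage.total, "free_bytes": usage.free}
    return {
        "hostname": platform.node(),
        "os": platform.platform(),
        "cpu_count": os.cpu_count(),
        "disks": disks,
    }


def snapshot_run(run_id: str, data_dirs: List[str], config_files: List[str]) -> str:
    """Return a JSON record of the machine profile and configs for one run."""
    configs = {}
    for path in config_files:
        with open(path) as f:
            configs[path] = parse_properties(f.read())
    record = {
        "run_id": run_id,
        "machine": machine_profile(data_dirs),
        "configs": configs,
    }
    return json.dumps(record, indent=2, sort_keys=True)
```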

Phase I: Perf Tools

The goal of this phase is to create tools that help run performance tests. Some of these tools already exist, so this phase is primarily about expanding and augmenting them.

...
