Resource Utilization Metrics

Response time tells you the symptom. Resource utilization tells you the cause. When response times spike, is it because the CPU is at 100%? The database is out of connections? Memory is full? Without server-side metrics, you are guessing.

The Five Server Metrics

Metric	Healthy	Warning	Critical	What to Do
CPU Usage	< 60%	60-80%	> 80%	Optimize code, add servers, increase cores
Memory Usage	< 70%	70-85%	> 85%	Fix memory leaks, increase RAM, optimize caching
Disk I/O	< 60%	60-80%	> 80%	Optimize queries, add SSD, reduce logging
Network I/O	< 60%	60-80%	> 80%	CDN, compression, reduce payload size
DB Connections	< 70% of pool	70-90%	> 90%	Fix connection leaks, increase pool size
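The thresholds in the table can be turned into a simple check that runs against each reading collected during a test. The sketch below is one minimal way to do that; the metric keys (`cpu`, `disk_io`, and so on) and the `classify` function are names chosen for illustration, not part of any monitoring tool.

```python
from typing import Dict, Tuple

# (warning_threshold, critical_threshold) in percent, copied from the table above.
# The dictionary keys are illustrative names, not identifiers from a real tool.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "cpu": (60.0, 80.0),
    "memory": (70.0, 85.0),
    "disk_io": (60.0, 80.0),
    "network_io": (60.0, 80.0),
    "db_connections": (70.0, 90.0),  # percent of the connection pool in use
}


def classify(metric: str, percent: float) -> str:
    """Return 'healthy', 'warning', or 'critical' for a utilization reading."""
    warning, critical = THRESHOLDS[metric]
    if percent > critical:
        return "critical"
    if percent >= warning:
        return "warning"
    return "healthy"
```

Boundary values follow the table: a CPU reading of exactly 60% or 80% counts as a warning, and anything above 80% is critical.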

Correlating Metrics

The real power is correlating server metrics with test metrics. When response time jumps at 300 users, which server metric jumped at the same point? If CPU hit 95% at 300 users, the CPU is the likely bottleneck. If CPU is fine but database connections hit 100%, the database connection pool is the likely bottleneck. Without this correlation, you risk fixing the wrong thing.
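As a rough sketch of that correlation, the code below finds the load step where response time rose the most and then reports which server metric rose the most at that same step. The sample layout, the `response_ms` key, the function names, and all the numbers are assumptions made up for illustration; real data would come from your load tool and your monitoring system.

```python
from typing import Dict, List, Tuple

# One row per load step, e.g. {"users": 300, "response_ms": 1400, "cpu": 95}
Sample = Dict[str, float]


def biggest_jump(samples: List[Sample], key: str) -> int:
    """Index of the sample where `key` rose the most compared with the previous sample."""
    deltas = [samples[i][key] - samples[i - 1][key] for i in range(1, len(samples))]
    return deltas.index(max(deltas)) + 1


def suspect_metric(samples: List[Sample], metrics: List[str]) -> Tuple[float, str]:
    """Return (users, metric) for the metric that rose most when response time jumped."""
    i = biggest_jump(samples, "response_ms")
    rise = {m: samples[i][m] - samples[i - 1][m] for m in metrics}
    return samples[i]["users"], max(rise, key=rise.get)


# Made-up numbers for illustration only.
samples = [
    {"users": 100, "response_ms": 210, "cpu": 35, "db_connections": 30},
    {"users": 200, "response_ms": 260, "cpu": 55, "db_connections": 45},
    {"users": 300, "response_ms": 1400, "cpu": 95, "db_connections": 50},
]
print(suspect_metric(samples, ["cpu", "db_connections"]))  # (300, 'cpu')
```

This needs at least two samples, and it only points at a suspect. A metric that was already saturated before the jump will not show a rise, so it is worth checking absolute values against the thresholds as well.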

Common Bottleneck Patterns

CPU at 95%, everything else normal → application code is the bottleneck. Optimize algorithms, add caching, or add CPU cores.

Memory growing steadily → memory leak. Profile the application to find the leak source. Run a soak test to confirm.

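DB connections at max, CPU low → connection pool exhaustion. Connections are not being returned to the pool, or are being held too long. Fix the code that leaks or holds them first; increase the pool size only if the pool is genuinely too small.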

Disk I/O at 100%, CPU normal → too many disk operations. Move to SSD, reduce logging verbosity, optimize database queries.

Network saturated → responses are too large. Enable compression (gzip), use a CDN, paginate API responses.

All resources low but response time high → external dependency is slow. An API call, database query, or third-party service is the bottleneck.
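The patterns above can be written down as a small set of rules. The sketch below is one possible encoding, not a definitive diagnostic: the cut-offs `AT_MAX` and `NORMAL`, the "never drops and rises by at least 5 points" test for steady memory growth, and the metric key names are all assumptions chosen for illustration.

```python
from typing import Dict, List

AT_MAX = 90.0  # assumed cut-off for "at max" or "saturated"
NORMAL = 70.0  # assumed cut-off below which a resource counts as "normal"


def is_growing_steadily(memory_pct: List[float], min_rise: float = 5.0) -> bool:
    """True if readings never drop and rise by at least min_rise overall (assumed heuristic)."""
    if len(memory_pct) < 3:
        return False
    never_drops = all(later >= earlier for earlier, later in zip(memory_pct, memory_pct[1:]))
    return never_drops and (memory_pct[-1] - memory_pct[0]) >= min_rise


def diagnose(snapshot: Dict[str, float], memory_history: List[float],
             response_slow: bool) -> List[str]:
    """Return the bottleneck patterns that match a snapshot of utilization percentages."""
    cpu = snapshot["cpu"]
    disk = snapshot["disk_io"]
    net = snapshot["network_io"]
    db = snapshot["db_connections"]
    findings: List[str] = []
    if cpu >= AT_MAX and max(disk, net, db) < NORMAL:
        findings.append("application code (CPU-bound)")
    if is_growing_steadily(memory_history):
        findings.append("possible memory leak")
    if db >= AT_MAX and cpu < NORMAL:
        findings.append("connection pool exhaustion")
    if disk >= AT_MAX and cpu < NORMAL:
        findings.append("too many disk operations")
    if net >= AT_MAX:
        findings.append("network saturated")
    # Only blame an external dependency when nothing on this server looks busy.
    if response_slow and not findings and max(cpu, disk, net, db) < NORMAL:
        findings.append("slow external dependency")
    return findings
```

Returning a list rather than a single answer is deliberate: more than one pattern can match at once, and a result such as "possible memory leak" still needs a profiler or a soak test to confirm it.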

Monitoring Tools

Tool	What It Monitors	Cost
Grafana + Prometheus	Server metrics, custom metrics, dashboards	Free (open source)
New Relic	APM, server metrics, distributed tracing	Paid (free tier available)
Datadog	Infrastructure, APM, logs, synthetics	Paid (free tier available)
htop / top	CPU, memory, processes (command line)	Free (top ships with most Linux distributions; htop is usually a separate install)
JMeter PerfMon	Server metrics plugin for JMeter	Free (JMeter plugin)

Always set up server monitoring BEFORE running performance tests. Knowing response times without knowing server resource usage is like knowing a patient has a fever without checking their blood work. Start with simple tools like htop on the server or the JMeter PerfMon plugin, then graduate to Grafana + Prometheus for production-grade monitoring.
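If none of these tools is in place yet, even a small script that writes timestamped readings to a CSV file gives you something to line up against the test results afterwards. The sketch below uses only the Python standard library; `record_samples` and the reader function passed to it are hypothetical names, and the demo reader returns fixed made-up values in place of real measurements.

```python
import csv
import io
import time
from typing import Callable, Dict, TextIO


def record_samples(read_metrics: Callable[[], Dict[str, float]], out: TextIO,
                   count: int, interval_s: float = 1.0) -> None:
    """Write `count` timestamped metric rows to `out` as CSV."""
    writer = None
    for n in range(count):
        row = {"timestamp": time.time(), **read_metrics()}
        if writer is None:
            # Column names come from the first reading.
            writer = csv.DictWriter(out, fieldnames=list(row))
            writer.writeheader()
        writer.writerow(row)
        if n < count - 1:
            time.sleep(interval_s)


def fake_reader() -> Dict[str, float]:
    """Stand-in reader with made-up values; replace with real measurements."""
    return {"cpu": 42.0, "memory": 63.5}


buffer = io.StringIO()
record_samples(fake_reader, buffer, count=3, interval_s=0)
print(buffer.getvalue())
```

Start the recorder before the load test begins and stop it after the test ends, so the timestamps cover the whole run. On Unix-like systems a real reader could, for example, return the first value of `os.getloadavg()` as a coarse indicator of CPU pressure.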

Q: What server-side metrics do you monitor during performance tests?

A: I monitor five key metrics: CPU usage (ideally under 60%, and not sustained above 80%), memory usage (watch for gradual growth indicating leaks), disk I/O (high I/O suggests query or logging issues), network I/O (a saturated network means payloads need optimizing), and database connection pool usage (near max indicates connection leaks). The key is correlating server metrics with test metrics. If response time spikes at 300 users and CPU hits 95% at the same point, CPU is the likely bottleneck. If all server metrics are low but response time is high, the bottleneck is probably an external dependency. I use Grafana + Prometheus or the JMeter PerfMon plugin for monitoring.

Key Point: Server metrics (CPU, memory, disk, network, DB connections) tell you WHY performance degrades. Correlate them with response time to pinpoint the bottleneck. Always set up monitoring before running tests.

