Chapter 10: Practice: Load Test a Web App
You have run three tests and generated a mountain of data. Now comes the part that separates a button-clicker from a performance engineer: analysis. Anyone can run JMeter. The value you bring is your ability to look at the numbers and say "the checkout endpoint has a database query that does a full table scan -- adding an index on the order_date column would reduce response time by 60%." That is what gets you hired.
JMeter generates a comprehensive HTML report with multiple sections. Here is what each section tells you and how to read it:
| Report Section | What It Shows | What to Look For |
|---|---|---|
| Dashboard / Statistics | Summary table with response times, error rates, throughput per request | Sort by p99 or error rate to find worst performers |
| Response Times Over Time | Line graph of response times throughout the test | Upward trends indicate resource exhaustion -- flat is good |
| Response Times Percentiles | Distribution curve of response times | Gap between p95 and p99 -- large gap means outlier spikes |
| Active Threads Over Time | How many virtual users were active at each moment | Should follow your ramp-up pattern -- threads that linger after the load should have ended point to stuck requests |
| Throughput Over Time (Hits/s) | Requests processed per second over time | Should rise with users -- a plateau while users keep increasing marks saturation, and a drop means the server is past it |
| Response Codes Per Second | HTTP status codes over time | Any non-2xx codes -- note when errors first appear |
| Errors | Breakdown of error types and which requests failed | Which endpoints fail first and what error messages appear |
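The Statistics triage ("sort by p99 or error rate") is easy to reproduce outside the HTML report, for example when you want to diff two runs in a script. The sketch below assumes results were saved as a CSV-format `.jtl` file containing JMeter's default `label`, `elapsed` and `success` columns; the sample rows, the `results.jtl` filename and the helper names are made up for illustration. It uses a simple nearest-rank percentile, so its numbers may differ slightly from the dashboard's.

```python
import csv
import io
import math
from collections import defaultdict

# Made-up excerpt of a JMeter results file saved as CSV (.jtl). Real files have
# more columns; this sketch only relies on "label", "elapsed" and "success".
SAMPLE_JTL = """timeStamp,elapsed,label,responseCode,success,allThreads
1700000000000,320,Search,200,true,10
1700000000100,410,Search,200,true,10
1700000000200,2900,Search,500,false,10
1700000000300,150,Home,200,true,10
1700000000400,180,Home,200,true,10
1700000000500,900,Checkout,200,true,10
1700000000600,1200,Checkout,200,true,10
"""


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (pct is an int from 1 to 100)."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(jtl_file):
    """Per-label sample count, p95, p99 and error rate, worst p99 first."""
    elapsed = defaultdict(list)
    errors = defaultdict(int)
    for row in csv.DictReader(jtl_file):
        elapsed[row["label"]].append(int(row["elapsed"]))
        if row["success"].strip().lower() != "true":
            errors[row["label"]] += 1

    rows = []
    for label, times in elapsed.items():
        times.sort()
        rows.append({
            "label": label,
            "samples": len(times),
            "p95_ms": percentile(times, 95),
            "p99_ms": percentile(times, 99),
            "error_pct": 100 * errors[label] / len(times),
        })
    # Same triage as the Statistics table: worst p99 first, ties broken by error rate.
    rows.sort(key=lambda r: (r["p99_ms"], r["error_pct"]), reverse=True)
    return rows


if __name__ == "__main__":
    # For a real run, pass open("results.jtl", newline="") (hypothetical path) instead.
    for r in summarize(io.StringIO(SAMPLE_JTL)):
        print(f'{r["label"]:<10} n={r["samples"]:<4} p95={r["p95_ms"]}ms '
              f'p99={r["p99_ms"]}ms errors={r["error_pct"]:.1f}%')
```

With the sample rows, Search comes out on top because of its single 2,900 ms failure. That is exactly the kind of endpoint you would open in the Errors section next.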
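When you see poor performance, the bottleneck usually sits in one of four places: CPU, memory, disk I/O, or the network/database layer (including the connection and thread pools in front of it). Think of them as the four suspects in a performance crime scene. Here is how common patterns in JMeter results map to those suspects: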
| Pattern in JMeter Results | Likely Bottleneck | Evidence to Look For | Common Fix |
|---|---|---|---|
| Response times increase linearly with users | CPU saturation | Server CPU at 90%+ during test | Optimize code, add caching, scale horizontally |
| Response times spike suddenly at X users | Connection pool exhaustion | Pool maxed out (max size ~X), requests waiting for a connection | Increase pool size, optimize query duration |
| One endpoint slow, others fine | Slow database query | Slow query log shows that endpoint's queries | Add database index, optimize query, add caching |
| Errors start at specific user count | Thread pool / worker exhaustion | Max threads reached, requests queued | Increase thread pool, reduce request processing time |
| Response times increase over time (not with users) | Memory leak | Heap usage grows, GC pauses increase | Fix the leak, increase heap (temporary) |
| All endpoints slow equally | Network bandwidth or load balancer | Network utilization near capacity | Increase bandwidth, optimize payload sizes |
| Checkout specifically slow | Database transaction locks | Lock wait timeouts in DB logs | Optimize transaction scope, reduce lock duration |
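Two rows of this table, the sudden spike at X users and response times that grow over time at a constant user count, lend themselves to a quick automated check. The sketch below is a heuristic under stated assumptions, not a JMeter feature: `find_knee` and `drifts_over_time` are hypothetical helpers, the 2x and 1.5x thresholds are arbitrary, and the data is invented. It only points at a suspect; confirming it still needs the server-side evidence in the third column.

```python
from statistics import median

# Thresholds below are illustrative assumptions, not JMeter or industry constants.
KNEE_JUMP_FACTOR = 2.0     # "sudden spike": median at least doubles between load steps
DRIFT_GROWTH_FACTOR = 1.5  # "grows over time": last window 50% slower than the first


def find_knee(levels, jump_factor=KNEE_JUMP_FACTOR):
    """Return the first user count whose median response time is at least
    jump_factor times the previous step's median, or None if growth is gradual.

    levels: list of (active_users, [response_times_ms]) sorted by active_users.
    """
    prev = None
    for users, times in levels:
        current = median(times)
        if prev is not None and current >= jump_factor * prev:
            return users
        prev = current
    return None


def drifts_over_time(times_in_order, window=5, growth_factor=DRIFT_GROWTH_FACTOR):
    """At a constant user count, compare the median of the first and last windows.
    A large rise with no change in load matches the memory-leak pattern."""
    if len(times_in_order) < 2 * window:
        raise ValueError("need at least two full windows of samples")
    first = median(times_in_order[:window])
    last = median(times_in_order[-window:])
    return last >= growth_factor * first


# Hypothetical step-load data: steady growth, then a cliff at 180 users.
step_load = [
    (50, [600, 620, 610]),
    (100, [950, 980, 1010]),
    (150, [1400, 1500, 1450]),
    (180, [6200, 7000, 6800]),
]

# Hypothetical soak test at a constant user count, samples in time order.
soak = [500, 510, 495, 505, 500, 620, 700, 760, 820, 880, 900, 950]

if __name__ == "__main__":
    print("knee at:", find_knee(step_load))
    print("drifting:", drifts_over_time(soak))
```

A knee at 180 users would send you to compare the connection pool's maximum size with that number. A positive drift result would send you to heap and GC metrics instead.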
Here is how you present the combined results from all three test runs. This is the table your stakeholders actually care about:
Performance Test Summary -- Shopping Portal
| Metric | Baseline (10 users) | Load (100 users) | Stress (250 users) | SLA Target |
|---|---|---|---|---|
| Avg Response Time | 340ms | 980ms | 3,200ms | -- |
| p95 Response Time | 520ms | 1,800ms | 9,100ms | < 3,000ms |
| p99 Response Time | 680ms | 2,400ms | 22,100ms | < 5,000ms |
| Error Rate | 0.0% | 0.3% | 4.6% | < 1.0% |
| Throughput (RPS) | 15 | 42 | 74 (peak 89) | > 30 |
| Breaking Point | -- | -- | ~180 users | -- |
Verdict: CONDITIONAL PASS
- System meets SLAs at 100 concurrent users (expected peak)
- System breaks down at ~180 users (1.8x expected peak)
- Recommendation: fix the Search and Checkout bottlenecks before the Diwali sale to increase capacity headroom to at least 2.5x expected peak
When presenting results, always lead with the business impact. Do not say "p99 is 22 seconds." Say "at 250 users, 1 in 100 customers will wait over 22 seconds to complete checkout, and 12% of checkout attempts will fail entirely. At our expected conversion rate, that translates to approximately X lost orders per hour."
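Both the verdict and the business-impact sentence come down to simple arithmetic, so it can help to script them. The sketch below checks the Load and Stress runs from the summary table against the SLA column. It assumes the throughput target is meant for expected peak, which is why the 10-user baseline is left out. The lost-orders function is a hypothetical back-of-the-envelope formula (expected orders times checkout failure rate), and the traffic and conversion inputs in the example are invented placeholders, not figures for this portal.

```python
# SLA targets from the summary table. The throughput target is read as applying
# at expected peak (100 users), so the 10-user baseline is not checked here.
SLA = {
    "p95_ms": ("<", 3000),
    "p99_ms": ("<", 5000),
    "error_pct": ("<", 1.0),
    "rps": (">", 30),
}

RUNS = {
    "Load (100u)": {"p95_ms": 1800, "p99_ms": 2400, "error_pct": 0.3, "rps": 42},
    "Stress (250u)": {"p95_ms": 9100, "p99_ms": 22100, "error_pct": 4.6, "rps": 74},
}


def sla_failures(metrics):
    """Return the names of the SLA metrics this run violates, in SLA order."""
    failed = []
    for name, (op, limit) in SLA.items():
        value = metrics[name]
        ok = value < limit if op == "<" else value > limit
        if not ok:
            failed.append(name)
    return failed


def lost_orders_per_hour(visitors_per_hour, conversion_rate, checkout_failure_rate):
    """Hypothetical estimate: orders you would have taken, times the share of checkouts that fail."""
    expected_orders = visitors_per_hour * conversion_rate
    return expected_orders * checkout_failure_rate


if __name__ == "__main__":
    for run, metrics in RUNS.items():
        failed = sla_failures(metrics)
        status = "PASS" if not failed else "FAIL: " + ", ".join(failed)
        print(f"{run}: {status}")
    # Invented traffic and conversion figures; substitute your own analytics data.
    # The 12% checkout failure rate is the stress-test example from the text.
    print(f"lost orders/hour: {lost_orders_per_hour(20_000, 0.02, 0.12):.0f}")
```

Keeping the SLA targets in one dictionary means the same script can grade every future run. Only the business inputs need to come from someone outside the test team.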
Q: How do you identify bottlenecks from performance test results?
A: I use a systematic approach. First, I look at the Aggregate Report to identify which specific transactions have the highest response times or error rates -- this narrows the investigation to specific endpoints. Second, I examine the Response Times Over Time graph to determine whether degradation is gradual (resource exhaustion) or sudden (hitting a limit like connection pool size). Third, I compare baseline vs load test metrics -- a 400%+ increase in response time for a specific endpoint while others show 150-200% suggests that endpoint has a unique bottleneck. Fourth, I correlate JMeter results with server-side monitoring (CPU, memory, disk I/O, database metrics). For example, if Search response time spikes and the database server shows high CPU at the same time, I check the slow query log for the search query. The bottleneck is usually in one of four categories: CPU, memory, disk I/O, or network/database, and the server metrics tell me which one.
Key Point: Analysis is where you earn your keep. Use the HTML report to identify which endpoints degrade, the response time graphs to understand when degradation occurs, and server monitoring to pinpoint whether it is CPU, memory, database, or network.
Key Point: Finding bottlenecks requires comparing runs, identifying degradation patterns in the graphs, and correlating JMeter results with server-side metrics. Always translate findings into business impact.