Chapter 8: Analyzing Results and Bottleneck Identification
Errors in performance tests are like warning lights on a car dashboard. A single "check engine" light does not tell you much -- is it a loose gas cap or a failing transmission? You need to categorize errors to diagnose the problem. Most beginners see "2% error rate" and stop there. A senior QA engineer asks: what kind of errors? When did they start? Did they correlate with a load increase? Did they affect all endpoints or just one?
| Error Type | HTTP Code | What It Usually Means Under Load | Investigation Path |
|---|---|---|---|
| Connection Refused | N/A (no response) | Server cannot accept more connections | Check max connections config, OS file descriptor limits |
| Connection Timeout | N/A (timeout) | Server too busy to respond in time | Check thread pool size, request queue depth |
| 500 Internal Server Error | 500 | Application exception -- null pointer, OOM, unhandled edge case | Check application logs for stack traces at the error timestamp |
| 502 Bad Gateway | 502 | Reverse proxy (nginx/ALB) got no response or an invalid response from the app server | App server crashed, restarted, or dropped the connection; check if process is alive |
| 503 Service Unavailable | 503 | Server is intentionally rejecting -- overloaded or in maintenance | Check rate limiters, circuit breakers, health checks |
| 504 Gateway Timeout | 504 | Reverse proxy timed out waiting for app server | App is alive but too slow; check downstream dependencies (DB, APIs) |
| 429 Too Many Requests | 429 | Rate limiter kicked in | This might be expected behavior -- check if rate limits match your test plan |
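One way to keep this table at hand during analysis is to encode it as a lookup. The sketch below is illustrative only: the names `Triage`, `HTTP_TRIAGE`, `NO_RESPONSE_TRIAGE`, and `triage` are hypothetical, and matching on message substrings such as "refused" is an assumption about how your tool reports failures that carry no HTTP status.

```python
from typing import NamedTuple, Optional


class Triage(NamedTuple):
    meaning: str
    first_check: str


# Status codes from the table, each mapped to a first investigation step.
HTTP_TRIAGE = {
    429: Triage("rate limiter kicked in", "compare rate limits with the test plan"),
    500: Triage("application exception", "application logs at the error timestamp"),
    502: Triage("proxy got no valid response from the app", "is the app process alive?"),
    503: Triage("server is rejecting requests", "rate limiters, circuit breakers, health checks"),
    504: Triage("proxy timed out waiting for the app", "downstream dependencies (DB, APIs)"),
}

# Failures with no HTTP response at all, keyed by a message substring (assumed wording).
NO_RESPONSE_TRIAGE = {
    "refused": Triage("server cannot accept more connections",
                      "max connections config, OS file descriptor limits"),
    "timed out": Triage("server too busy to respond in time",
                        "thread pool size, request queue depth"),
}


def triage(status: Optional[int], message: str = "") -> Triage:
    """Return the table row that matches a failed sample."""
    if status in HTTP_TRIAGE:
        return HTTP_TRIAGE[status]
    lowered = message.lower()
    for needle, result in NO_RESPONSE_TRIAGE.items():
        if needle in lowered:
            return result
    return Triage("uncategorized", "inspect the raw response and logs")
```

Calling `triage(504)` returns the 504 row, while `triage(None, "Connection refused")` falls through to the no-response rows. Anything the table does not cover comes back as "uncategorized" rather than being guessed at.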
The timing of errors is as important as their type. Errors that appear immediately suggest configuration problems. Errors that appear gradually suggest resource exhaustion. Errors that appear suddenly after a period of stability suggest a threshold being crossed.
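The three timing patterns can be told apart mechanically once the error rate is bucketed over time. The following is a rough heuristic sketch, not a standard algorithm: the function name `classify_onset` and the `noise` and `jump_ratio` thresholds are assumptions chosen for illustration and would need tuning for a real test.

```python
def classify_onset(error_rates, noise=0.001, jump_ratio=0.5):
    """Classify how errors appeared during a test.

    error_rates: error fraction (0.0 to 1.0) per equal-width time bucket, in order.
    noise: rates at or below this are treated as "no errors" (assumed value).
    jump_ratio: if the first erroring bucket already reaches this share of the
        peak rate, the onset counts as sudden (assumed value).
    """
    first = next((i for i, rate in enumerate(error_rates) if rate > noise), None)
    if first is None:
        return "none"
    if first == 0:
        return "immediate"  # suggests a configuration problem
    peak = max(error_rates)
    if error_rates[first] >= jump_ratio * peak:
        return "sudden"     # suggests a threshold being crossed
    return "gradual"        # suggests resource exhaustion


HINTS = {
    "immediate": "configuration problem",
    "gradual": "resource exhaustion",
    "sudden": "threshold crossed",
}
```

For example, a series that is clean for several buckets and then jumps straight to its peak is classified as "sudden", while one that creeps up from near zero is "gradual". Treat the label as a starting hypothesis to confirm against server metrics.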
Not all errors in your report are the server's fault. JMeter itself can run out of memory, network connections, or file handles if your load generator machine is underpowered. If you see "java.net.SocketException: Too many open files" or "java.lang.OutOfMemoryError" in the JMeter log, the bottleneck is your test machine, not the server. Always monitor your load generator machine during the test.
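A quick way to rule out the load generator is to scan the JMeter log for these client-side signatures before looking at the server. The sketch below assumes the log is plain text and that a substring match is good enough; the names `CLIENT_SIDE_SIGNATURES` and `scan_log_lines` are hypothetical.

```python
from collections import Counter

# Substrings that point at the load generator rather than the server.
CLIENT_SIDE_SIGNATURES = {
    "Too many open files": "raise the OS file descriptor limit",
    "OutOfMemoryError": "increase the JMeter heap",
    "Address already in use": "ephemeral ports exhausted; reduce users per machine",
}


def scan_log_lines(lines):
    """Count client-side error signatures in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for signature in CLIENT_SIDE_SIGNATURES:
            if signature in line:
                counts[signature] += 1
    return counts


def report(counts):
    """Turn the counts into one human-readable line per signature found."""
    return [
        f"{signature}: {count} occurrence(s) -> {CLIENT_SIDE_SIGNATURES[signature]}"
        for signature, count in counts.most_common()
    ]
```

In practice you would pass an open file handle, for example `scan_log_lines(open("jmeter.log"))`. An empty result does not prove the generator was healthy, so CPU and memory on that machine still need to be watched during the run.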
# These errors are from YOUR machine, not the server:
java.net.SocketException: Too many open files
  → Increase OS file descriptor limit: ulimit -n 65535
java.lang.OutOfMemoryError: Java heap space
  → Increase JMeter heap: set HEAP=-Xms2g -Xmx4g in jmeter.bat/jmeter.sh
java.net.BindException: Address already in use
  → Your machine ran out of ephemeral ports. Reduce users per machine or
    increase port range: sysctl -w net.ipv4.ip_local_port_range="1024 65535"
java.net.ConnectException: Connection timed out
  → Could be client-side OR server-side. Check both. If your CPU is at
    100% on the load gen machine, it is client-side.
# RULE: Always monitor your load generator machine.
# If its CPU > 80% or memory > 85%, your results are unreliable.
# Distribute the load across multiple machines instead.

Note the total error count and error percentage from the Dashboard.
Open the Errors Over Time chart. Identify WHEN errors started -- correlate with the user ramp-up.
Open the response data for failed requests. Group errors by HTTP status code.
For each error type, check if it affected all endpoints or just one specific endpoint.
Cross-reference the error timestamps with server logs (application logs, access logs).
Check your load generator machine's resource usage to rule out client-side errors.
Document: "Error type [X] started at [Y] concurrent users, affecting endpoint [Z], caused by [root cause from logs]."
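Several of these steps can be scripted against the raw results file. The sketch below assumes the results were saved as a CSV JTL with a header row and the `timeStamp`, `label`, `responseCode`, `success`, and `allThreads` columns enabled; the function names `summarize_errors` and `describe` are hypothetical. It groups failures by status code and endpoint and records the active user count when each group first appeared. Finding the root cause still means reading the server logs.

```python
import csv


def summarize_errors(jtl_rows):
    """Group failed samples by (responseCode, label).

    jtl_rows: iterable of dicts, e.g. csv.DictReader over a CSV JTL file.
    """
    groups = {}
    total = failed = 0
    for row in jtl_rows:
        total += 1
        if row["success"].strip().lower() == "true":
            continue
        failed += 1
        key = (row["responseCode"], row["label"])
        timestamp_ms = int(row["timeStamp"])
        users = int(row["allThreads"])
        group = groups.get(key)
        if group is None:
            groups[key] = {"count": 1, "first_ms": timestamp_ms, "users_at_first": users}
        else:
            group["count"] += 1
            # Keep the earliest occurrence even if rows are not time-ordered.
            if timestamp_ms < group["first_ms"]:
                group["first_ms"] = timestamp_ms
                group["users_at_first"] = users
    error_pct = 100 * failed / total if total else 0.0
    return {"total": total, "failed": failed, "error_pct": error_pct, "groups": groups}


def describe(report):
    """Render one documentation line per error group; root cause is left open."""
    return [
        f"Error type {code} started at {g['users_at_first']} concurrent users, "
        f"affecting endpoint {label} ({g['count']} failures)"
        for (code, label), g in sorted(report["groups"].items())
    ]
```

A typical call, assuming a results file named `results.jtl`, would be `summarize_errors(csv.DictReader(open("results.jtl", newline="")))`. The `first_ms` value is the epoch-millisecond timestamp to look up in the application and access logs.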
Q: During a load test, you see a 5% error rate with all errors being 502 Bad Gateway. The application logs show no errors. What is happening?
A: A 502 Bad Gateway means the reverse proxy (nginx, Apache, or a load balancer like ALB) received the request but got no valid response from the upstream application server. Clean application logs suggest the failing requests never reached application code: they died between the proxy and the app server. There are several possibilities: the app server process crashed and restarted (check process uptime and restart logs), the app server's request queue was full and it stopped accepting connections (check max worker/thread settings), or the app server closed or reset the connection before sending a complete response (check keep-alive and idle timeout settings on both the proxy and the app server). A plain proxy timeout against a slow but healthy app would normally surface as a 504, not a 502. I would check the reverse proxy logs (nginx error.log or ALB access logs) for the exact upstream error message, and also check whether the application server was still running at the timestamp of the errors.
Key Point: Categorize errors by type (HTTP code), timing (when they started), and scope (which endpoints). A 2% error rate means nothing without context -- 2% connection timeouts on checkout during peak load is a very different problem from 2% 404s on a broken image link.
Key Point: Error analysis requires categorization by HTTP code, timing correlation with load increase, and distinguishing server errors from load generator errors.