Error Analysis and Categorization

Errors in performance tests are like warning lights on a car dashboard. A single "check engine" light does not tell you much -- is it a loose gas cap or a failing transmission? You need to categorize errors to diagnose the problem. Most beginners see "2% error rate" and stop there. A senior QA engineer asks: what kind of errors? When did they start? Did they correlate with a load increase? Did they affect all endpoints or just one?

HTTP Error Categories Under Load

Error Type	HTTP Code	What It Usually Means Under Load	Investigation Path
Connection Refused	N/A (no response)	Server cannot accept more connections	Check max connections config, OS file descriptor limits
Connection Timeout	N/A (timeout)	Server too busy to respond in time	Check thread pool size, request queue depth
500 Internal Server Error	500	Application exception -- null pointer, OOM, unhandled edge case	Check application logs for stack traces at the error timestamp
502 Bad Gateway	502	Reverse proxy (nginx/ALB) cannot reach the app server	App server crashed or is too slow; check if process is alive
503 Service Unavailable	503	Server is intentionally rejecting -- overloaded or in maintenance	Check rate limiters, circuit breakers, health checks
504 Gateway Timeout	504	Reverse proxy timed out waiting for app server	App is alive but too slow; check downstream dependencies (DB, APIs)
429 Too Many Requests	429	Rate limiter kicked in	This might be expected behavior -- check if rate limits match your test plan
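
The table can be turned into a small lookup so that every failed sample gets the same category and first investigation step. The following is a minimal sketch, not part of any tool: the function name `categorize`, the `failure_message` parameter, and the substring matching for connection errors are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Status code -> (category, first thing to check), copied from the table above.
HTTP_ERROR_CATEGORIES = {
    429: ("Too Many Requests", "rate limits vs. test plan"),
    500: ("Internal Server Error", "application logs, stack traces at the error timestamp"),
    502: ("Bad Gateway", "is the app server process alive?"),
    503: ("Service Unavailable", "rate limiters, circuit breakers, health checks"),
    504: ("Gateway Timeout", "downstream dependencies (DB, APIs)"),
}


def categorize(status_code: Optional[int], failure_message: str = "") -> Tuple[str, str]:
    """Return (category, investigation hint) for one failed request.

    status_code is None when no HTTP response was received at all.
    The substring checks below are a simplifying assumption.
    """
    if status_code is None:
        message = failure_message.lower()
        if "refused" in message:
            return ("Connection Refused", "max connections config, OS file descriptor limits")
        if "timed out" in message or "timeout" in message:
            return ("Connection Timeout", "thread pool size, request queue depth")
        return ("No Response (other)", "load generator and network")
    # Codes outside the table are reported as-is rather than guessed at.
    return HTTP_ERROR_CATEGORIES.get(
        status_code, (f"HTTP {status_code}", "not covered by the table")
    )
```

Codes that the table does not cover are returned unchanged, so an unexpected 404 or 401 stays visible instead of being folded into another category.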

The Error Timeline -- When Errors Appear Matters

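The timing of errors is as important as their type. Errors that appear immediately suggest configuration problems. Errors that grow gradually as load increases suggest the system is approaching its capacity. Errors that appear suddenly after a period of stability suggest a threshold being crossed, such as an exhausted connection pool. Errors that appear only after a long run at constant load suggest a leak.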

Errors from the start -- Misconfiguration. Wrong URL, bad auth token, server not running. Not a performance problem; fix and re-run.
Errors appearing gradually as load increases -- Server is reaching capacity. The system is struggling but not dead. This is the useful data point -- it tells you the capacity boundary.
Sudden burst of errors at a specific timestamp -- Resource exhaustion event. A connection pool ran dry, memory filled up, or a downstream service stopped responding.
Errors appearing only after extended runtime (constant load) -- Resource leak. Memory leak, connection leak, file handle leak. The system works at first but degrades over time.
Intermittent errors throughout the test -- Flaky behavior. Could be load-balancer issues, DNS resolution problems, or non-deterministic application bugs that surface under concurrency.
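
To see which of these patterns a test run matches, it can help to count errors per time bucket and look at the shape of the counts. This is a rough sketch: the function name `errors_per_bucket`, the 60-second bucket size, and the sample offsets are all hypothetical.

```python
from collections import Counter


def errors_per_bucket(error_offsets_s, bucket_s=60):
    """Count errors per time bucket.

    error_offsets_s: seconds since test start for each failed request.
    Returns a list where index i holds the error count for bucket i.
    """
    counts = Counter(int(offset // bucket_s) for offset in error_offsets_s)
    if not counts:
        return []
    last_bucket = max(counts)
    # Fill in empty buckets with 0 so quiet periods stay visible.
    return [counts.get(i, 0) for i in range(last_bucket + 1)]


# Hypothetical run: five clean minutes, then a burst in the sixth minute.
offsets = [301, 305, 310, 312, 330, 359]
print(errors_per_bucket(offsets))  # [0, 0, 0, 0, 0, 6]
```

A non-zero first bucket points at misconfiguration, steadily growing counts point at a capacity limit, and a run of zeros followed by a spike, as in the sample data, points at a sudden exhaustion event.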

Differentiating Tool Errors from Real Errors

Not all errors in your report are the server's fault. JMeter itself can run out of memory, network connections, or file handles if your load generator machine is underpowered. If you see "java.net.SocketException: Too many open files" or "java.lang.OutOfMemoryError" in the JMeter log, the bottleneck is your test machine, not the server. Always monitor your load generator machine during the test.

Common Load Generator Errors (Not Server Errors)

# These errors are from YOUR machine, not the server:

java.net.SocketException: Too many open files
→ Increase OS file descriptor limit in the shell that launches JMeter: ulimit -n 65535

java.lang.OutOfMemoryError: Java heap space
→ Increase JMeter heap through the HEAP variable, e.g. in bin/setenv.sh:
  export HEAP="-Xms2g -Xmx4g" (Windows, setenv.bat: set HEAP=-Xms2g -Xmx4g)

java.net.BindException: Address already in use
→ Your machine ran out of ephemeral ports. Reduce users per machine or
  widen the port range (Linux): sysctl -w net.ipv4.ip_local_port_range="1024 65535"

java.net.ConnectException: Connection timed out
→ Could be client-side OR server-side. Check both. If CPU on the load
  gen machine is pinned at 100%, suspect the client side first.

# RULE: Always monitor your load generator machine.
# If its CPU > 80% or memory > 85%, your results are unreliable.
# Distribute the load across multiple machines instead.
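
A quick way to apply the list above is to scan the JMeter log for these signatures before trusting the error rate. The sketch below is an illustration only: the function name `scan_jmeter_log`, the category labels, and the `jmeter.log` path in the usage comment are assumptions, and the patterns match only the exception text quoted in the list.

```python
import re
from collections import Counter

# Exception signatures taken from the list above; labels are illustrative.
CLIENT_SIDE_SIGNATURES = {
    "file descriptors": re.compile(r"Too many open files"),
    "JVM heap": re.compile(r"java\.lang\.OutOfMemoryError"),
    "ephemeral ports": re.compile(r"java\.net\.BindException"),
}


def scan_jmeter_log(lines):
    """Count log lines that match a known load-generator-side signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in CLIENT_SIDE_SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Usage (the log path is an assumption about your setup):
# with open("jmeter.log", encoding="utf-8", errors="replace") as log_file:
#     for name, count in scan_jmeter_log(log_file).items():
#         print(f"{name}: {count} matching lines")
```

Any non-zero count means at least part of the reported error rate came from the test machine, so the run should be repeated after fixing the client-side limit.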

Error Analysis Workflow

Systematic Error Analysis

1. Note the total error count and error percentage from the Dashboard.
2. Open the Errors Over Time chart. Identify WHEN errors started -- correlate with the user ramp-up.
3. Open the response data for failed requests. Group errors by HTTP status code.
4. For each error type, check if it affected all endpoints or just one specific endpoint.
5. Cross-reference the error timestamps with server logs (application logs, access logs).
6. Check your load generator machine's resource usage to rule out client-side errors.
7. Document: "Error type [X] started at [Y] concurrent users, affecting endpoint [Z], caused by [root cause from logs]."
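
Most of this workflow can be scripted against the raw results file. The following sketch assumes a CSV-format JTL file with a header row and the columns `timeStamp`, `label`, `responseCode`, `success`, and `allThreads`; if your save settings omit any of these, the code needs adjusting. The function name `summarize_errors` and the returned dictionary layout are hypothetical.

```python
import csv
from collections import Counter


def summarize_errors(jtl_path):
    """Summarize failed samples from a CSV results file.

    Assumes a header row with timeStamp (epoch ms), label, responseCode,
    success ("true"/"false") and allThreads (active thread count).
    """
    total = 0
    by_code = Counter()            # group by status code
    by_code_and_label = Counter()  # scope: which endpoint per code
    first_error = None             # (timestamp_ms, active_threads)

    with open(jtl_path, newline="", encoding="utf-8") as results:
        for row in csv.DictReader(results):
            total += 1
            if row["success"].strip().lower() == "true":
                continue
            # responseCode is kept as an opaque string; it is not always numeric.
            by_code[row["responseCode"]] += 1
            by_code_and_label[(row["responseCode"], row["label"])] += 1
            timestamp = int(row["timeStamp"])
            if first_error is None or timestamp < first_error[0]:
                first_error = (timestamp, int(row["allThreads"]))

    errors = sum(by_code.values())
    return {
        "total": total,
        "errors": errors,
        "error_pct": 100.0 * errors / total if total else 0.0,
        "by_code": by_code,
        "by_code_and_label": by_code_and_label,
        "first_error": first_error,
    }
```

The `first_error` entry gives the timestamp to look up in the server logs and the number of active threads when errors began. Finding the root cause in those logs, and ruling out the load generator, still has to be done by hand.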

Q: During a load test, you see a 5% error rate with all errors being 502 Bad Gateway. The application logs show no errors. What is happening?

A: A 502 Bad Gateway means the reverse proxy (nginx, Apache, or a load balancer like ALB) received the request but did not get a valid response from the upstream application server. The fact that application logs show no errors suggests the failing requests never reached application code, or the process died before it could log anything -- the failure sits between the proxy and the app server. There are several possibilities: the app server process crashed and restarted (check process uptime and restart logs), the app server's request queue was full and it stopped accepting connections (check max worker/thread settings), or the app server closed the connection before sending a complete response (for example, a worker that was killed or recycled mid-request). If the app were simply too slow, I would expect the proxy to return 504 Gateway Timeout rather than 502, so proxy timeout settings are worth checking mainly if 504s appear as well. I would check the reverse proxy logs (nginx error.log or ALB access logs) for the exact upstream error message, and also check whether the application server was still running at the timestamp of the errors.

Key Point: Categorize errors by type (HTTP code), timing (when they started), and scope (which endpoints). A 2% error rate means nothing without context -- 2% connection timeouts on checkout during peak load is a very different problem from 2% 404s on a broken image link.

Key Point: Error analysis requires categorization by HTTP code, timing correlation with load increase, and distinguishing server errors from load generator errors.


Chapter 8: Analyzing Results and Bottleneck Identification
