A system that responds in 50ms but returns errors for 10% of requests is worse than one that responds in 500ms with zero errors. Error rate measures reliability -- the percentage of requests that fail. Under normal load, error rate should be near zero. As load increases, errors should still stay below your threshold. When errors spike, you have found either the breaking point or a bug.
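As a minimal sketch, error rate is failed requests divided by total requests, expressed as a percentage. The function names below are hypothetical. The sketch assumes that any 4xx/5xx status counts as a failure, and so does a request that got no response at all (recorded as `None`). In a real test you may want to exclude statuses your script expects, such as a deliberate 404.

```python
from typing import Optional, Sequence


def is_failure(status: Optional[int]) -> bool:
    # None represents a request that never received an HTTP response
    # (connection or socket timeout). Any 4xx/5xx status also counts as a failure.
    return status is None or status >= 400


def error_rate_pct(statuses: Sequence[Optional[int]]) -> float:
    """Return the percentage of requests that failed (0.0 to 100.0)."""
    if not statuses:
        raise ValueError("no requests were recorded")
    failed = sum(1 for status in statuses if is_failure(status))
    return 100.0 * failed / len(statuses)


if __name__ == "__main__":
    # 9 successes and 1 server error: a 10% error rate.
    print(error_rate_pct([200] * 9 + [500]))  # 10.0
```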
| Error Type | HTTP Code | What It Means | Common Cause |
|---|---|---|---|
| Internal Server Error | 500 | Server crashed or threw an unhandled exception | Unhandled error under load, out of memory |
| Service Unavailable | 503 | Server is overloaded or down | All workers busy, auto-scaling not fast enough |
| Gateway Timeout | 504 | Upstream server did not respond in time | Backend service too slow under load |
| Too Many Requests | 429 | Rate limiting activated | Legitimate protection -- your test is hitting rate limits |
| Connection Timeout | None (no HTTP response) | Could not establish a connection in time | Connection pool exhausted, network issue |
| Socket Timeout | None (no HTTP response) | Connected, but no response arrived before the read timeout | Server hung, deadlock, infinite loop |
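To categorize failures the way the table above does, a load-test script can bucket raw outcomes before looking at totals. This is a sketch, and its outcome encoding is an assumption: an int is an HTTP status, and the hypothetical strings `"connect_timeout"` and `"socket_timeout"` mark failures with no HTTP response. Load-testing tools each report timeouts differently, so adapt the encoding to yours.

```python
from collections import Counter
from typing import Dict, Iterable, Union

# An int is an HTTP status code; the two strings are hypothetical markers for
# failures that never produced an HTTP response.
Outcome = Union[int, str]

CATEGORIES: Dict[Outcome, str] = {
    500: "Internal Server Error",
    503: "Service Unavailable",
    504: "Gateway Timeout",
    429: "Too Many Requests",
    "connect_timeout": "Connection Timeout",
    "socket_timeout": "Socket Timeout",
}


def categorize(outcome: Outcome) -> str:
    """Map one request outcome to a row of the error table (or OK / Other)."""
    if outcome in CATEGORIES:
        return CATEGORIES[outcome]
    if isinstance(outcome, int) and outcome < 400:
        return "OK"
    return "Other error"  # e.g. 404, 502, or an unrecognized client-side failure


def error_breakdown(outcomes: Iterable[Outcome]) -> Counter:
    """Count failures per category, ignoring successful responses."""
    return Counter(c for c in map(categorize, outcomes) if c != "OK")


if __name__ == "__main__":
    outcomes = [200, 500, 503, 503, "socket_timeout", 404, 200]
    for category, count in error_breakdown(outcomes).most_common():
        print(f"{category}: {count}")
```

Grouping by category first makes it obvious whether you are looking at one problem (say, all 503s) or several unrelated ones.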
What is an acceptable error rate? It depends on the test type and the operation. Financial transactions should have near-zero errors. Product browsing can tolerate a small percentage.
| Context | Acceptable Error Rate | Why |
|---|---|---|
| Load test (expected users) | < 0.1% | Under normal load, the system should be reliable |
| Stress test (above expected) | < 5% at 2x load | Some degradation is expected beyond capacity |
| Financial transactions | < 0.01% (near zero) | Lost transactions mean lost money and compliance issues |
| Content browsing | < 1% | A failed page load is annoying but not catastrophic |
| File uploads | < 0.5% | Users will retry but too many failures erode trust |
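The thresholds in the table above can be encoded as a pass/fail gate at the end of a test run. The context keys and the `check_error_rate` helper are hypothetical names. The percentages come directly from the table, and the comparison is strict to match its "<" bounds.

```python
from typing import Dict, Tuple

# Maximum acceptable error rate in percent, per context (values from the table).
# The keys are illustrative labels, not standard names.
MAX_ERROR_PCT: Dict[str, float] = {
    "load": 0.1,        # load test at expected users
    "stress_2x": 5.0,   # stress test at 2x expected load
    "financial": 0.01,  # financial transactions
    "browsing": 1.0,    # content browsing
    "upload": 0.5,      # file uploads
}


def check_error_rate(context: str, failed: int, total: int) -> Tuple[bool, float]:
    """Return (passed, error_rate_pct) for one test run in the given context."""
    if total <= 0:
        raise ValueError("total must be positive")
    rate = 100.0 * failed / total
    return rate < MAX_ERROR_PCT[context], rate


if __name__ == "__main__":
    print(check_error_rate("financial", 1, 5000))  # (False, 0.02)
    print(check_error_rate("browsing", 40, 5000))  # (True, 0.8)
```

A single failure in 5,000 financial transactions already exceeds the 0.01% bound, while 40 failures in 5,000 page loads stays within the browsing budget.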
If your performance test shows 429 (Too Many Requests) errors, stop and check whether the target system applies rate limiting. Results from a rate-limited endpoint are meaningless for capacity planning: you are measuring the rate limiter, not the application. Either disable rate limiting in the test environment or shape your test load to stay within the limits.
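One way to act on this advice is to scan the raw responses for 429s before reading any other metric. This sketch assumes responses are available as `(status, headers)` pairs, and `rate_limit_report` is a hypothetical helper. It reads only the delta-seconds form of the standard `Retry-After` header. The header may also carry an HTTP date, which this sketch ignores.

```python
from typing import Dict, Mapping, Optional, Sequence, Tuple

# (HTTP status code, response headers) as recorded by the load generator.
Response = Tuple[int, Mapping[str, str]]


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[int]:
    # Header names are case-insensitive, so compare them in lower case.
    for name, value in headers.items():
        if name.lower() == "retry-after" and value.strip().isdigit():
            return int(value.strip())
    return None  # header missing, or given as an HTTP date


def rate_limit_report(responses: Sequence[Response]) -> Dict[str, object]:
    """Summarize 429 responses so a rate-limited run can be flagged as invalid."""
    limited = [headers for status, headers in responses if status == 429]
    waits = [s for s in map(retry_after_seconds, limited) if s is not None]
    return {
        "rate_limited": len(limited),
        "share_pct": 100.0 * len(limited) / len(responses) if responses else 0.0,
        "max_retry_after_s": max(waits) if waits else None,
        # Any 429 means part of the run was measuring the rate limiter.
        "results_valid": not limited,
    }


if __name__ == "__main__":
    sample = [(200, {}), (429, {"Retry-After": "30"}), (200, {})]
    print(rate_limit_report(sample))
```

The largest `Retry-After` value gives a rough hint of how the limiter is configured, which helps when you reshape the test load to stay under it.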
Q: What error rate is acceptable in performance testing and how do you investigate errors?
A: Under expected load, the error rate should be below 0.1%, effectively zero. During stress testing, up to 5% at 2x expected load is acceptable. For financial transactions, near zero is mandatory. When I see errors, I categorize them by code: 500 points to unhandled application exceptions (check the logs for stack traces), 503 means the server is overloaded (check worker and thread pool metrics), 504 means an upstream dependency timed out (identify the slow dependency), and 429 means rate limiting (adjust the test or the environment). I also look at error patterns: a sudden spike at a particular user count suggests a capacity limit, a gradual increase over time suggests a resource leak, and errors concentrated on specific endpoints point to a localized bottleneck.
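The pattern reasoning in that answer can be turned into a rough heuristic. This sketch rests on stated assumptions. `rates` holds error percentages sampled at successive load steps for spike detection, or at successive time windows under steady load for leak detection. The `spike_jump`, `min_rise`, and `share` thresholds are arbitrary illustrative defaults, not established values.

```python
from typing import Mapping, Optional, Sequence


def classify_trend(rates: Sequence[float], spike_jump: float = 2.0,
                   min_rise: float = 0.5) -> str:
    """Classify error-rate samples (percent) taken at successive steps."""
    if len(rates) < 2:
        return "insufficient data"
    steps = [b - a for a, b in zip(rates, rates[1:])]
    if max(steps) >= spike_jump:
        # A single step adds a large jump: likely a capacity limit.
        return "sudden spike"
    if all(s >= 0 for s in steps) and rates[-1] - rates[0] >= min_rise:
        # Steady creep with no single jump: suspect a resource leak.
        return "gradual increase"
    return "stable"


def dominant_endpoint(errors_by_endpoint: Mapping[str, int],
                      share: float = 0.8) -> Optional[str]:
    """Return the endpoint holding at least `share` of all errors, if any."""
    total = sum(errors_by_endpoint.values())
    if total == 0:
        return None
    name, count = max(errors_by_endpoint.items(), key=lambda kv: kv[1])
    return name if count / total >= share else None


if __name__ == "__main__":
    print(classify_trend([0.0, 0.05, 0.1, 6.0, 9.0]))  # sudden spike
    print(classify_trend([0.1, 0.3, 0.6, 0.9, 1.2]))  # gradual increase
    print(dominant_endpoint({"/checkout": 90, "/search": 5, "/home": 5}))  # /checkout
```

A heuristic like this only flags where to look. Confirming a capacity limit or a leak still requires the server-side metrics and logs described in the answer.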
Key Point: Error rate should be near zero (below 0.1%) under expected load. Categorize errors by HTTP code, analyze patterns (sudden vs gradual vs endpoint-specific), and distinguish between application bugs and capacity limits.