| Metric | Use For | Healthy Value |
|---|---|---|
| p95 Response Time | SLA pass/fail | < 2s pages, < 500ms APIs |
| Throughput (RPS) | Capacity planning | Matches expected traffic |
| Error Rate | Reliability check | < 0.1% under normal load |
| CPU Usage | Server bottleneck | < 70% under load |
| Memory Usage | Leak detection | Stable, not growing |
| DB Connections | Pool exhaustion | < 80% of max pool |
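
The thresholds in the table can be turned into an automated pass/fail check. The following is a minimal sketch, not a definitive implementation: the `RunSummary` class, its field names, and `check_thresholds` are hypothetical names invented for illustration, and the p95 limit defaults to the API value of 500 ms. Throughput and memory stability are left out because they need an expected-traffic figure and a time series rather than a single number.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunSummary:
    """Hypothetical summary of one load-test run (field names are illustrative)."""
    p95_ms: float          # 95th-percentile response time, in milliseconds
    error_rate: float      # failed requests / total requests, from 0.0 to 1.0
    cpu_pct: float         # peak CPU utilisation under load, from 0 to 100
    db_conns_in_use: int   # peak number of active database connections
    db_pool_max: int       # configured maximum pool size


def check_thresholds(run: RunSummary, p95_limit_ms: float = 500.0) -> List[str]:
    """Return the list of violated thresholds; an empty list means the run passes."""
    failures = []
    if run.p95_ms >= p95_limit_ms:
        failures.append(f"p95 {run.p95_ms:.0f} ms is not below {p95_limit_ms:.0f} ms")
    if run.error_rate >= 0.001:  # 0.1% from the table
        failures.append(f"error rate {run.error_rate:.2%} is not below 0.1%")
    if run.cpu_pct >= 70.0:
        failures.append(f"CPU {run.cpu_pct:.0f}% is not below 70%")
    if run.db_conns_in_use >= 0.8 * run.db_pool_max:
        failures.append("DB connections are not below 80% of the pool maximum")
    return failures


# Example with made-up numbers: a run that stays inside every limit.
healthy = RunSummary(p95_ms=420.0, error_rate=0.0005, cpu_pct=65.0,
                     db_conns_in_use=30, db_pool_max=50)
print(check_thresholds(healthy))  # prints []
```

For page-level tests, the same function could be called with `p95_limit_ms=2000.0` to match the 2-second page threshold.
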
Q: What metrics do you track during a performance test and what are acceptable thresholds?
A: I track four primary metrics: (1) p95 response time -- under 2 seconds for pages, under 500 ms for APIs. (2) Throughput -- must meet or exceed expected traffic, which I estimate with Little's Law (concurrent users = throughput x (response time + think time)). (3) Error rate -- under 0.1% for load tests, near zero for financial operations. (4) Concurrent users -- the test must simulate expected peak traffic. I also monitor server-side metrics: CPU under 70%, stable memory (steady growth suggests a leak), and database connections under 80% of the pool maximum. I always use percentiles over averages because averages hide tail latency. The gap between p95 and p99 is also diagnostic -- a large gap often points to GC pauses, cache misses, or connection pool waits.
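
Two claims in this answer are easy to check numerically: that an average can hide tail latency, and that Little's Law gives the throughput a test should reach. The sketch below uses synthetic latencies and made-up load figures. The helper names `percentile` and `expected_throughput_rps` are illustrative, and the nearest-rank percentile method is one common definition among several (load-testing tools may interpolate differently).

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def expected_throughput_rps(concurrent_users, avg_response_s, think_time_s):
    """Little's Law rearranged for throughput: X = N / (R + Z)."""
    return concurrent_users / (avg_response_s + think_time_s)


# Synthetic sample: 90 fast requests, 8 slow ones, 2 very slow ones (milliseconds).
latencies_ms = [100] * 90 + [600] * 8 + [4000] * 2

mean_ms = statistics.mean(latencies_ms)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)

# The mean (218 ms) looks fine against a 500 ms API limit,
# while p95 (600 ms) fails it and p99 (4000 ms) exposes a long tail.
print(f"mean={mean_ms:.0f} ms, p95={p95} ms, p99={p99} ms")

# 500 users, 1 s average response, 4 s think time -> 100 requests per second.
print(expected_throughput_rps(500, 1.0, 4.0))
```

With this sample, a report that showed only the average would pass a run that one in ten users experienced as slow.
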
Key Point: Master these metrics and you can read any performance test report in any tool.
Key Point: Four metric categories answer every performance question: response time (percentiles), throughput (capacity), errors (reliability), and server resources (root cause).
Answer all 5 questions.
1. Why are percentiles preferred over averages for measuring response time?
2. What does Little's Law state?
3. What happens when you run a load test without think time?
4. If CPU usage is low but response time is high during a load test, what is the likely bottleneck?
5. What is an acceptable error rate during a load test with expected traffic?