You do not just jump straight to 3,000 users and see what happens. That is like a new gym member loading 200 kg on the squat rack on day one. You build up methodically. Each test type serves a specific purpose, and the order matters. The results from one test inform the next. Skip a step and you miss critical information.
The baseline test is your reference point. Run with minimal load (10-100 users) to establish what "healthy" looks like. If the app is slow with 10 users, there is no point testing with 3,000. Fix the fundamentals first. The baseline also validates your test scripts -- if correlation is broken or data is missing, you catch it here instead of wasting a 45-minute load test.
| Parameter | Baseline Test Configuration |
|---|---|
| Virtual Users | 10-100 (just enough to validate the setup) |
| Ramp-up | 1 minute |
| Duration | 10-15 minutes |
| Think Time | Same as load test (keep it realistic) |
| What to Check | All requests succeed, correlation works, response times are reasonable, no script errors |
| Pass Criterion | 0% errors, response times within expected single-user range |
| Action if Failed | Fix scripts, fix environment, do NOT proceed to load test |
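The table above could translate into a k6 options object along these lines. Treat it as a minimal sketch: the helper name `buildBaselineOptions`, the 50-user default, and the 1,000 ms p95 limit are assumptions for illustration, so substitute the single-user range you actually measured.

```javascript
// Hypothetical helper: builds k6 options for a baseline run.
// The default VU count and p95 limit are assumptions, not values from any SLA.
function buildBaselineOptions(vus = 50, p95LimitMs = 1000) {
  if (vus < 10 || vus > 100) {
    throw new RangeError("baseline load should stay within 10-100 users");
  }
  return {
    stages: [
      { duration: "1m", target: vus },  // ramp-up: 1 minute
      { duration: "10m", target: vus }, // hold: low end of the 10-15 minute window
    ],
    thresholds: {
      http_req_failed: ["rate==0"],               // pass criterion: 0% errors
      http_req_duration: [`p(95)<${p95LimitMs}`], // assumed single-user range
    },
  };
}

// In a k6 script you would export this: export const options = buildBaselineOptions();
const baseline = buildBaselineOptions();
console.log(JSON.stringify(baseline.stages));
```

Keeping the user count as a parameter makes it easy to rerun the same baseline at 10 and at 100 users when you want to see whether response times are already load-sensitive.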
The load test is the main event. This is where you simulate your target concurrent users for a sustained period and compare results against your acceptance criteria. If the load test passes, your application can handle expected production traffic. This is the test you run most often -- after every significant release.
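A throughput criterion is only reachable if the think time leaves room for it, so it is worth a quick sanity check before the run. Little's Law gives a rough ceiling. The sketch below assumes the scenario used in this section (3,000 users, two requests per iteration, 5 to 12 seconds of think time after each request) plus an assumed 1 second of combined response time per iteration; the function name is illustrative.

```javascript
// Rough throughput estimate from Little's Law:
// rate = users * requests per iteration / iteration time.
// All inputs are assumptions to be replaced with your own scenario's numbers.
function estimateRequestsPerSecond({ users, requestsPerIteration, thinkTimeSec, responseTimeSec }) {
  const iterationSec = thinkTimeSec + responseTimeSec;
  return (users * requestsPerIteration) / iterationSec;
}

// Two think-time pauses of 5-12 s average 8.5 s each, so 17 s per iteration.
const steadyState = estimateRequestsPerSecond({
  users: 3000,
  requestsPerIteration: 2,
  thinkTimeSec: 2 * 8.5,
  responseTimeSec: 1, // assumed combined response time per iteration
});
console.log(steadyState.toFixed(0)); // 333
```

If the estimate lands below your throughput criterion, the test cannot pass no matter how fast the server is: shorten the think time, add users, or lower the criterion. Keep in mind that an average taken over the whole run, ramp-up and ramp-down included, will be lower than this steady-state figure.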

    import http from "k6/http";
    import { check, sleep } from "k6";
    // randomIntBetween is not a k6 built-in; it comes from the k6-utils jslib
    import { randomIntBetween } from "https://jslib.k6.io/k6-utils/1.2.0/index.js";

    export const options = {
      // Phase 2: Load Test Configuration
      stages: [
        { duration: "10m", target: 3000 }, // Ramp up to 3,000 users over 10 min
        { duration: "30m", target: 3000 }, // Hold at 3,000 for 30 min (steady state)
        { duration: "5m", target: 0 },     // Ramp down over 5 min
      ],
      thresholds: {
        // Acceptance criteria -- test FAILS if any threshold is breached
        "http_req_duration{name:login}": ["p(95)<1500"],
        "http_req_duration{name:balance}": ["p(95)<800"],
        "http_req_duration{name:transfer}": ["p(95)<2000"],
        "http_req_failed": ["rate<0.001"], // < 0.1% errors
        "http_reqs": ["rate>300"],         // > 300 TPS
      },
    };

    export default function () {
      // Login. The per-VU username pattern is an assumption for illustration;
      // in practice, load real test accounts from a data file.
      const loginRes = http.post(
        "https://test.banking.app/api/auth/login",
        JSON.stringify({ username: `user_${__VU}`, password: "Test@1234" }),
        { headers: { "Content-Type": "application/json" }, tags: { name: "login" } }
      );
      check(loginRes, {
        "login returns 200": (r) => r.status === 200,
        "login has token": (r) => r.json("token") !== undefined,
      });
      sleep(randomIntBetween(5, 12)); // Realistic think time

      // Check balance, reusing the token from the login response
      const token = loginRes.json("token");
      const balanceRes = http.get(
        "https://test.banking.app/api/accounts/balance",
        { headers: { Authorization: `Bearer ${token}` },
          tags: { name: "balance" } }
      );
      check(balanceRes, {
        "balance returns 200": (r) => r.status === 200,
      });
      sleep(randomIntBetween(5, 12));
    }

The stress test goes beyond your target load to find the breaking point. Where does the system start degrading? Where does it fail completely? This is not about passing or failing -- it is about discovering limits. You typically push to 1.5x to 3x the target load. If your target is 3,000 users, stress test at 4,500 and 6,000, and go as far as 9,000 if nothing has broken by then.
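A stepped ramp makes the breaking point easier to locate than one long climb, because each plateau gives the metrics time to settle. Here is one way to generate the stages, as a sketch: the helper name `buildStressStages`, the multipliers, and the 5-minute ramp and 10-minute hold are all assumptions you should tune.

```javascript
// Hypothetical helper: builds stepped k6 stages that climb past the target load.
// Ramp and hold durations are assumptions, not recommendations.
function buildStressStages(targetUsers, multipliers = [1, 1.5, 2], rampMin = 5, holdMin = 10) {
  const stages = [];
  for (const m of multipliers) {
    const users = Math.round(targetUsers * m);
    stages.push({ duration: `${rampMin}m`, target: users }); // climb to the next step
    stages.push({ duration: `${holdMin}m`, target: users }); // hold so metrics settle
  }
  stages.push({ duration: `${rampMin}m`, target: 0 }); // ramp down to observe recovery
  return stages;
}

// In a k6 script: export const options = { stages: buildStressStages(3000) };
const stressStages = buildStressStages(3000);
console.log(stressStages.map((s) => s.target).join(","));
// 3000,3000,4500,4500,6000,6000,0
```

Starting the first step at the target load gives you a known-good plateau to compare the later steps against.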
Degradation Point -- The user count where p95 response time starts exceeding SLA. Example: "At 4,200 users, login p95 jumped from 1.2s to 3.8s." This is your capacity ceiling.
Error Threshold -- The user count where error rate exceeds the acceptable limit. Example: "At 5,100 users, error rate jumped from 0.05% to 4.2%, mostly HTTP 503 (connection pool exhausted)."
Resource Saturation -- Which resource maxes out first? CPU? Memory? Database connections? Disk I/O? This tells you what to scale or optimize.
Recovery Behavior -- After load decreases, does the system recover to normal? Some systems get stuck in a degraded state (thread pool exhaustion, connection leak) even after load drops.
Cascading Failures -- Does one component failure cause others to fail? A slow database response can back up the connection pool, which backs up the thread pool, which causes timeouts upstream.
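The first two findings, degradation point and error threshold, can be read straight off per-step results. The sketch below assumes you have already summarized each stress step into an object; the `steps` data is hypothetical (it borrows the 4,200 and 5,100 user figures from the examples above and fills in the rest), and `firstBreach` is an illustrative name.

```javascript
// Hypothetical per-step summaries from a stress run (illustrative numbers only).
const steps = [
  { users: 3000, p95Ms: 1200, errorRate: 0.0005 },
  { users: 4200, p95Ms: 3800, errorRate: 0.0008 },
  { users: 5100, p95Ms: 5200, errorRate: 0.042 },
];

// Returns the lowest user count at which the predicate holds, or null if it never does.
function firstBreach(results, isBreached) {
  const sorted = [...results].sort((a, b) => a.users - b.users);
  const hit = sorted.find(isBreached);
  return hit ? hit.users : null;
}

const slaP95Ms = 1500;      // login p95 limit from the load test thresholds
const maxErrorRate = 0.001; // 0.1% error limit from the load test thresholds

const degradationPoint = firstBreach(steps, (s) => s.p95Ms > slaP95Ms);
const errorThreshold = firstBreach(steps, (s) => s.errorRate > maxErrorRate);
console.log({ degradationPoint, errorThreshold });
// { degradationPoint: 4200, errorThreshold: 5100 }
```

The precision of both numbers is limited by your step size, so a second run with smaller steps around the first breach is often worthwhile.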
A spike test simulates a sudden burst of users -- going from 500 to 5,000 in 30 seconds. This tests auto-scaling, connection pool behavior, cache warming, and queue depth handling. Real-world spikes happen during flash sales, breaking news, marketing email blasts, or when a competitor goes down and their users flood your site.

    export const options = {
      stages: [
        { duration: "5m", target: 500 },   // Normal load
        { duration: "30s", target: 5000 }, // SPIKE: 10x in 30 seconds
        { duration: "5m", target: 5000 },  // Hold spike for 5 minutes
        { duration: "30s", target: 500 },  // Drop back to normal
        { duration: "5m", target: 500 },   // Recovery period -- does it stabilize?
        { duration: "2m", target: 0 },     // Ramp down
      ],
      thresholds: {
        // During spike, we accept degraded performance
        // The key metric is: does it RECOVER?
        "http_req_failed": ["rate<0.05"], // < 5% errors over the whole run, spike included
      },
    };

    // Key questions this test answers:
    // 1. Does auto-scaling kick in fast enough?
    // 2. How many errors during the spike transition?
    // 3. Does the system recover after the spike subsides?
    // 4. Are there connection pool exhaustion or thread starvation issues?

The soak test runs at normal load but for an extended period -- typically 4 to 12 hours. Its purpose is to find issues that only appear over time: memory leaks, connection leaks, log file disk space exhaustion, database connection pool drift, and session accumulation. A test that passes in 30 minutes can fail at hour 6 because of a memory leak growing at 50 MB per hour.
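The "fails at hour 6" arithmetic is worth spelling out, because it shows why a short test cannot catch a slow leak. The sketch below assumes 300 MB of free heap after warm-up; that figure is an assumption chosen to match the example, and the function name is illustrative.

```javascript
// Hours until a steady leak consumes the available headroom.
function hoursUntilExhaustion(headroomMb, leakMbPerHour) {
  if (leakMbPerHour <= 0) return Infinity; // no growth means no exhaustion
  return headroomMb / leakMbPerHour;
}

// Assumed: 300 MB of free heap after warm-up, leaking at 50 MB per hour.
const hours = hoursUntilExhaustion(300, 50);
console.log(hours); // 6

// The same leak during a 30-minute load test grows by only 25 MB.
const growthInLoadTestMb = 50 * 0.5;
console.log(growthInLoadTestMb); // 25
```

A 25 MB rise is easy to mistake for normal warm-up, which is why the soak duration should comfortably exceed the time a plausible leak needs to exhaust your headroom.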
| Issue Found in Soak Tests | Symptom | Root Cause | How to Detect |
|---|---|---|---|
| Memory Leak | Heap usage climbs steadily, GC becomes more frequent | Objects not released, caches without TTL | Grafana: JVM heap usage graph shows upward trend |
| Connection Leak | Database connections exhausted after hours | Connections not returned to pool in error paths | Monitor DB connection pool: active count climbs, never drops |
| Log Rotation Failure | Disk fills up, app crashes | Log files not rotated, debug logging left on | Monitor disk usage: /var/log growing without bounds |
| Session Accumulation | Memory grows as sessions are never cleaned | Session timeout too long or cleanup job disabled | Monitor session count: should plateau, not climb |
| Thread Leak | Thread count climbs, eventually hits OS limit | Threads created but never terminated | Monitor JVM thread count: should be stable after warm-up |
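Most rows in the table come down to the same check: does a metric that should plateau keep climbing? Eyeballing a graph works, but a least-squares slope over hourly samples makes the call less subjective. This is a sketch under assumptions: the sample data is hypothetical, and the 10 MB per hour alert level is an arbitrary tolerance you would set per metric.

```javascript
// Least-squares slope of a metric over time; samples are { hour, value } pairs.
function slopePerHour(samples) {
  const n = samples.length;
  if (n < 2) throw new Error("need at least two samples");
  const meanX = samples.reduce((sum, s) => sum + s.hour, 0) / n;
  const meanY = samples.reduce((sum, s) => sum + s.value, 0) / n;
  let num = 0;
  let den = 0;
  for (const s of samples) {
    num += (s.hour - meanX) * (s.value - meanY);
    den += (s.hour - meanX) ** 2;
  }
  if (den === 0) throw new Error("samples must span more than one point in time");
  return num / den;
}

// Hypothetical heap readings (MB), taken hourly after warm-up.
const heapSamples = [
  { hour: 1, value: 1000 },
  { hour: 2, value: 1050 },
  { hour: 3, value: 1100 },
  { hour: 4, value: 1150 },
];

const LEAK_ALERT_MB_PER_HOUR = 10; // assumed tolerance
const growth = slopePerHour(heapSamples);
console.log(growth, growth > LEAK_ALERT_MB_PER_HOUR); // 50 true
```

The same function works for connection counts, session counts, thread counts, or disk usage; only the tolerance changes. Take samples after warm-up so the initial climb does not skew the slope.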
Q: Walk me through your performance test execution strategy. What types of tests do you run and in what order?
A: I follow a five-phase approach. First, a baseline test with 10-100 users to validate scripts, environment, and establish reference response times. Second, a load test at the target user count (say 3,000) for 30-45 minutes to validate against acceptance criteria. Third, a stress test at 1.5x to 2x target to find the breaking point -- I am looking for the user count where p95 degrades beyond SLA and which resource saturates first. Fourth, a spike test -- normal load with a sudden 10x burst for 5 minutes to test auto-scaling and recovery. Fifth, a soak test at normal load for 4-8 hours to detect memory leaks, connection leaks, and other time-dependent issues. Each phase builds on the previous one. If the baseline fails, I fix the environment before proceeding. If the load test fails, there is no point running a stress test. I document findings from each phase and update the test plan before the next phase.
Always run the soak test over a weekend or overnight when the test environment is not needed for anything else. A 4-hour soak test during business hours blocks other teams from using the environment. Start it Friday evening, let it run overnight, and analyze results Monday morning.
Key Point: Execute tests in order: baseline (validate setup), then load (validate SLAs), then stress (find limits), then spike (test recovery), then soak (find leaks). Each phase builds on the previous one -- never skip ahead.
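If you automate the sequence, the "never skip ahead" rule can be enforced with a simple gate. The sketch below is illustrative only: each `run` function is a hypothetical stand-in for launching the real test and reporting the outcome. For the stress phase, which is about discovering limits and not about pass or fail, "passed" would simply mean the run completed and its findings were recorded.

```javascript
// Runs phases in order and stops at the first failure.
// Each `run` is a hypothetical stand-in for launching a real test.
function runPhases(phases) {
  const completed = [];
  for (const phase of phases) {
    const passed = phase.run();
    completed.push({ name: phase.name, passed });
    if (!passed) break; // never skip ahead: fix this phase before continuing
  }
  return completed;
}

const outcome = runPhases([
  { name: "baseline", run: () => true },
  { name: "load", run: () => false }, // simulated SLA breach
  { name: "stress", run: () => true },
  { name: "spike", run: () => true },
  { name: "soak", run: () => true },
]);
console.log(outcome.map((p) => `${p.name}:${p.passed ? "pass" : "fail"}`).join(" "));
// baseline:pass load:fail
```

Because the load phase fails in this simulated run, the stress, spike, and soak phases never start, which is exactly the behavior you want from a pipeline.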