
Chapter 8: Analyzing Results and Bottleneck Identification

Identifying CPU and Memory Bottlenecks

Your performance test report tells you WHAT is slow. But it does not tell you WHY. To find the root cause, you need to look at the server -- specifically its CPU, memory, disk, and network. Think of it like this: the load test report is the patient's symptoms ("I have a headache and my vision is blurry"), and server monitoring is the blood test that reveals the actual disease ("your blood sugar is dangerously high"). You cannot prescribe treatment based on symptoms alone.

CPU Bottlenecks -- When the Brain Overheats

A CPU bottleneck means the processor cannot keep up with the work being asked of it. The classic symptom: once CPU usage saturates, response time grows roughly linearly with user count. Double the users and response time roughly doubles, because requests queue for processor time. The server is simply doing too much computation per request.

CPU Metric	Healthy	Warning	Critical
Overall CPU Usage	< 70%	70-85%	> 85% sustained
User CPU (application work)	< 60%	60-80%	> 80%
System CPU (OS kernel work)	< 15%	15-30%	> 30% (context switching)
IO Wait	< 10%	10-25%	> 25% (disk is the bottleneck, not CPU)
Load Average (Linux)	< number of cores	1-2x cores	> 2x cores
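The load-average row is the only one that depends on the machine, since its thresholds scale with the core count. Below is a minimal sketch of that check, assuming a Linux server where `/proc/loadavg` and `getconf _NPROCESSORS_ONLN` are available. `classify_load` is an illustrative helper name, not an existing tool.

```bash
#!/usr/bin/env bash
# Classify a load average against the core count using the table's thresholds:
#   below cores -> healthy, cores up to 2x cores -> warning, above 2x -> critical
classify_load() {
  local load="$1" cores="$2"
  # awk handles the floating-point comparison that bash arithmetic cannot
  awk -v l="$load" -v c="$cores" 'BEGIN {
    if (l < c)              print "healthy"
    else if (l <= 2 * c)    print "warning"
    else                    print "critical"
  }'
}

# On a Linux server: the first field of /proc/loadavg is the 1-minute average
if [[ -r /proc/loadavg ]]; then
  load1=$(cut -d ' ' -f 1 /proc/loadavg)
  cores=$(getconf _NPROCESSORS_ONLN)
  echo "1-min load $load1 on $cores cores: $(classify_load "$load1" "$cores")"
fi
```

For example, a load of 6 on a 4-core box falls in the 1-2x band and is reported as `warning`.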

Monitoring CPU During a Load Test

# Real-time CPU monitoring (run on the server during the test)

# Quick overview -- one batch-mode snapshot (run `top -d 2` for a live view that refreshes every 2 seconds)
top -b -n 1 | head -20

# Detailed CPU breakdown (user, system, iowait, idle)
mpstat -P ALL 5
# Watch for:
#   %usr > 80  → Application is CPU-bound
#   %sys > 30  → Too many context switches (too many threads)
#   %iowait > 25 → Disk is slow, CPU is waiting

# Per-process CPU usage (find the hungry process)
ps aux --sort=-%cpu | head -10

# Thread-level CPU usage for a Java app (PID = 12345)
top -H -p 12345
# This shows which THREAD is consuming CPU
# Convert the thread ID (LWP) to hex for matching with thread dumps

# Save CPU data to a file for post-test analysis
sar -u 5 > cpu_during_test.log &
# This logs CPU usage every 5 seconds in the background
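`top -H` shows thread IDs in decimal. HotSpot thread dumps from `jstack` identify each native thread with a `nid=` field, which is commonly printed in hex (`nid=0x3039`). Some JDK versions print it in decimal, so check one dump before relying on either form. The sketch below converts the ID and searches for both forms. `show_hot_thread` and its arguments are illustrative names, not an existing tool.

```bash
#!/usr/bin/env bash
# Convert a decimal thread ID from `top -H -p <PID>` to the hex form
# that HotSpot thread dumps print as nid=0x...
tid_to_nid() {
  printf '0x%x\n' "$1"
}

# Illustrative helper: print the stack of the hot thread.
# Matches either the hex nid or a decimal nid, since the format varies by JDK.
show_hot_thread() {
  local java_pid="$1" hot_tid="$2"
  local nid
  nid=$(tid_to_nid "$hot_tid")
  jstack "$java_pid" | grep -E -A 30 "nid=(${nid}|${hot_tid}) "
}

# Usage (12345 = Java PID, 12402 = busiest thread reported by top -H):
#   show_hot_thread 12345 12402
```

Take two or three dumps a few seconds apart. A thread that shows the same stack each time is usually where the CPU time is going.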

Common CPU Bottleneck Causes

Inefficient algorithms -- O(n^2) loops processing large datasets per request. Solution: optimize the code.
Excessive serialization/deserialization -- Converting large JSON or XML payloads on every request. Solution: reduce payload size, use binary formats.
Regex backtracking -- Complex regular expressions on user input can cause catastrophic backtracking. Solution: simplify regex, set timeout limits.
Too many threads -- Counter-intuitive, but too many threads cause context-switching overhead. If system CPU (%sys) is high, you may have too many threads competing. Solution: reduce thread pool size.
Missing caching -- Computing the same result for every request instead of caching it. Solution: add application-level caching (Redis, in-memory cache).
Synchronous encryption/hashing -- Heavy crypto operations on the main thread. Solution: offload to dedicated workers or hardware acceleration.
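To check the "too many threads" cause, start by looking at how many threads the process runs and what state they are in. The sketch below assumes Linux, where `/proc/<PID>/status` has a `Threads:` line, and a HotSpot `jstack` dump, which prints `java.lang.Thread.State: ...` for each thread. The function names are illustrative. Far more RUNNABLE threads than CPU cores, combined with high %sys, is consistent with context-switching overhead. To confirm it, `pidstat -w -t -p <PID> 5` from sysstat reports per-thread context-switch rates.

```bash
#!/usr/bin/env bash
# Thread count from a Linux /proc/<PID>/status file, read on stdin.
# Exits non-zero if no "Threads:" line is present.
threads_from_status() {
  awk '/^Threads:/ { print $2; found = 1 } END { if (!found) exit 1 }'
}

# Count threads per state in a HotSpot thread dump read on stdin,
# e.g. `jstack <PID> | thread_states`. Output: "STATE count", sorted.
thread_states() {
  grep -o 'java\.lang\.Thread\.State: [[:upper:]_]*' \
    | awk '{ count[$2]++ } END { for (s in count) print s, count[s] }' \
    | sort
}

# Usage on a live server (12345 = Java PID):
#   threads_from_status < /proc/12345/status
#   jstack 12345 | thread_states
```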

Memory Bottlenecks -- When the Server Forgets to Clean Up

Memory bottlenecks come in two flavors: the system runs out of memory (OOM), or the garbage collector works so hard to reclaim memory that it stalls the application (GC thrashing). The sneakiest variant is a memory leak -- the application works fine for hours, then suddenly crashes. In soak tests, this is the number one thing you are looking for.

Spotting Memory Leaks

A memory leak in a performance test looks like a slowly rising line on the memory usage chart. Under constant load, memory should stabilize -- the app allocates objects, GC cleans them up, memory stays flat. If memory keeps climbing without stabilizing, objects are being allocated but never released. The test might run fine for 30 minutes, then the GC pauses get longer and longer as it desperately tries to free memory, response times spike, and eventually the JVM throws an OutOfMemoryError.

Memory Usage Patterns

Healthy -- Sawtooth pattern: memory rises, GC drops it back, stable baseline.
Slow Leak -- Each GC cycle drops less. Baseline slowly climbs over hours.
Fast Leak -- Memory rises steadily. GC barely helps. OOM in minutes.
GC Thrashing -- Memory near max, GC running constantly, app barely processing.
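What separates the healthy sawtooth from a leak is whether the post-GC baseline is flat or rising. A least-squares slope over post-GC heap samples makes that check explicit. This sketch reads one value in MB per line from stdin, for example the "after" heap figure of each collection, gathered however you prefer. The 0.5 MB-per-sample threshold is an arbitrary illustration. Tune it to your sampling interval and heap size.

```bash
#!/usr/bin/env bash
# Fit a least-squares line to post-GC heap usage (MB, one value per line).
# x is the sample index (1, 2, 3, ...), y is the heap value.
# A slope above the threshold suggests the baseline is climbing (possible leak).
baseline_trend() {
  local threshold="${1:-0.5}"
  awk -v t="$threshold" '
    { n++; sx += n; sy += $1; sxx += n * n; sxy += n * $1 }
    END {
      if (n < 2) { print "not enough samples"; exit 1 }
      slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
      printf "slope=%.2f MB/sample -> %s\n", slope, (slope > t ? "leak suspected" : "stable")
    }'
}

# Usage: baseline_trend < post_gc_heap_mb.txt
```

A slope fit is more robust than comparing the first and last samples, because a single unusually large collection does not swing the result much.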

Monitoring Memory During a Load Test

# Overall system memory
free -m -s 5
# Watch "available" column -- this is what the OS can allocate
# If "available" approaches 0, you are in trouble

# Per-process memory usage
ps aux --sort=-%mem | head -10

# Java-specific: Monitor GC activity
# Add this JVM flag to the application (JDK 9+ unified logging):
# -Xlog:gc*:file=gc.log:time,uptime:filecount=5,filesize=20m
# On JDK 8, use instead: -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log

# Watch GC logs for:
# - Increasing GC frequency (GC running every second instead of every 30s)
# - Increasing GC pause times (from 50ms to 500ms to 2000ms)
# - Full GC events (stop-the-world pauses that freeze the entire app)

# Java: Get heap dump for memory leak analysis
# (This pauses the app -- do it on staging, not production)
jmap -dump:format=b,file=heap_dump.hprof <PID>
# Then analyze with Eclipse MAT or VisualVM

# Linux: Monitor for OOM killer activity
dmesg | grep -i "out of memory"
dmesg | grep -i "killed process"

Memory Symptom	Likely Cause	Solution
Memory grows during load, stabilizes when load stops	Normal -- objects created during request processing	No action needed if it stabilizes within acceptable limits
Memory grows during load and NEVER drops back	Memory leak -- objects are retained after requests complete	Heap dump analysis. Check for static collections, unclosed resources, event listener leaks
GC pauses getting longer over time	Heap is filling up, GC working harder to find reclaimable objects	Fix the leak, or increase heap size as a short-term workaround
Sudden OOM crash after hours of stable running	Slow memory leak. Soak tests are designed to catch exactly this.	Heap dump analysis before the crash. Enable -XX:+HeapDumpOnOutOfMemoryError
High %sys CPU combined with high memory usage	OS is swapping -- physical RAM is full, using disk as memory	Add RAM, reduce JVM heap size, or fix the memory leak
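You can confirm the swapping row directly: `vmstat` reports `si` and `so`, the amount of memory swapped in from disk and out to disk per second. The sketch below summarizes a saved `vmstat 5` log. It assumes the standard procps column order, where `si` and `so` are columns 7 and 8. Keep in mind that vmstat's first data row is an average since boot, not a live sample. `swap_activity` is an illustrative name.

```bash
#!/usr/bin/env bash
# Summarize swap activity in saved vmstat output, e.g. from `vmstat 5 > vmstat.log`.
# Column 7 is si (swapped in from disk per second), column 8 is so (swapped out).
swap_activity() {
  awk 'BEGIN { samples = 0; swapping = 0; max_so = 0 }
       # Data rows start with a number; the two header rows do not
       $1 ~ /^[0-9]+$/ {
         samples++
         if ($7 > 0 || $8 > 0) swapping++
         if ($8 > max_so) max_so = $8
       }
       END { printf "%d of %d samples swapping, max so=%d\n", swapping, samples, max_so }'
}

# Usage: swap_activity < vmstat.log
```

Sustained non-zero `so` under load means the OS is pushing memory pages out to disk. Every access to those pages then becomes slow.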

If you are testing a Java application, ALWAYS enable GC logging before the test. Without GC logs, you are flying blind. A 2-second GC pause looks exactly like a slow database query in your JMeter report -- you cannot tell the difference from the client side. GC logs let you pinpoint: "The response time spike at 14:32:05 coincides with a Full GC that paused the JVM for 1.8 seconds."
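Set the GC logging and heap-dump flags from this section once, before the test starts, so the evidence already exists when a spike or crash happens. Below is a sketch that collects them in a bash array. `DUMP_DIR` and `app.jar` are hypothetical placeholders. The dump directory must already exist and have room for a heap-sized file.

```bash
#!/usr/bin/env bash
# Diagnostic JVM options to set BEFORE a load or soak test starts.
# DUMP_DIR is a hypothetical default; override it via the environment.
DUMP_DIR="${DUMP_DIR:-/var/tmp/perf-dumps}"

JVM_DIAG_OPTS=(
  # JDK 9+ unified GC logging with wall-clock and uptime stamps, rotated files
  "-Xlog:gc*:file=gc.log:time,uptime:filecount=5,filesize=20m"
  # Write a heap dump automatically if the JVM runs out of heap
  "-XX:+HeapDumpOnOutOfMemoryError"
  # A directory here makes the JVM name the file java_pid<pid>.hprof
  "-XX:HeapDumpPath=${DUMP_DIR}"
)

# Hypothetical launch line; app.jar stands in for your application:
#   java "${JVM_DIAG_OPTS[@]}" -jar app.jar
printf '%s\n' "${JVM_DIAG_OPTS[@]}"
```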

Q: How would you identify a memory leak during a soak test?

A: I would monitor three things during the soak test: application memory usage over time, GC behavior, and response time trends. A memory leak manifests as a steadily increasing memory baseline -- after each garbage collection cycle, the memory drops back to a slightly higher level than before. Over hours, this accumulates. I would look for response time degradation that correlates with GC activity -- as the heap fills up, GC runs more frequently and takes longer, causing periodic latency spikes. If I confirm a leak, I would capture a heap dump before the application crashes (using jmap or enabling HeapDumpOnOutOfMemoryError). I would then analyze it with Eclipse MAT to find the retained objects -- usually it is something like an unbounded cache, unclosed database connections, or event listeners that are never unregistered. The key metric is: under constant load, if memory usage after GC keeps increasing over time, that is a confirmed leak.
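Analyzing a full heap dump in Eclipse MAT is the thorough route. A lighter first pass (a different technique from the workflow above) is to compare two class histograms taken some time apart under constant load and look for classes whose instance counts keep growing. `jmap -histo:live <PID>` or `jcmd <PID> GC.class_histogram` prints rows of rank, instance count, bytes, and class name. `histo_growth` is an illustrative helper that assumes that layout. Note that `:live` forces a full GC, so each snapshot briefly pauses the JVM.

```bash
#!/usr/bin/env bash
# Compare two class histograms (before, after) and list the classes whose
# instance count grew the most. Classes absent from the first file count from 0.
histo_growth() {
  local before="$1" after="$2" top="${3:-10}"
  awk '
    # Data rows look like "   3:      90000     2880000  com.example.Foo ..."
    $1 ~ /^[0-9]+:$/ {
      if (FNR == NR) { old[$4] = $2; next }
      growth = $2 - old[$4]
      if (growth > 0) printf "%d %s\n", growth, $4
    }' "$before" "$after" | sort -k1,1nr | head -n "$top"
}

# Usage:
#   jmap -histo:live 12345 > histo_1.txt   # early in the soak test
#   jmap -histo:live 12345 > histo_2.txt   # an hour later
#   histo_growth histo_1.txt histo_2.txt
```

An application class such as a session or cache entry that grows in every comparison is a good candidate to trace in the heap dump.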

Key Point: CPU bottlenecks cause linear response time growth with user count. Memory bottlenecks cause gradual degradation over time under constant load. Always monitor both during performance tests -- your JMeter/Gatling report cannot distinguish between the two.

