Chapter 8: Analyzing Results and Bottleneck Identification

APM Tools and Profiling Under Load

So far we have been doing detective work with server metrics and log files. That is like solving a crime by interviewing witnesses and checking security cameras. APM (Application Performance Monitoring) tools are like having a body cam on every suspect -- they instrument your application code and trace every request from entry to exit, showing exactly where time is spent. If you have ever wished you could see inside a request and know "200ms was spent in the database query, 50ms in JSON serialization, and 800ms waiting for the payment gateway", APM tools do exactly that.

What APM Tools Show You

An APM tool instruments your application (usually via an agent/library you add to the runtime) and collects detailed traces. Each trace represents a single request flowing through your system, broken into "spans" -- one span for the controller method, one for the database query, one for the external HTTP call, and so on.
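To make the trace/span idea concrete, here is a minimal sketch of a trace as a list of spans, using the figures from the chapter introduction (200ms database, 50ms JSON serialization, 800ms payment gateway). The `Span` class, its field names, and the span names are hypothetical and do not reflect any vendor's data model. It also assumes child spans run one after another without overlapping.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """One timed operation inside a trace (illustrative fields only)."""
    span_id: str
    parent_id: Optional[str]
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def self_time_by_span(spans: List[Span]) -> Dict[str, int]:
    """Return each span's own time: its duration minus its direct children.

    Assumes children run sequentially and do not overlap each other.
    """
    child_total: Dict[str, int] = {}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] = child_total.get(s.parent_id, 0) + s.duration_ms
    return {s.name: s.duration_ms - child_total.get(s.span_id, 0) for s in spans}


# A hypothetical trace: one request taking 1100 ms end to end.
trace = [
    Span("1", None, "POST /orders (controller)", 0, 1100),
    Span("2", "1", "SELECT orders (database)", 10, 210),
    Span("3", "1", "JSON serialization", 210, 260),
    Span("4", "1", "payment gateway HTTP call", 260, 1060),
]

breakdown = self_time_by_span(trace)
slowest = max(breakdown, key=breakdown.get)
for name, ms in breakdown.items():
    print(f"{name}: {ms} ms")
print("Slowest span:", slowest)
```

The trace waterfall in an APM UI is a visual form of this same breakdown: the controller span covers the whole request, and the child spans show where the time inside it went.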

APM Tool	Best For	Key Feature	Pricing Model
New Relic	Full-stack visibility, enterprise teams	Distributed tracing + infrastructure monitoring in one UI	Per-user and per-GB ingested, free tier available
Datadog	Cloud-native apps, DevOps teams	Excellent dashboards, broad integration ecosystem	Per-host + per-feature pricing, free trial available
Grafana + Prometheus + Tempo	Open-source enthusiasts, cost-conscious teams	Fully open-source stack, self-hosted, no vendor lock-in	Free (self-hosted). Grafana Cloud has paid tiers.
Dynatrace	Large enterprises, auto-discovery	AI-powered root cause analysis (Davis AI)	Per-host, typically expensive
Jaeger	Distributed tracing only	Open-source, CNCF project, Kubernetes-native	Free (self-hosted)
AppDynamics	Enterprise Java/.NET shops	Business transaction mapping, auto-baseline	Per-agent, enterprise pricing

Using APM During Performance Tests

The real power of APM emerges when you combine it with load testing. Run your JMeter/Gatling test, then open your APM tool and look at the same time window. You will see things that server metrics alone cannot reveal: which specific code methods are slow, which database queries are called most often, which external services add latency, and whether the problem is in your code or in a dependency.

APM + Load Test Workflow

1. Ensure the APM agent is installed and configured on the application server. Verify it is reporting data.

2. Note the exact start time of your load test. You will use this to filter the APM data.

3. Run your load test as normal (JMeter/Gatling).

4. After the test, open the APM tool and set the time window to match your test duration.

5. Go to the "Transactions" or "Services" view. Sort by response time or error rate.

6. Click on the slowest transaction. Look at the trace breakdown -- which spans took the most time?

7. Check the "Database" view for slow queries. The APM will show the actual SQL, execution time, and call count.

8. Check the "External Services" view to see if any downstream API calls are slow.

9. Correlate APM findings with your JMeter/Gatling report. The slow endpoints should match.

10. Export or screenshot the trace waterfall for your report to stakeholders.
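The time-window and correlation steps above can be partly scripted. The sketch below reads a JMeter results file in CSV form, derives the window to enter in the APM tool, and ranks labels by their slowest sample so you know which transactions to look up first. It relies on the `timeStamp`, `elapsed`, and `label` columns of JMeter's CSV output. The sample data is made up, and `timeStamp` is assumed to mark the start of each sample.

```python
import csv
import io
from datetime import datetime, timezone

# A tiny hypothetical JTL sample; real files have more columns and rows.
SAMPLE_JTL = """timeStamp,elapsed,label,responseCode,success
1700000000000,120,GET /products,200,true
1700000001000,8000,POST /orders,200,true
1700000002000,150,GET /products,200,true
1700000003000,7500,POST /orders,500,false
"""


def to_utc(epoch_ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


def summarize_jtl(jtl_text):
    """Return the test window (UTC) and labels ranked by worst elapsed time."""
    rows = list(csv.DictReader(io.StringIO(jtl_text)))
    start_ms = min(int(r["timeStamp"]) for r in rows)
    # Assumption: timeStamp is the sample start, so add elapsed to get its end.
    end_ms = max(int(r["timeStamp"]) + int(r["elapsed"]) for r in rows)

    worst = {}
    for r in rows:
        worst[r["label"]] = max(worst.get(r["label"], 0), int(r["elapsed"]))
    ranked = sorted(worst.items(), key=lambda kv: kv[1], reverse=True)
    return to_utc(start_ms), to_utc(end_ms), ranked


start, end, ranked = summarize_jtl(SAMPLE_JTL)
print(f"APM time window: {start.isoformat()} to {end.isoformat()}")
for label, elapsed in ranked:
    print(f"{label}: worst sample {elapsed} ms")
```

Working in UTC avoids the common mistake of comparing a load generator's local time with an APM UI set to a different time zone.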

Grafana Dashboards for Performance Testing

If your team uses the open-source stack (Grafana + Prometheus + Tempo), you can build custom dashboards that show your load test metrics and server metrics side by side. During a test, this lets you watch CPU, memory, and connection pools react to the incoming load as it ramps up.

Essential Grafana Dashboard Panels for Load Tests

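# Panel 1: Application Response Time, p95 (from APM/Prometheus)
# PromQL: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Panel 2: Request Rate (throughput, requests per second)
# PromQL: sum(rate(http_requests_total[5m]))

# Panel 3: Error Rate (fraction of requests returning 5xx)
# Aggregate both sides with sum(); without it, each 5xx series is divided by itself.
# PromQL: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))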

# Panel 4: CPU Usage
# PromQL: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Panel 5: Memory Usage
# PromQL: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100

# Panel 6: DB Connection Pool (if using HikariCP with micrometer)
# PromQL: hikaricp_connections_active

# Panel 7: JVM Heap Usage (summed across heap memory pools)
# PromQL: sum(jvm_memory_used_bytes{area="heap"}) / sum(jvm_memory_max_bytes{area="heap"}) * 100

# Panel 8: GC Pause Duration (average seconds per pause)
# PromQL: rate(jvm_gc_pause_seconds_sum[5m]) / rate(jvm_gc_pause_seconds_count[5m])

# Arrange these 8 panels in a 4x2 grid. During the load test,
# have this dashboard open on a second monitor. You will see
# the exact moment the system starts struggling.
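It helps to know what the p95 panel is actually computing. The sketch below approximates how a quantile is estimated from cumulative histogram buckets: find the bucket that contains the target rank, then interpolate linearly inside it. It is a simplified illustration, not a full reimplementation of PromQL's `histogram_quantile`, and the bucket boundaries and counts are made up. In Prometheus the inputs would be per-second rates rather than raw counts, but the arithmetic is the same.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    The last bucket must be the +Inf bucket, whose count is the total.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies beyond the last finite bucket, so the best
                # available answer is the highest finite upper bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative counts: 50 requests took <= 0.1 s, 80 took <= 0.25 s, ...
buckets = [(0.1, 50), (0.25, 80), (0.5, 94), (1.0, 99), (math.inf, 100)]
p95 = histogram_quantile(0.95, buckets)
print(f"Estimated p95: {p95:.2f} s")
```

Because of the interpolation, the estimate is only as precise as the bucket boundaries. If most requests land in one wide bucket, the panel can differ noticeably from the percentile in your JMeter or Gatling report.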

Profiling Under Load -- The Deep Dive

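Profiling is the deepest level of performance analysis. A profiler records which methods your application is executing, either by instrumenting method calls or by sampling call stacks at short intervals, and shows exactly where CPU time is spent. Instrumenting profilers can add significant overhead (10-30% is a common rule of thumb), so they are normally kept out of production; low-overhead sampling profilers are the exception (see continuous profiling below). On a staging environment during a load test, profiling is invaluable. A flame graph captured during a load test is one of the most precise diagnostic tools in your arsenal.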

CPU Profilers (async-profiler, VisualVM, YourKit) -- Show which methods consume the most CPU. The output is typically a flame graph where the width of each bar represents its share of CPU time. Wide bars near the top of the graph, where the call stack ends, are your hotspots.
Memory Profilers (Eclipse MAT, VisualVM) -- Show which objects consume the most heap memory. Useful for memory leak diagnosis. You capture a heap dump and analyze the retained size of object trees.
Thread Profilers (jstack, VisualVM) -- Show what each thread is doing at a point in time. Useful for diagnosing lock contention and deadlocks. Look for many threads in BLOCKED state, or request threads stuck in WAITING on the same lock or pool. Idle pool threads in WAITING are normal.
Continuous Profiling (Pyroscope, Parca) -- Always-on profiling with minimal overhead. Lets you compare flame graphs from different time periods -- "why was the app slower at 2pm than at 10am?"
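For the thread-profiling case, a quick way to spot contention is to count thread states in a dump taken under load. The sketch below does this for jstack-style text by matching the `java.lang.Thread.State:` lines. The dump shown is abbreviated and hypothetical (real dumps include full stack frames), and the thread names are illustrative.

```python
import re
from collections import Counter

# Abbreviated, hypothetical jstack output; real dumps include stack frames.
SAMPLE_DUMP = '''
"http-nio-8080-exec-1" #31 daemon prio=5
   java.lang.Thread.State: BLOCKED (on object monitor)
"http-nio-8080-exec-2" #32 daemon prio=5
   java.lang.Thread.State: BLOCKED (on object monitor)
"http-nio-8080-exec-3" #33 daemon prio=5
   java.lang.Thread.State: RUNNABLE
"HikariPool-1 housekeeper" #20 daemon prio=5
   java.lang.Thread.State: TIMED_WAITING (sleeping)
'''

STATE_LINE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def count_thread_states(dump_text):
    """Count how many threads are in each state in a thread dump."""
    return Counter(STATE_LINE.findall(dump_text))


states = count_thread_states(SAMPLE_DUMP)
blocked_share = states["BLOCKED"] / sum(states.values())
for state, count in states.most_common():
    print(f"{state}: {count}")
print(f"Share of threads BLOCKED: {blocked_share:.0%}")
```

A single dump is only a snapshot. Taking several dumps a few seconds apart and comparing the counts shows whether the same threads stay blocked, which is a stronger sign of lock contention than one high number.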

If you only learn one profiling skill, learn to read flame graphs. The Y-axis is the call stack: each box is a method, and the box above it is a method it called. The X-axis is not a timeline; the width of a box is the share of CPU samples in which that method was on the stack. The wider the box, the more CPU it uses. Look at the widest boxes near the top of the stack -- those are the methods actually running on the CPU, and usually the application code where time is spent. Boxes at the bottom are thread entry points and framework/library code you usually cannot change.
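Box widths in a flame graph come from simple counting, which this sketch illustrates. It reads "collapsed" stack lines (frames joined by `;`, root first, followed by a sample count), the text format flame graph tools commonly accept, and computes two numbers per method: how many samples included it anywhere on the stack (its width) and how many had it on top (time actually spent in that method). The stack data and method names are hypothetical.

```python
from collections import Counter

# Hypothetical collapsed-stack lines: frames joined by ';', root first,
# followed by the number of samples captured with that exact stack.
FOLDED = """\
Thread.run;DispatcherServlet.doDispatch;OrderController.place;OrderService.price 60
Thread.run;DispatcherServlet.doDispatch;OrderController.place;JsonMapper.write 25
Thread.run;DispatcherServlet.doDispatch;HealthController.ping 15
"""


def frame_shares(folded_text):
    """Return (total_samples, width_per_frame, on_top_per_frame)."""
    total = 0
    width = Counter()   # samples where the frame appears anywhere on the stack
    on_top = Counter()  # samples where the frame is the top of the stack
    for line in folded_text.strip().splitlines():
        stack, count = line.rsplit(" ", 1)
        frames = stack.split(";")
        n = int(count)
        total += n
        for frame in set(frames):  # set() avoids double counting recursion
            width[frame] += n
        on_top[frames[-1]] += n
    return total, width, on_top


total, width, on_top = frame_shares(FOLDED)
hotspot, hot_samples = on_top.most_common(1)[0]
print(f"Thread.run width: {width['Thread.run'] / total:.0%}")
print(f"Hotspot: {hotspot} ({hot_samples / total:.0%} of samples on top)")
```

Note that `Thread.run` is the widest box yet spends no time of its own. It is wide only because everything else runs inside it, which is why the top of the stack is where you look for hotspots.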

Q: What APM tools have you used, and how did they help you during performance testing?

A: I have worked with New Relic and the Grafana-Prometheus stack. During performance testing, I use APM tools to bridge the gap between "what is slow" (from JMeter reports) and "why is it slow" (from server-side visibility). For example, in one project, JMeter showed that the order placement API had a p99 of 8 seconds. The APM trace waterfall revealed that 6 of those 8 seconds were spent in a single database query that was doing a full table scan on a 10-million row table. Without the APM trace, we would have spent days adding log statements and guessing. With Grafana, I build dashboards that combine load test metrics with server metrics so the team can watch the test in real time. This is especially useful during stress tests -- you can see the exact user count where CPU hits 90% or where the connection pool gets exhausted.

Key Point: APM tools trace individual requests through your system, showing exactly where time is spent. Combine APM traces with load test results to go from "which endpoint is slow" to "which line of code or SQL query is slow" in minutes.

