In distributed testing, the network becomes your most important -- and most overlooked -- infrastructure component. I have seen teams spend weeks perfecting their test scripts only to get garbage results because they did not account for network topology. Let me share the pitfalls and the math.
A distributed test involves two separate networks. The first is the control plane: the RMI communication between master and workers (test plan distribution, result streaming). The second is the data plane: the actual test traffic from workers to the target application. Ideally the two run on separate network paths so they do not compete for bandwidth.
Here is a quick formula that can save you from embarrassing failures: per-worker bandwidth = users x requests per second x average response size. If each virtual user makes 5 requests per second and the average response size is 50 KB, each user consumes 250 KB/s of download bandwidth. For 1,000 users on one worker, that is 250 MB/s -- twice what a 1 Gbps NIC can carry (125 MB/s in theory, somewhat less in practice once protocol overhead is counted). You hit the network ceiling before you hit CPU or RAM limits.
| Scenario | Users per Worker | Req/sec per User | Avg Response Size | Bandwidth per Worker | Can 1 Gbps NIC Handle It? |
|---|---|---|---|---|---|
| Light API test | 1,000 | 5 | 2 KB | 10 MB/s | Yes, easily |
| Medium web page test | 500 | 3 | 50 KB | 75 MB/s | Yes, with room to spare |
| Heavy page with images | 300 | 10 | 200 KB | 600 MB/s | No -- NIC saturated at ~125 MB/s |
| File download test | 100 | 1 | 5 MB | 500 MB/s | No -- need 10 Gbps or fewer users |
| Lightweight REST API | 2,000 | 10 | 1 KB | 20 MB/s | Yes |
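The rows above can be reproduced with a few lines of shell arithmetic. This is a minimal sketch, not part of any load testing tool: the function names `per_worker_mb_per_s`, `nic_verdict`, and `max_users_per_worker` are made up for illustration, units are decimal (1 MB = 1,000 KB), and the default budget of 125 MB/s is the theoretical ceiling of a 1 Gbps link rather than a measured figure.

```shell
#!/bin/sh
# Per-worker bandwidth in MB/s, using decimal units (1 MB = 1000 KB).
# Args: users, requests/sec per user, average response size in KB.
# Integer arithmetic only, so results below 1 MB/s round down to 0.
per_worker_mb_per_s() {
  echo $(( $1 * $2 * $3 / 1000 ))
}

# Compare a bandwidth figure with a NIC budget in MB/s.
# Args: bandwidth in MB/s, optional budget (assumed default: 125).
nic_verdict() {
  if [ "$1" -le "${2:-125}" ]; then echo "fits"; else echo "saturated"; fi
}

# Largest user count that stays within the budget (the formula inverted).
# Args: requests/sec per user, response size in KB, optional budget in MB/s.
max_users_per_worker() {
  echo $(( ${3:-125} * 1000 / ($1 * $2) ))
}

# Table rows: users, req/s per user, response size in KB (5 MB = 5000 KB).
for row in "1000 5 2" "500 3 50" "300 10 200" "100 1 5000" "2000 10 1"; do
  bw=$(per_worker_mb_per_s $row)   # $row is unquoted on purpose: it splits into three args
  echo "$row -> $bw MB/s ($(nic_verdict "$bw"))"
done
```

The last function answers the planning question directly: at 5 requests per second and 50 KB per response, `max_users_per_worker 5 50` prints 500, the most users one worker could drive within the assumed 125 MB/s budget.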
Per-Worker Bandwidth = Users x Requests/sec x Avg_Response_Size
Example:
- 500 users per worker
- 4 requests per second per user
- Average response size: 30 KB
- Bandwidth = 500 x 4 x 30 KB = 60,000 KB/s = 60 MB/s
- A 1 Gbps NIC handles ~125 MB/s, so this is fine.
Total Test Bandwidth = Per-Worker Bandwidth x Number of Workers
- With 5 workers: 60 x 5 = 300 MB/s of response traffic leaving the target
- The target application and its network path must be able to serve this much outbound traffic!
Do not forget:
- SSL/TLS adds ~5-10% overhead
- TCP/IP headers add ~40 bytes per packet (20 bytes TCP + 20 bytes IPv4, before options)
- Result streaming adds bandwidth on the control plane

If your workers are on the same LAN as the target application, network latency is sub-millisecond and essentially a non-factor. But if your workers are in a different data center or cloud region, network latency directly adds to every response time measurement. This is not necessarily bad -- it may actually reflect real user experience -- but you need to account for it.
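To see how much of a measured response time is network latency, one approach is to take a baseline round-trip time from a worker to the target and compare it with the response times the test reports. The sketch below assumes the summary line format printed by `ping` on Linux and macOS, and assumes one round trip per request (a reused keep-alive connection; new TCP or TLS handshakes cost additional round trips). `parse_avg_rtt` and `network_share_pct` are hypothetical helper names.

```shell
#!/bin/sh
# Read ping output on stdin and print the average RTT in ms.
# The summary line looks like: rtt min/avg/max/mdev = a/b/c/d ms
# Splitting on "/" puts the average in field 5.
parse_avg_rtt() {
  awk -F'/' '/^(rtt|round-trip)/ { print $5 }'
}

# Percentage of a measured response time that is network round trip.
# Args: measured response time in ms, baseline RTT in ms.
network_share_pct() {
  awk -v m="$1" -v r="$2" 'BEGIN { printf "%.1f\n", r / m * 100 }'
}
```

A typical use, with `target.example.com` standing in for your own host, would be `ping -c 10 -q target.example.com | parse_avg_rtt`. If the baseline comes back as 40 ms and the test reports 200 ms, `network_share_pct 200 40` prints 20.0.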
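# Increase max open files for the current shell session (each connection uses a file descriptor)
ulimit -n 65536
# Or permanently in /etc/security/limits.conf:
# jmeter_user soft nofile 65536
# jmeter_user hard nofile 65536
# Note: sysctl -w changes are lost on reboot; add them to /etc/sysctl.conf to persist
# Allow TIME_WAIT sockets to be reused for new outbound connections (avoids port exhaustion)
sudo sysctl -w net.ipv4.tcp_tw_reuse=1
# Increase local (ephemeral) port range
sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"
# Increase the listen backlog (queue of pending incoming connections)
sudo sysctl -w net.core.somaxconn=65535
# Increase maximum socket buffer sizes (16 MB receive and send)
sudo sysctl -w net.core.rmem_max=16777216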
sudo sysctl -w net.core.wmem_max=16777216

If all your workers are behind a single NAT gateway, the target application will see thousands of connections from one IP address. This can trigger rate limiting, WAF rules, or uneven load balancer distribution. Use unique public IPs per worker, or whitelist your NAT IP with the infrastructure team.
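The port-range and TIME_WAIT settings above matter most when each request opens a new connection. As a rough back-of-the-envelope check, the sketch below estimates how many new connections per second one worker can sustain to a single target IP and port before it runs out of ephemeral ports. It assumes no connection reuse and that every closed connection lingers in TIME_WAIT for 60 seconds, which is the fixed value on Linux; `port_count` and `max_new_conn_per_sec` are illustrative names.

```shell
#!/bin/sh
# Number of ephemeral ports in an inclusive range.
# Args: lowest port, highest port.
port_count() {
  echo $(( $2 - $1 + 1 ))
}

# Sustainable new connections/sec to one target IP:port when every
# closed connection holds its port for the TIME_WAIT period.
# Args: lowest port, highest port, optional TIME_WAIT seconds (assumed 60).
max_new_conn_per_sec() {
  echo $(( $(port_count "$1" "$2") / ${3:-60} ))
}

max_new_conn_per_sec 32768 60999   # common Linux default range
max_new_conn_per_sec 1024 65535    # widened range from the tuning commands
```

With the common Linux default range this prints 470, and with the widened range it prints 1075. HTTP keep-alive or `tcp_tw_reuse` raises the practical limit well beyond these figures, so treat them as a worst-case floor rather than a hard cap.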
Q: What network considerations are important when planning a distributed load test?
A: There are several critical network considerations: (1) Bandwidth calculation -- multiply users x requests/sec x average response size to ensure each worker NIC can handle the traffic. A 1 Gbps NIC tops out at 125 MB/s in theory and somewhat less in practice. (2) Network topology -- separate the control plane (RMI between master and workers) from the data plane (test traffic to the target) so they do not compete for bandwidth. (3) Latency -- workers in different regions add network latency to every measurement; place workers in the same region as the target unless you are testing geographic distribution. (4) OS tuning -- increase file descriptor limits (ulimit -n 65536), enable TCP reuse to avoid ephemeral port exhaustion, and expand the local port range. (5) NAT and firewalls -- workers behind NAT appear as one IP to the target, potentially triggering rate limits. (6) DNS -- ensure consistent DNS resolution across workers or use direct IPs.
Key Point: Calculate bandwidth before testing: Users x Requests/sec x Response_Size per worker. Tune OS network limits. Watch for NAT, port exhaustion, and latency.