Chapter 5: Correlation and Parameterization
The Regular Expression Extractor is one of the oldest and most versatile extraction tools in JMeter. It works on any text format: HTML, JSON, XML, plain text, even response headers. It is like a Swiss Army knife: not always the most elegant tool, but it will get the job done when nothing else can. It is also the extractor interviewers ask about most.
When you add a Regular Expression Extractor (right-click sampler > Add > Post Processors > Regular Expression Extractor), you see several fields. Let me break down each one, because getting any of them wrong means either no extraction or the wrong value.
| Field | What It Does | Example | Common Mistakes |
|---|---|---|---|
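| Apply to | Which sample to search: main sample only, sub-samples only, both, or a JMeter variable | Main sample only (most common) | Choosing "Main sample and sub-samples" when redirects or embedded resources are recorded as sub-samples, which can return extra or unexpected matches |
| Field to check | Which part of the response to search: Body, Body (unescaped), Body as a Document, Response Headers, Request Headers, URL, Response Code, or Response Message | Body (for HTML/JSON), Response Headers (for cookies and redirects) | Searching Body when the value is in a header |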
| Name of created variable | The variable name to store the result | csrf_token | Using spaces or special characters in the name |
| Regular Expression | The regex pattern with capture groups | name="_csrf" value="(.+?)" | Forgetting the capture group parentheses |
| Template | Which capture group to use | $1$ (first group) | Using $0$ which returns the entire match, not the group |
| Match No. | Which occurrence to use (1 = first, n = nth, 0 = random, -1 = all) | 1 | Using 0 (random) in deterministic flows |
| Default Value | Value if no match found | NOT_FOUND | Leaving empty -- makes debugging hard |
You do not need to be a regex wizard. For correlation, you only need a handful of patterns. The key concept is the capture group -- the part inside parentheses () is what gets extracted. Everything else is the anchor that helps JMeter find the right spot.
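To see the capture-group idea outside JMeter, here is a minimal sketch using Java's `java.util.regex`. It is an approximation, not JMeter's own extraction code. JMeter evaluates Perl5-style regular expressions, and I am assuming that for simple boundary patterns like this one `java.util.regex` behaves the same way. The class and method names are hypothetical. `group(1)` is what a `$1$` template returns, and `group(0)` is the whole match that `$0$` returns.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: left boundary + (capture group) + right boundary.
class CaptureGroupDemo {
    static final String RESPONSE =
            "<input type=\"hidden\" name=\"_csrf\" value=\"aB3dEf7Gh\" />";
    static final String REGEX = "name=\"_csrf\" value=\"(.+?)\"";

    // Returns the given group of the first match, or null if nothing matches.
    static String firstMatch(String regex, String text, int group) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // Group 1: only the text inside the parentheses (like $1$)
        System.out.println(firstMatch(REGEX, RESPONSE, 1)); // prints aB3dEf7Gh
        // Group 0: the whole match, boundaries included (like $0$)
        System.out.println(firstMatch(REGEX, RESPONSE, 0)); // prints name="_csrf" value="aB3dEf7Gh"
    }
}
```

The boundaries only locate the value. They are part of group 0, but not of group 1.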
```
# Pattern: Left boundary + Capture group + Right boundary
# Format: left_text(.+?)right_text
# The (.+?) captures everything between the boundaries
# The ? makes it non-greedy (it stops at the first occurrence of right_text)

# 1. Extract CSRF token from HTML hidden field
# Response: <input type="hidden" name="_csrf" value="aB3dEf7Gh" />
# Regex: name="_csrf" value="(.+?)"
# Result: aB3dEf7Gh

# 2. Extract session ID from cookie header (Field to check: Response Headers)
# Header: Set-Cookie: JSESSIONID=ABC123DEF456; Path=/; HttpOnly
# Regex: JSESSIONID=(.+?);
# Result: ABC123DEF456

# 3. Extract auth token from JSON (when JSON Extractor is not available)
# Response: {"token":"eyJhbGciOiJIUzI1NiJ9.abc.xyz","expires":3600}
# Regex: "token":"(.+?)"
# Result: eyJhbGciOiJIUzI1NiJ9.abc.xyz
# (Pretty-printed JSON with a space after the colon needs "token":\s*"(.+?)")

# 4. Extract order ID from the redirect URL (Field to check: Response Headers)
# Header: Location: /orders/ORD-98765/confirmation
# Regex: /orders/(.+?)/confirmation
# Result: ORD-98765

# 5. Extract ViewState from ASP.NET page
# Response: <input type="hidden" name="__VIEWSTATE" value="LONG_BASE64_STRING" />
# Regex: name="__VIEWSTATE" value="(.+?)"
# Result: LONG_BASE64_STRING
# (If the page renders other attributes, such as id="__VIEWSTATE",
#  between name and value, adjust the left boundary to match)

# 6. Extract multiple values with multiple groups
# Response: "userId":42,"userName":"john_doe"
# Regex: "userId":(\d+),"userName":"(.+?)"
# Template: $1$ gives 42, $2$ gives john_doe
```
The Template field tells JMeter which capture group to return. If your regex has one set of parentheses, use $1$. If it has two, you can use $1$ for the first or $2$ for the second. You can even combine them: $1$_$2$ would give you "42_john_doe" from the example above. With a single match, JMeter also stores each group in its own variable: for a variable named user, ${user_g1} holds 42 and ${user_g2} holds john_doe. The $0$ returns the entire matched string including the boundaries, which you almost never want.
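Here is a rough sketch of how a template such as `$1$_$2$` expands. It uses a hypothetical helper, `applyTemplate`, that replaces every `$n$` with capture group n of the match. It imitates the behaviour described above and is not JMeter's implementation. Where JMeter may handle a reference to a missing group differently, this sketch simply throws.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical imitation of the Template field: $n$ -> capture group n ($0$ = whole match).
class TemplateDemo {
    static final String SAMPLE = "\"userId\":42,\"userName\":\"john_doe\"";
    static final String REGEX = "\"userId\":(\\d+),\"userName\":\"(.+?)\"";
    private static final Pattern GROUP_REF = Pattern.compile("\\$(\\d+)\\$");

    // Throws IndexOutOfBoundsException if the template names a group the regex lacks.
    static String applyTemplate(String template, Matcher match) {
        Matcher ref = GROUP_REF.matcher(template);
        StringBuffer out = new StringBuffer();
        while (ref.find()) {
            String value = match.group(Integer.parseInt(ref.group(1)));
            // quoteReplacement keeps any '$' or '\' in the value literal
            ref.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value));
        }
        ref.appendTail(out);
        return out.toString();
    }

    // Applies the template to the first match, or returns null when nothing matches.
    static String extract(String regex, String template, String response) {
        Matcher m = Pattern.compile(regex).matcher(response);
        return m.find() ? applyTemplate(template, m) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract(REGEX, "$1$", SAMPLE));     // prints 42
        System.out.println(extract(REGEX, "$2$", SAMPLE));     // prints john_doe
        System.out.println(extract(REGEX, "$1$_$2$", SAMPLE)); // prints 42_john_doe
        System.out.println(extract(REGEX, "$0$", SAMPLE));     // prints the whole matched text
    }
}
```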
What if the same pattern appears multiple times in the response? Match No. controls which one to grab. This is common when you have a list of items and need a specific one.
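Before the JMeter configuration, here is a small Java approximation of the Match No. rules. It uses group 1 as the template, so a value of 1 or more picks that occurrence, 0 picks one at random, and -1 produces one variable per match plus a `_matchNr` count, following the naming shown in the example below. The class and method names are hypothetical. Falling back to the default value when the requested occurrence does not exist is the behaviour I assume here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical approximation of Match No. handling (template fixed to group 1).
class MatchNumberDemo {
    static final String RESPONSE =
            "<a href=\"/products/P001\">Widget</a>\n"
          + "<a href=\"/products/P002\">Gadget</a>\n"
          + "<a href=\"/products/P003\">Doohickey</a>";
    static final String REGEX = "/products/(P\\d+)\"";

    // Collects group 1 of every match, in document order.
    static List<String> allMatches(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    // matchNo >= 1 picks that occurrence; 0 picks one at random; default if absent.
    static String pick(String regex, String text, int matchNo, String defaultValue, Random rnd) {
        if (matchNo < 0) {
            throw new IllegalArgumentException("use allAsVariables for Match No. -1");
        }
        List<String> found = allMatches(regex, text);
        if (found.isEmpty()) {
            return defaultValue;
        }
        if (matchNo == 0) {
            return found.get(rnd.nextInt(found.size()));
        }
        return matchNo <= found.size() ? found.get(matchNo - 1) : defaultValue;
    }

    // Match No. -1: name_1 .. name_n plus name_matchNr.
    static Map<String, String> allAsVariables(String name, String regex, String text) {
        List<String> found = allMatches(regex, text);
        Map<String, String> vars = new LinkedHashMap<>();
        for (int i = 0; i < found.size(); i++) {
            vars.put(name + "_" + (i + 1), found.get(i));
        }
        vars.put(name + "_matchNr", String.valueOf(found.size()));
        return vars;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        System.out.println(pick(REGEX, RESPONSE, 2, "NOT_FOUND", rnd)); // prints P002
        System.out.println(pick(REGEX, RESPONSE, 5, "NOT_FOUND", rnd)); // prints NOT_FOUND
        System.out.println(pick(REGEX, RESPONSE, 0, "NOT_FOUND", rnd)); // one of P001..P003
        System.out.println(allAsVariables("product_id", REGEX, RESPONSE));
    }
}
```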
```
# Response contains a list of product IDs:
# <a href="/products/P001">Widget</a>
# <a href="/products/P002">Gadget</a>
# <a href="/products/P003">Doohickey</a>
# Regex: /products/(P\d+)"
# Match No.: -1
# Variable Name: product_id
# This creates:
# ${product_id_1} = P001
# ${product_id_2} = P002
# ${product_id_3} = P003
# ${product_id_matchNr} = 3
# Now you can loop through them with a ForEach Controller
# or pick a random one with ${__V(product_id_${__Random(1,${product_id_matchNr})})}
```
When your regex is not matching, here is a systematic debugging approach. I call this the "shrink and expand" method -- start with a pattern you KNOW matches, then gradually add specificity.
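1. Check placement: Is the extractor a child of the correct sampler?
2. Check "Field to check": Are you searching Body when the value is in Response Headers?
3. Check the actual response: Open View Results Tree, find your request, and look at the Response Body or Response Headers tab. Is the value actually there?
4. Simplify the regex: Start with just the left boundary (for example name="_csrf"), confirm that it matches, then add the capture group and the right boundary one piece at a time. The RegExp Tester view in View Results Tree lets you try patterns against the recorded response without rerunning the test.
5. Check for encoding: HTML entities such as &#39; or &quot; instead of literal quotes, and URL-encoded values such as %3D instead of =. Setting Field to check to "Body (unescaped)" searches the body with HTML entities decoded.
6. Add a Debug Sampler after your request and run again. It shows all variables and their values in View Results Tree.
7. Set a meaningful Default Value like "NOT_FOUND". If you see NOT_FOUND in subsequent requests, you know extraction failed.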
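The "shrink and expand" idea in step 4 can be sketched in Java as a list of patterns ordered from loosest to strictest, where the first one that stops matching tells you which piece of the regex is wrong. This is a hypothetical helper run through `java.util.regex`, not a JMeter feature. The "actual" response with an extra `id` attribute is an assumed example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical "shrink and expand" helper: loosest pattern first, strictest last.
class ShrinkExpandDemo {
    static final List<String> STEPS = Arrays.asList(
            "name=\"_csrf\"",                   // step 0: left boundary only
            "name=\"_csrf\" value=\"",          // step 1: boundary up to the value
            "name=\"_csrf\" value=\"(.+?)\"");  // step 2: full pattern with capture group

    // Returns the index of the first pattern that does not match, or -1 if all match.
    static int firstFailingStep(List<String> steps, String response) {
        for (int i = 0; i < steps.size(); i++) {
            if (!Pattern.compile(steps.get(i)).matcher(response).find()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String expected = "<input type=\"hidden\" name=\"_csrf\" value=\"aB3dEf7Gh\" />";
        // Assumed variant: an extra id attribute sits between name and value.
        String actual = "<input type=\"hidden\" name=\"_csrf\" id=\"csrf\" value=\"aB3dEf7Gh\" />";
        System.out.println(firstFailingStep(STEPS, expected)); // prints -1: every step matches
        System.out.println(firstFailingStep(STEPS, actual));   // prints 1: the gap is after name="_csrf"
    }
}
```

If step 0 already fails, the value is probably not in the field you are searching (steps 2 and 3). A later failure points at the exact boundary that differs from the real response.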
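Do NOT use .* or .+ (greedy) when you mean .*? or .+? (non-greedy). A greedy pattern grabs everything up to the LAST occurrence of the right boundary it can reach, not the first. By default . does not match line breaks, so in practice that means the last occurrence on the same line, and for minified HTML or JSON the whole response is one line. This is one of the most common regex mistakes in JMeter. Always use (.+?) or (.*?) for correlation.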
```
# Response:
# <input name="_csrf" value="token1" /> ... <input name="other" value="token2" />
# WRONG (greedy): value="(.+)"
# Captures: token1" /> ... <input name="other" value="token2
# (grabs everything between FIRST value=" and LAST ")
# CORRECT (non-greedy): value="(.+?)"
# Captures: token1
# (stops at the FIRST closing quote)
```
Q: What is the difference between .* and .*? in JMeter regex, and which should you use for correlation?
A: The .* is a greedy quantifier -- it matches as much text as possible. The .*? (with the question mark) is non-greedy or lazy -- it matches as little text as possible. For correlation, you should always use the non-greedy version (.*? or .+?) because you want to capture just the value between your boundaries, not everything from the first boundary to the last occurrence of the second boundary. Using greedy matching in correlation is a common mistake that leads to capturing far more text than intended.
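To check the greedy and lazy behaviour yourself, here is a sketch that runs the example above through `java.util.regex`. The two-line variant and the `(?s)` flag illustrate the line-break caveat. That is Java's behaviour, and I am assuming JMeter's Perl5-style engine treats `.` and `(?s)` the same way. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: greedy (.+) versus lazy (.+?) on the same response.
class GreedyLazyDemo {
    static final String ONE_LINE =
            "<input name=\"_csrf\" value=\"token1\" /> ... <input name=\"other\" value=\"token2\" />";
    static final String TWO_LINES =
            "<input name=\"_csrf\" value=\"token1\" />\n<input name=\"other\" value=\"token2\" />";

    // Returns group 1 of the first match, or null when nothing matches.
    static String group1(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Greedy: runs to the last quote on the line
        System.out.println(group1("value=\"(.+)\"", ONE_LINE));
        // Lazy: stops at the first closing quote, prints token1
        System.out.println(group1("value=\"(.+?)\"", ONE_LINE));
        // Greedy, but '.' cannot cross the line break, prints token1
        System.out.println(group1("value=\"(.+)\"", TWO_LINES));
        // (?s) lets '.' match line breaks, so greedy spans both lines
        System.out.println(group1("(?s)value=\"(.+)\"", TWO_LINES));
    }
}
```

The third case is why a greedy pattern can appear to work on nicely formatted test pages and then break on a minified production response.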
Always set a Default Value in the Regular Expression Extractor. I use "CORRELATION_FAILED_<variable_name>" as my default. When a subsequent request fails, seeing "CORRELATION_FAILED_csrf_token" in the request body tells me instantly which extraction broke, without digging through logs.
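The default-value habit is easy to picture in code. Here is a minimal Java sketch, assuming a hypothetical `extractOrDefault` helper that returns group 1 or `CORRELATION_FAILED_<variable name>`. The follow-up request body (`amount=100`) is also invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: group 1 of the first match, else a self-describing default.
class DefaultValueDemo {
    static String extractOrDefault(String variableName, String regex, String response) {
        Matcher m = Pattern.compile(regex).matcher(response);
        return m.find() ? m.group(1) : "CORRELATION_FAILED_" + variableName;
    }

    public static void main(String[] args) {
        String regex = "name=\"_csrf\" value=\"(.+?)\"";
        // The response is an error page, so extraction fails.
        String token = extractOrDefault("csrf_token", regex, "<html>Session expired</html>");
        // A hypothetical next request body now names the broken extraction.
        System.out.println("_csrf=" + token + "&amount=100"); // prints _csrf=CORRELATION_FAILED_csrf_token&amount=100
    }
}
```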
Key Point: The Regular Expression Extractor uses capture groups (.+?) to extract values from any response format. Always use non-greedy patterns, set meaningful defaults, and place the extractor as a child of the source sampler.