Debugging Regex Patterns
Regex Debugging Strategies
1. Incremental Pattern Development
```python
import re

def test_regex(pattern, test_strings):
    """Print the matches of pattern for each test string."""
    regex = re.compile(pattern)
    print(f"Pattern: {pattern}")
    for string in test_strings:
        print(f"String: {string}")
        print(f"Matches: {regex.findall(string)}\n")

## Incremental testing: start simple, then add constraints
test_strings = ['hello world', 'hello123', 'h3llo']
test_regex(r'hello', test_strings)
test_regex(r'hello\d+', test_strings)
```
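Printing every intermediate result works for two or three patterns. For longer chains, a small helper can report the first step at which a sample string stops matching, which is usually where the newly added constraint is wrong. The sketch below assumes each step is a complete pattern; `first_failing_step` is a hypothetical helper, not part of the `re` module.

```python
import re

def first_failing_step(steps, text):
    """Return (index, pattern) for the first step that does not match text.

    Hypothetical helper: each entry in steps is a complete pattern, ordered
    from simplest to most specific. Returns None if every step matches.
    """
    for index, pattern in enumerate(steps):
        if re.search(pattern, text) is None:
            return index, pattern
    return None

# Each step adds one constraint to the previous one
steps = [
    r'hello',       # literal word
    r'hello\d',     # ...followed by a digit
    r'hello\d+',    # ...followed by one or more digits
    r'^hello\d+$',  # ...anchored to the whole string
]

for candidate in ['hello123', 'hello world', 'say hello42']:
    print(candidate, '->', first_failing_step(steps, candidate))
```

Because the steps are ordered, the reported index points directly at the constraint to re-examine.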
2. Visualization of Regex Matching
```mermaid
graph TD
    A[Input String] --> B{Regex Pattern}
    B -->|Match| C[Successful Match]
    B -->|No Match| D[Debugging Required]
    C --> E[Extract Matched Portions]
    D --> F[Analyze Pattern]
```
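The flowchart can be turned into a small routine. On a match it extracts the matched text and groups; on a failure it analyzes the pattern by searching with ever-shorter prefixes until one matches, which hints at where the pattern breaks. This is a hypothetical helper (`diagnose`), and prefix truncation is only a heuristic: prefixes that cut a group or an escape in half do not compile and are skipped.

```python
import re

def diagnose(pattern, text):
    """Hypothetical helper mirroring the flowchart: extract or analyze."""
    match = re.search(pattern, text)
    if match:
        # Successful match: extract the matched portions
        return {"match": match.group(), "groups": match.groups()}

    # No match: shorten the pattern from the right until some prefix matches
    for end in range(len(pattern) - 1, 0, -1):
        prefix = pattern[:end]
        try:
            if re.search(prefix, text):
                return {"match": None, "longest_matching_prefix": prefix}
        except re.error:
            # Prefix is not a valid regex on its own (e.g. an unclosed group)
            continue
    return {"match": None, "longest_matching_prefix": None}

print(diagnose(r'(\w+)@(\w+)\.com', 'alice@example.com'))
print(diagnose(r'(\w+)@(\w+)\.com', 'contact: alice@example.org'))
```

In the second call the longest matching prefix ends just before `com`, which points at the top-level domain as the part to fix.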
Debugging Techniques
Pattern Breakdown Method
| Technique | Description | Example |
| --- | --- | --- |
| Simplification | Reduce pattern complexity | `\w+@\w+\.\w+` → `\w+` |
| Incremental Testing | Add complexity gradually | Start simple, add constraints |
| Verbose Mode | Improve readability | Use the `re.VERBOSE` flag |
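Simplification can be pushed one step further: split a failing pattern into its components and test each one on its own. A component that cannot match anywhere in the text is a likely culprit. In this sketch the split is done by hand, `isolate_components` is a hypothetical helper, and the sample text is made up; pieces that only make sense together, such as a group and its quantifier, should stay in one component.

```python
import re

def isolate_components(components, text):
    """Hypothetical helper: map each component to whether it matches somewhere in text."""
    return {part: re.search(part, text) is not None for part in components}

# The distinct pieces of \w+@\w+\.\w+, split by hand
components = [r'\w+', r'@', r'\.']
text = "Reach me at user_at_example_dot_com"

print(re.search(r'\w+@\w+\.\w+', text))  # the full pattern fails
print(isolate_components(components, text))
```

Here both `@` and `\.` fail in isolation, so the text, not the `\w+` parts, is what the pattern disagrees with.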
```python
import re

def advanced_regex_debug(pattern, text):
    try:
        ## Compile with verbose mode so whitespace and comments are ignored
        regex = re.compile(pattern, re.VERBOSE)
        ## Detailed matching information
        match = regex.search(text)
        if match:
            print(f"Full Match: {match.group()}")
            print(f"Match Start: {match.start()}")
            print(f"Match End: {match.end()}")
        else:
            print("No match found")
    except re.error as e:
        print(f"Regex Compilation Error: {e}")

## Example usage (placeholder address for illustration)
text = "Contact email: user@example.com"
pattern = r'''
    \b              ## Word boundary
    [a-zA-Z0-9]+    ## Username
    @               ## @ symbol
    [a-zA-Z0-9]+    ## Domain name
    \.              ## Dot
    [a-zA-Z]{2,}    ## Top-level domain
    \b              ## Word boundary
'''
advanced_regex_debug(pattern, text)
```
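One pitfall when switching to `re.VERBOSE`: the flag makes the engine ignore unescaped whitespace and treat `#` as the start of a comment, which is what lets the annotated pattern above work. A space that is meant to be matched therefore silently disappears. The sketch compares an unescaped space with one written as the character class `[ ]`; escaping it as `\ ` also works.

```python
import re

text = "John Smith"

# Unescaped space: ignored in verbose mode, so this is really (\w+)(\w+)
broken = re.compile(r"(\w+) (\w+)", re.VERBOSE)
# A space inside a character class is kept as a literal
fixed = re.compile(r"(\w+) [ ] (\w+)", re.VERBOSE)

print(broken.search(text).groups())  # ('Joh', 'n')
print(fixed.search(text).groups())   # ('John', 'Smith')
```

The broken version still "matches", which is why this bug is easy to miss without inspecting the groups.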
Common Debugging Scenarios
1. Greedy vs. Non-Greedy Matching
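```python
import re

## With only one "x" in the text, both patterns would return the same match
text = "The quick brown fox jumps over the box"
## Greedy matching: .* runs to the last "x"
greedy_pattern = r'q.*x'
print(re.findall(greedy_pattern, text))      # ['quick brown fox jumps over the box']
## Non-greedy matching: .*? stops at the first "x"
non_greedy_pattern = r'q.*?x'
print(re.findall(non_greedy_pattern, text))  # ['quick brown fox']
```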
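Greediness bugs show up most often with delimited text such as tags or quoted strings. A negated character class is often a clearer alternative to a lazy quantifier, because it states outright which characters may appear between the delimiters. The HTML-like string below is only an illustration; real HTML is better handled by a parser.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r'<.*>', html)       # runs to the last '>'
lazy = re.findall(r'<.*?>', html)        # stops at the first '>'
negated = re.findall(r'<[^>]*>', html)   # cannot cross a '>' at all

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
print(negated)  # ['<b>', '</b>', '<i>', '</i>']
```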
2. Lookahead and Lookbehind Assertions
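```python
import re

def test_assertions(pattern, text):
    matches = re.findall(pattern, text)
    print(f"Matches: {matches}")

text = "price: $50, another: $30"
## Positive lookahead: dollar amounts immediately followed by a comma
test_assertions(r'\$\d+(?=,)', text)      # Matches: ['$50']
## Negative lookbehind: digits not preceded by "$".
## Pitfall: this still matches the trailing "0" of "$50" and "$30",
## because only the single character before the match start is checked.
test_assertions(r'(?<!\$)\d+', text)      # Matches: ['0', '0']
## Also excluding a preceding digit removes those partial matches
test_assertions(r'(?<![$\d])\d+', text)   # Matches: []
```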
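Python's `re` module only accepts lookbehind assertions of fixed width, so alternatives of different lengths (or a quantifier such as `+`) inside `(?<=...)` fail at compile time with `re.error`. The sketch checks this with a hypothetical `compiles` helper that reuses the error-catching idea from `advanced_regex_debug`, then shows a common workaround: match the prefix normally and capture only the part you need. The `USD30` sample text is made up for illustration.

```python
import re

text = "price: $50, another: USD30"

def compiles(pattern):
    """Return True if pattern compiles; print the error and return False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error as e:
        print(f"Regex Compilation Error: {e}")
        return False

# Fixed-width lookbehind: accepted
print(compiles(r'(?<=\$)\d+'))
# Alternatives of different widths ('$' vs 'USD') make the lookbehind
# variable-width, which the re module rejects
print(compiles(r'(?<=\$|USD)\d+'))
# Workaround: match the prefix normally and capture only the digits
print(re.findall(r'(?:\$|USD)(\d+)', text))
```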
LabEx Debugging Recommendations
- Use interactive regex testers
- Break complex patterns into smaller parts
- Leverage Python's `re` module flags, such as `re.VERBOSE` and `re.DEBUG`
- Practice a systematic debugging approach
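Among the `re` module flags, `re.DEBUG` is aimed squarely at debugging: compiling with it prints the parsed structure of the pattern, which shows how the engine actually read each token. The exact layout of that dump is an implementation detail and varies between Python versions, so treat it as a reading aid rather than a stable format. The sketch captures the dump with `contextlib.redirect_stdout` so it can be inspected as a string; `parse_dump` is a hypothetical helper name.

```python
import contextlib
import io
import re

def parse_dump(pattern):
    """Hypothetical helper: return the text re.DEBUG prints while compiling pattern."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, re.DEBUG)
    return buffer.getvalue()

print(parse_dump(r'\d+@[a-z]'))
```

Reading the dump line by line makes it easy to spot, for example, a quantifier that attached to a different token than intended.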
Key Takeaways
- Regex debugging requires patience and a methodical approach
- Use incremental testing
- Understand pattern matching mechanisms
- Leverage Python's regex debugging tools