Robust Pattern Matching
Advanced Pattern Matching Strategies
Robust pattern matching goes beyond basic regex techniques, focusing on reliability, performance, and comprehensive text processing.
Regex Matching Workflow
```mermaid
graph TD
    A[Input Text] --> B{Compile Pattern}
    B --> C[Validate Input]
    C --> D{Match Strategy}
    D --> |Partial| E[Flexible Matching]
    D --> |Exact| F[Strict Matching]
    D --> |Complex| G[Advanced Techniques]
```
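The workflow in the diagram can be sketched in Python roughly as follows. This is a minimal illustration under assumptions: "partial" is mapped to `re.search`, "exact" to `re.fullmatch`, and "complex" to collecting every match with its position through `re.finditer`. The function name `match_with_strategy` and its strategy strings are hypothetical.

```python
import re

def match_with_strategy(pattern, text, strategy="partial", flags=0):
    """Hypothetical helper mirroring the diagram: compile, validate, then match."""
    # Step 1: compile the pattern (raises re.error if the pattern is invalid)
    compiled = re.compile(pattern, flags)

    # Step 2: validate the input before matching
    if not isinstance(text, str):
        raise TypeError("text must be a str")

    # Step 3: choose a matching strategy
    if strategy == "partial":
        # Flexible matching: the pattern may occur anywhere in the text
        m = compiled.search(text)
        return m.group() if m else None
    if strategy == "exact":
        # Strict matching: the whole text must match the pattern
        m = compiled.fullmatch(text)
        return m.group() if m else None
    if strategy == "complex":
        # Advanced: every match together with its start position
        return [(m.group(), m.start()) for m in compiled.finditer(text)]
    raise ValueError(f"unknown strategy: {strategy!r}")

print(match_with_strategy(r"\d+", "order 42 of 100"))             # 42
print(match_with_strategy(r"\d+", "order 42", strategy="exact"))  # None
print(match_with_strategy(r"\d+", "2023", strategy="exact"))      # 2023
print(match_with_strategy(r"\d+", "order 42 of 100", strategy="complex"))
# [('42', 6), ('100', 12)]
```

The difference between the "partial" and "exact" calls is the whole point of the branch: `search` accepts "42" inside a longer string, while `fullmatch` rejects "order 42" because the pattern must cover every character.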
Matching Mode Comparison
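| Mode | Description | Use Case |
| --- | --- | --- |
| re.IGNORECASE | Case-insensitive matching | Matching text regardless of capitalization |
| re.MULTILINE | ^ and $ match at the start and end of every line, not only of the whole string | Multi-line text processing |
| re.DOTALL | The dot (.) also matches newline characters | Patterns that span several lines |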
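A short demonstration of the three flags from the table, run on a three-line sample string chosen only for illustration:

```python
import re

text = "First line\nsecond LINE\nthird line"

# re.IGNORECASE: "line" also matches "LINE"
print(re.findall(r"line", text, re.IGNORECASE))  # ['line', 'LINE', 'line']

# Without re.MULTILINE, ^ anchors only at the start of the whole string
print(re.findall(r"^\w+", text))                 # ['First']
# With re.MULTILINE, ^ anchors at the start of every line
print(re.findall(r"^\w+", text, re.MULTILINE))   # ['First', 'second', 'third']

# Without re.DOTALL, . stops at newlines, so no match can span two lines
print(re.search(r"First.*third", text))          # None
# With re.DOTALL, . also matches "\n"
print(bool(re.search(r"First.*third", text, re.DOTALL)))  # True

# Flags can be combined with the | operator
print(re.findall(r"^\w+ line$", text, re.IGNORECASE | re.MULTILINE))
# ['First line', 'second LINE', 'third line']
```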
Flexible Matching Techniques
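```python
import re

def flexible_match(text, patterns):
    """Return the first substring matched by any pattern, trying patterns in order."""
    for pattern in patterns:
        # Case-insensitive search anywhere in the text
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            return match.group()
    # No pattern matched
    return None

## Example usage
text = "Contact LabEx at support@labex.io"
contact_patterns = [
    r'\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b',  # standard email address
    r'support\s*@\s*\w+\.\w+'                      # fallback: tolerates spaces around "@"
]
result = flexible_match(text, contact_patterns)
print(result)  ## Outputs: support@labex.io
```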
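The second pattern only matters when the first one fails. A quick check on a hypothetical messy input, where spaces have crept in around the @ sign, shows the fallback taking over. `flexible_match` is repeated here so the snippet runs on its own.

```python
import re

def flexible_match(text, patterns):
    # Same logic as the function above: the first pattern that matches wins
    for pattern in patterns:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            return match.group()
    return None

contact_patterns = [
    r'\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b',
    r'support\s*@\s*\w+\.\w+'
]

# Hypothetical messy input: the strict pattern needs a character right before "@"
messy = "Write to support @ labex.io for help"
print(flexible_match(messy, contact_patterns))   # support @ labex.io

# When no pattern applies, the caller gets None and must handle it
print(flexible_match("No contact details here", contact_patterns))  # None
```

Because patterns are tried in order, the strictest pattern should come first; a loose fallback placed first would shadow it.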
```python
import re
import timeit

class OptimizedMatcher:
    """Compile the patterns once and reuse them for every match."""
    def __init__(self, patterns):
        self.compiled_patterns = [re.compile(p) for p in patterns]

    def match(self, text):
        # True if any precompiled pattern occurs somewhere in the text
        for pattern in self.compiled_patterns:
            if pattern.search(text):
                return True
        return False

## Benchmark matching
patterns = [r'\d+', r'[a-zA-Z]+', r'\w+@\w+\.\w+']
matcher = OptimizedMatcher(patterns)

def performance_test():
    text = "Hello LabEx 2023 support@labex.io"
    return matcher.match(text)

execution_time = timeit.timeit(performance_test, number=10000)
print(f"Matching Performance: {execution_time:.4f} seconds")
```
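To see whether precompiling actually helps, a rough comparison against calling `re.search` with the raw pattern strings can be sketched as below. Note that the `re` module already caches recently used compiled patterns, so the gap is usually modest and mostly reflects the cache lookup; absolute numbers vary by machine and Python version. The function names here are hypothetical.

```python
import re
import timeit

patterns = [r'\d+', r'[a-zA-Z]+', r'\w+@\w+\.\w+']
compiled = [re.compile(p) for p in patterns]
text = "Hello LabEx 2023 support@labex.io"

def match_uncompiled():
    # Module-level function: re looks the pattern up in its cache on every call
    return any(re.search(p, text) for p in patterns)

def match_precompiled():
    # Reuses pattern objects compiled once up front
    return any(p.search(text) for p in compiled)

# Both approaches must agree before their timings are worth comparing
assert match_uncompiled() == match_precompiled()

t_raw = timeit.timeit(match_uncompiled, number=10000)
t_pre = timeit.timeit(match_precompiled, number=10000)
print(f"re.search with strings: {t_raw:.4f} s")
print(f"precompiled patterns:   {t_pre:.4f} s")
```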
Advanced Parsing Techniques
```python
import re

def extract_structured_data(text):
    """Collect every email address, phone number, and URL found in text."""
    patterns = {
        'email': r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
        'phone': r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}',
        'url': r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+',
    }
    results = {}
    for key, pattern in patterns.items():
        # No capturing groups, so findall returns the full matched strings
        results[key] = re.findall(pattern, text, re.IGNORECASE)
    return results

## Example usage
sample_text = """
Contact LabEx at support@labex.io
Phone: (123) 456-7890
Website: https://www.labex.io
"""
structured_data = extract_structured_data(sample_text)
print(structured_data)
## {'email': ['support@labex.io'], 'phone': ['(123) 456-7890'], 'url': ['https://www.labex.io']}
```
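When the position of each hit matters, for example to highlight or redact it, `re.finditer` yields Match objects instead of plain strings. The sketch below is a hypothetical variant, `extract_with_positions`, that reuses the phone pattern from above.

```python
import re

PHONE = re.compile(r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}')

def extract_with_positions(text, pattern=PHONE):
    """Return (matched_text, start, end) for every match of pattern in text."""
    return [(m.group(), m.start(), m.end()) for m in pattern.finditer(text)]

sample = "Call (123) 456-7890 or 987.654.3210"
for found, start, end in extract_with_positions(sample):
    print(f"{found!r} at {start}-{end}")
# '(123) 456-7890' at 5-19
# '987.654.3210' at 23-35

# The same compiled pattern can redact every hit in one call
print(PHONE.sub("[phone]", sample))  # Call [phone] or [phone]
```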
Robust Error Handling
- Use multiple fallback patterns
- Implement comprehensive input validation
- Handle partial and imperfect matches
- Provide meaningful error messages
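These guidelines can be combined in one small function. This is a sketch under assumptions: the name `safe_search` is hypothetical, an invalid pattern is recorded and skipped rather than treated as fatal, and a `ValueError` describing what was tried is raised when nothing matches.

```python
import re

def safe_search(text, patterns):
    """Try each pattern in order; return the first match or raise a descriptive error."""
    # Input validation: reject non-strings and blank input early
    if not isinstance(text, str):
        raise TypeError(f"expected str, got {type(text).__name__}")
    if not text.strip():
        raise ValueError("input text is empty")

    errors = []
    for pattern in patterns:  # multiple fallback patterns
        try:
            compiled = re.compile(pattern)
        except re.error as exc:
            # A broken pattern is recorded and skipped instead of crashing
            errors.append(f"invalid pattern {pattern!r}: {exc}")
            continue
        match = compiled.search(text)
        if match:
            return match.group()

    # Meaningful message: what failed to match, plus any broken patterns
    message = f"no pattern matched {text[:30]!r}"
    if errors:
        message += " (" + "; ".join(errors) + ")"
    raise ValueError(message)

# r"[" is an invalid pattern, so the fallback r"#(\d+)" supplies the result
print(safe_search("Order #1234 shipped", [r"[", r"#(\d+)"]))  # #1234
```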
Complex Pattern Validation
```python
import re

def validate_complex_pattern(text, validators):
    """Run each named validator; report and stop at the first failure."""
    for name, validator in validators.items():
        if not validator(text):
            print(f"Invalid {name}")
            return False
    return True

## Example validators
validators = {
    # re.match returns a Match object (truthy) or None (falsy)
    'email': lambda x: re.match(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$', x),
    'length': lambda x: 5 <= len(x) <= 50
}
result = validate_complex_pattern("support@labex.io", validators)
print(result)  ## True
```
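`validate_complex_pattern` stops at the first failing check. When users should see every problem at once, a variant that collects all failing validator names may be more useful; the name `collect_failures` is hypothetical. The sketch also swaps `re.match` with `^...$` for `re.fullmatch`, because `$` also matches just before a trailing newline, so the original email check would accept "support@labex.io\n".

```python
import re

EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

validators = {
    # fullmatch anchors at both ends, so ^ and $ are unnecessary
    'email': lambda x: EMAIL_RE.fullmatch(x) is not None,
    'length': lambda x: 5 <= len(x) <= 50,
}

def collect_failures(text, validators):
    """Return the names of every validator that rejects text (empty list means valid)."""
    return [name for name, check in validators.items() if not check(text)]

print(collect_failures("support@labex.io", validators))    # []
print(collect_failures("a@b", validators))                 # ['email', 'length']
print(collect_failures("support@labex.io\n", validators))  # ['email']
```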
Key Takeaways
- Implement flexible matching strategies
- Precompile and optimize regex patterns
- Use comprehensive validation techniques
- Handle edge cases gracefully
By mastering these robust pattern matching techniques, developers can create more reliable and efficient text processing solutions in Python.