Practical Applications
Real-World Regex Use Cases
Data Validation
import re

def validate_input(input_type, value):
    """Return True if value matches the pattern registered for input_type."""
    validators = {
        'email': r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$',
        'phone': r'^\+?1?\d{10,14}$',
        'url': r'^https?://(?:www\.)?[a-zA-Z0-9-]+\.[a-zA-Z]{2,}(?:/\S*)?$'
    }
    # fullmatch() rejects a trailing newline, which match() with "$" would accept
    return re.fullmatch(validators[input_type], value) is not None

# LabEx input validation examples
print(validate_input('email', 'user@labex.io'))    # True
print(validate_input('phone', '+1234567890'))      # True
print(validate_input('url', 'https://labex.io'))   # True
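One detail worth checking in validators like the ones above is how the end of the string is anchored. The sketch below reuses the phone pattern from this section to compare `re.match` with `re.fullmatch`; the helper names `is_phone_lenient` and `is_phone_strict` are hypothetical and exist only for this illustration.

```python
import re

# Phone pattern taken from the validators above
PHONE = re.compile(r'^\+?1?\d{10,14}$')

def is_phone_lenient(value):
    # match() anchors at the start; "$" also matches just before a final newline
    return PHONE.match(value) is not None

def is_phone_strict(value):
    # fullmatch() requires the whole string, newline included, to match
    return PHONE.fullmatch(value) is not None

print(is_phone_lenient('+1234567890\n'))  # True
print(is_phone_strict('+1234567890\n'))   # False
print(is_phone_strict('+1234567890'))     # True
```

If input may arrive with a trailing newline (for example, lines read from a file), the strict form avoids silently accepting it.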
Log Parsing and Analysis
def parse_log_file(log_path):
    # Capture the date and the message of every [ERROR] line
    error_pattern = r'(\d{4}-\d{2}-\d{2}) .*\[ERROR\] (.+)'
    errors = []
    with open(log_path, 'r') as file:
        for line in file:
            match = re.search(error_pattern, line)
            if match:
                errors.append({
                    'date': match.group(1),
                    'message': match.group(2)
                })
    return errors

# Example log parsing in LabEx environment
log_errors = parse_log_file('/var/log/application.log')
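The same idea can be tried without a log file on disk. The following is a minimal sketch that uses named groups and in-memory lines; the log layout (`<date> <time> [LEVEL] <message>`), the sample lines, and the names `LOG_LINE` and `parse_lines` are assumptions made for illustration, not part of the original example.

```python
import re

# Assumed log layout: "<date> <time> [LEVEL] <message>"
LOG_LINE = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) '
    r'\[(?P<level>[A-Z]+)\] (?P<message>.+)'
)

def parse_lines(lines, level='ERROR'):
    """Return one dict per line whose level equals `level`."""
    entries = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group('level') == level:
            # groupdict() maps each named group to its captured text
            entries.append(match.groupdict())
    return entries

# Made-up sample lines, for illustration only
sample_lines = [
    '2024-01-15 10:02:11 [INFO] Service started',
    '2024-01-15 10:05:42 [ERROR] Database connection refused',
    '2024-01-15 10:06:03 [ERROR] Retry limit reached',
]
errors = parse_lines(sample_lines)
for entry in errors:
    print(entry['date'], entry['message'])
```

Named groups keep the calling code readable when the pattern grows, since fields are accessed by name rather than by position.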
Text Transformation
graph LR
A[Text Transformation] --> B[Cleaning]
A --> C[Formatting]
A --> D[Extraction]
A --> E[Replacement]
Text Processing Techniques
def process_text(text):
    # Remove extra whitespace
    text = re.sub(r'\s+', ' ', text)
    # Standardize phone numbers; \b keeps the pattern from matching
    # inside longer digit runs
    text = re.sub(r'\b(\d{3})[-.]?(\d{3})[-.]?(\d{4})\b',
                  r'(\1) \2-\3', text)
    # Mask sensitive information
    text = re.sub(r'\b\d{4}-\d{4}-\d{4}-\d{4}\b',
                  '****-****-****-****', text)
    return text

sample_text = "Contact: John Doe 1234-5678-9012-3456 at 123.456.7890"
print(process_text(sample_text))
# Contact: John Doe ****-****-****-**** at (123) 456-7890
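Replacement does not have to use a fixed string. As a small sketch of the replacement step, `re.sub` also accepts a function that receives each match, which allows partial masking; the names `CARD`, `mask_card`, and `mask_cards` are hypothetical, and the pattern assumes the same dash-separated 16-digit format used above.

```python
import re

# Three "dddd-" groups followed by a captured final group of four digits
CARD = re.compile(r'\b(?:\d{4}-){3}(\d{4})\b')

def mask_card(match):
    # Keep only the last group of four digits visible
    return '****-****-****-' + match.group(1)

def mask_cards(text):
    # re.sub calls mask_card once per match and inserts its return value
    return CARD.sub(mask_card, text)

print(mask_cards('Card 1234-5678-9012-3456 on file'))
# Card ****-****-****-3456 on file
```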
Web Scraping Preprocessing
import html

def clean_html_content(html_text):
    # Remove HTML tags
    clean_text = re.sub(r'<[^>]+>', '', html_text)
    # Decode HTML entities such as &amp; and &nbsp;
    clean_text = html.unescape(clean_text)
    # Normalize whitespace
    clean_text = re.sub(r'\s+', ' ', clean_text).strip()
    return clean_text
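Before tags are stripped, it is often useful to pull out the pieces you need from them. The sketch below extracts link targets with `re.findall`; it assumes double-quoted `href` attributes, and `HREF` and `extract_links` are illustrative names. For anything beyond light preprocessing, a real HTML parser is usually the more reliable choice.

```python
import re

# Assumes double-quoted href attributes inside <a ...> tags
HREF = re.compile(r'<a\s[^>]*?href="([^"]*)"', re.IGNORECASE)

def extract_links(html_text):
    # findall returns the captured group (the href value) for each match
    return HREF.findall(html_text)

sample_html = (
    '<p>Visit <a href="https://labex.io">LabEx</a> or '
    '<a class="x" href="/docs">docs</a>.</p>'
)
print(extract_links(sample_html))
# ['https://labex.io', '/docs']
```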
| Optimization Technique | Description | Example |
| --- | --- | --- |
| Compile Patterns | Precompile regex for repeated use | `pattern = re.compile(r'\d+')` |
| Use Specific Patterns | Avoid overly generic patterns | `\d+` instead of `.*` |
| Minimize Backtracking | Use non-greedy quantifiers | `.*?` instead of `.*` |
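The techniques in the table can be seen side by side in a short sketch. The sample strings and the name `DIGITS` are made up for illustration; the point is to show a compiled pattern being reused and how greedy, non-greedy, and specific patterns differ on the same input.

```python
import re

# Compile once, then reuse the pattern object
DIGITS = re.compile(r'\d+')
print(DIGITS.findall('order 12, items 3, total 450'))
# ['12', '3', '450']

text = '<b>bold</b> and <i>italic</i>'

# Greedy: .* runs to the last ">" in the string
print(re.findall(r'<.*>', text))
# ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .*? stops at the first ">" it can
print(re.findall(r'<.*?>', text))
# ['<b>', '</b>', '<i>', '</i>']

# Specific: a negated character class gives the same result here
print(re.findall(r'<[^>]+>', text))
# ['<b>', '</b>', '<i>', '</i>']
```

The negated character class states the intent directly (anything except `>`), which tends to be easier to reason about than relying on quantifier greediness.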
def extract_structured_data(text):
    # Extract key-value pairs
    pattern = r'(\w+)\s*:\s*([^\n]+)'
    return dict(re.findall(pattern, text))

sample_data = """
Name: John Doe
Age: 30
Email: john@labex.io
Role: Developer
"""
structured_data = extract_structured_data(sample_data)
print(structured_data)
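The pattern above captures single-word keys. As a hedged variation, the sketch below anchors each pair to its own line with `re.MULTILINE` so that keys containing spaces are kept whole; `PAIR`, `extract_pairs`, and the sample text are assumptions introduced for this example.

```python
import re

# With re.MULTILINE, ^ and $ match at the start and end of each line.
# [ \t]* is used instead of \s* so the match never crosses a line break.
PAIR = re.compile(r'^[ \t]*([\w ]+?)[ \t]*:[ \t]*(.+?)[ \t]*$', re.MULTILINE)

def extract_pairs(text):
    return {key: value for key, value in PAIR.findall(text)}

sample = """
Full Name: John Doe
Website: https://labex.io
"""
print(extract_pairs(sample))
# {'Full Name': 'John Doe', 'Website': 'https://labex.io'}
```

Only the first colon on a line separates key from value, so values such as URLs survive intact.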
Security Considerations
- Always sanitize and validate user inputs
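- Be cautious with regex complexity: nested quantifiers such as (a+)+ can trigger catastrophic backtracking on crafted input
- Implement timeout mechanisms for regex operations; the built-in re module has no timeout parameter, so cap input length or run matching in a worker process that can be terminated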
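One lightweight way to act on these points is to bound the input before matching and to avoid nested quantifiers. The following is a minimal sketch, not a complete defense; `MAX_INPUT_LENGTH`, `SAFE_PATTERN`, and `safe_fullmatch` are hypothetical names, and the length limit is an arbitrary assumption.

```python
import re

MAX_INPUT_LENGTH = 1000  # assumed limit; tune for your application

# A nested form such as (a+)+b can backtrack exponentially on near-misses.
# The flat form a+b accepts the same strings without the nesting.
SAFE_PATTERN = re.compile(r'a+b')

def safe_fullmatch(pattern, value, max_length=MAX_INPUT_LENGTH):
    """Reject oversized input before the regex engine ever sees it."""
    if len(value) > max_length:
        return None
    return pattern.fullmatch(value)

print(safe_fullmatch(SAFE_PATTERN, 'aaab') is not None)        # True
print(safe_fullmatch(SAFE_PATTERN, 'a' * 5000 + 'b') is None)  # True
```

A length cap does not make a pathological pattern safe, but it limits how much work any single input can cause.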
Key Takeaways
- Regex is versatile across multiple domains
- Careful pattern design is crucial
- LabEx recommends incremental testing and optimization
By mastering these practical applications, you'll leverage regex as a powerful tool for text processing, validation, and transformation in various Python projects.