Practical Regex Examples
Real-World Symbol Removal Scenarios
1. Email Cleaning
import re

def clean_email(email):
    # Remove every character except word characters, dots, @, and hyphens
    return re.sub(r'[^\w.@-]', '', email)

emails = [
    "[email protected]",
    "invalid!email#test",
    "[email protected]"
]
cleaned_emails = [clean_email(email) for email in emails]
print(cleaned_emails)
Common Removal Patterns
Symbol Removal Strategies
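| Scenario | Regex Pattern | Purpose |
| --- | --- | --- |
| Remove Punctuation | [^\w\s] | Clean text (keeps letters, digits, underscores, and whitespace) |
| Strip Special Chars | \W+ | Word characters only (letters, digits, underscores); whitespace is removed too |
| Remove Digits | \d | Text-only processing |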
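As a minimal sketch of how the three patterns in the table behave, the snippet below applies each one to the same input. The sample string and variable names are made up for illustration.

```python
import re

# Hypothetical sample input chosen to show the underscore and whitespace behavior
sample = "Hello, World_2023! Call 555-0199."

# Remove punctuation: drops anything that is not a word character or whitespace.
# The underscore counts as a word character, so it survives.
no_punct = re.sub(r'[^\w\s]', '', sample)

# Strip special characters: \W also matches whitespace, so the spaces disappear.
no_special = re.sub(r'\W+', '', sample)

# Remove digits only; punctuation and spacing are left alone.
no_digits = re.sub(r'\d', '', sample)

print(no_punct)    # Hello World_2023 Call 5550199
print(no_special)  # HelloWorld_2023Call5550199
print(no_digits)   # Hello, World_! Call -.
```

Note that neither of the first two patterns removes the underscore. If underscores should go as well, a pattern such as `[\W_]+` is one option.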
Advanced Text Processing
Complex Cleaning Example
def advanced_text_cleaner(text):
    # Multi-stage text cleaning
    stages = [
        (r'[^\w\s]', ''),     # Remove punctuation
        (r'\s+', ' '),        # Normalize whitespace
        (r'^\s+|\s+$', '')    # Trim edges
    ]
    for pattern, replacement in stages:
        text = re.sub(pattern, replacement, text)
    return text.lower()

# Example usage
sample_text = " LabEx: Python Programming! 2023 "
cleaned_text = advanced_text_cleaner(sample_text)
print(cleaned_text)
Regex Processing Workflow
graph TD
A[Input Text] --> B{Regex Patterns}
B --> |Remove Symbols| C[Cleaned Intermediate Text]
B --> |Normalize Spacing| D[Refined Text]
C --> E[Final Processed Text]
D --> E
Compiled Regex Patterns
import re

class TextCleaner:
    def __init__(self):
        # Precompile regex patterns
        self.symbol_pattern = re.compile(r'[^\w\s]')
        self.space_pattern = re.compile(r'\s+')

    def clean(self, text):
        # Use compiled patterns for efficiency
        text = self.symbol_pattern.sub('', text)
        text = self.space_pattern.sub(' ', text)
        return text.strip()

# Usage
cleaner = TextCleaner()
result = cleaner.clean("LabEx: Python Programming! 2023")
print(result)
Specialized Removal Contexts
Domain-Specific Cleaning
- Web Scraping: Remove HTML tags
- Log Processing: Strip timestamps
- Data Normalization: Standardize input formats
def web_text_cleaner(html_text):
    # Remove HTML tags and extra symbols
    cleaned = re.sub(r'<[^>]+>', '', html_text)
    cleaned = re.sub(r'[^\w\s]', '', cleaned)
    return cleaned.strip()

sample_html = "<p>LabEx: Python Tutorial!</p>"
print(web_text_cleaner(sample_html))
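The Log Processing item in the list above could be sketched along the same lines. The log format here (lines starting with `YYYY-MM-DD HH:MM:SS`) is an assumption, as are the names `TIMESTAMP_PATTERN` and `strip_timestamp`. Real logs vary, so the pattern would need adjusting.

```python
import re

# Assumed format: each line starts with "YYYY-MM-DD HH:MM:SS" followed by whitespace.
TIMESTAMP_PATTERN = re.compile(r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\s+')

def strip_timestamp(line):
    # Remove a leading timestamp if present; other lines pass through unchanged.
    return TIMESTAMP_PATTERN.sub('', line)

# Hypothetical log lines for illustration
log_lines = [
    "2023-01-15 10:30:45 ERROR Disk quota exceeded",
    "2023-01-15 10:31:02 INFO Retry scheduled",
    "no timestamp on this line",
]
stripped = [strip_timestamp(line) for line in log_lines]
print(stripped)
```

Anchoring the pattern with `^` keeps it from touching date-like text that appears later in a message.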
Best Practices
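- Use raw strings (r'...') for regex patterns so backslashes such as \w and \s reach the regex engine unchanged
- Compile patterns that are reused many times, for example inside loops
- Test patterns against edge cases such as empty strings, underscores, and non-ASCII text
- Consider performance for large datasets, for example by reusing compiled patterns instead of rebuilding them on every call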
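To illustrate the first and third points, here is a small sketch showing why raw strings matter and what a few edge-case checks might look like. The helper `remove_symbols` and the sample inputs are hypothetical.

```python
import re

text = "cat concatenate cat"

# Without the r prefix, '\b' is a backspace character, so nothing matches.
plain = re.findall('\bcat\b', text)

# With a raw string, \b reaches the regex engine as a word boundary.
raw = re.findall(r'\bcat\b', text)

print(plain)  # []
print(raw)    # ['cat', 'cat']

# Hypothetical helper used only for the edge-case checks below
SYMBOLS = re.compile(r'[^\w\s]')

def remove_symbols(value):
    return SYMBOLS.sub('', value)

# Edge cases: empty input, underscores (kept by \w), and non-ASCII letters
assert remove_symbols("") == ""
assert remove_symbols("snake_case!") == "snake_case"
assert remove_symbols("café!") == "café"
```

The last check relies on `\w` being Unicode-aware for `str` patterns in Python 3, so accented letters are kept.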