Text Manipulation
Introduction to Text Manipulation with Regex
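Text manipulation with regular expressions means transforming, replacing, splitting, and restructuring text based on patterns rather than fixed substrings. In Python, these operations are provided by the standard-library re module.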
Key Regex Manipulation Methods
re.sub(): Substitution
re.sub(pattern, repl, string) replaces every non-overlapping match of pattern in string with repl:
import re
text = "Hello, 2023 is a great year!"
# \d+ matches one or more consecutive digits
result = re.sub(r'\d+', 'YEAR', text)
print(result)  # Hello, YEAR is a great year!
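Beyond a fixed string, re.sub() also accepts a function as the replacement (it is called with each match object and returns the new text) and an optional count argument that limits how many matches are replaced. This is a small sketch; the sample prices and the doubling rule are illustrative only:

```python
import re

text = "Item A costs 10, item B costs 25, item C costs 40"

# A function replacement receives a match object and returns the new text
def double_number(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r'\d+', double_number, text)
print(doubled)  # Item A costs 20, item B costs 50, item C costs 80

# count=1 replaces only the first match
first_only = re.sub(r'\d+', 'N', text, count=1)
print(first_only)  # Item A costs N, item B costs 25, item C costs 40
```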
re.split(): Text Splitting
re.split() breaks a string at every match of the pattern, which makes it easy to split on several different delimiters at once:
text = "apple,banana;orange:grape"
# The character class [,;:] matches any one of the three delimiters
result = re.split(r'[,;:]', text)
print(result)  # ['apple', 'banana', 'orange', 'grape']
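Two variations are often useful: wrapping the pattern in a capturing group keeps the matched delimiters in the result, and maxsplit limits the number of splits. A short sketch reusing the same sample string:

```python
import re

text = "apple,banana;orange:grape"

# A capturing group makes re.split() include the matched delimiters
with_delims = re.split(r'([,;:])', text)
print(with_delims)  # ['apple', ',', 'banana', ';', 'orange', ':', 'grape']

# maxsplit=1 stops after the first split
first_split = re.split(r'[,;:]', text, maxsplit=1)
print(first_split)  # ['apple', 'banana;orange:grape']
```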
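Complex Text Transformations
Capture groups let the replacement reorder the matched pieces: \1, \2, and \3 refer to the text captured by the first, second, and third parenthesized groups.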
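text = "date: 2023-06-15"
# Capture year, month, and day; only the date itself is matched,
# so the "date: " prefix is left untouched
pattern = r'(\d{4})-(\d{2})-(\d{2})'
replacement = r'\3/\2/\1'
result = re.sub(pattern, replacement, text)
print(result)  # date: 15/06/2023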
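For longer patterns, named groups written as (?P<name>...) can make the replacement easier to read; in the replacement string they are referenced as \g<name>. The same date conversion, sketched with named groups:

```python
import re

text = "date: 2023-06-15"

# Named groups label each part of the date
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
replacement = r'\g<day>/\g<month>/\g<year>'

result = re.sub(pattern, replacement, text)
print(result)  # date: 15/06/2023
```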
Text Manipulation Workflow
graph TD
A[Input Text] --> B[Regex Pattern]
B --> C{Match Found?}
C -->|Yes| D[Transform Text]
C -->|No| E[Original Text]
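The "Match Found?" decision in the workflow can be made explicit with re.subn(), which returns both the new string and the number of substitutions made; when nothing matches, the text comes back unchanged. A minimal sketch, where transform is a hypothetical helper name:

```python
import re

def transform(text, pattern, replacement):
    """Hypothetical helper: apply a substitution and report whether it matched."""
    new_text, count = re.subn(pattern, replacement, text)
    if count > 0:
        return new_text, True   # Match found: transformed text
    return text, False          # No match: original text

print(transform("Order 42 shipped", r'\d+', '#'))  # ('Order # shipped', True)
print(transform("No digits here", r'\d+', '#'))    # ('No digits here', False)
```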
Common Manipulation Techniques
| Technique | Description | Function |
| --- | --- | --- |
| Replacement | Replace matched patterns | re.sub() |
| Splitting | Divide text into parts | re.split() |
| Extraction | Extract specific text segments | re.findall() |
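Extraction with re.findall() has no example elsewhere in this section. A short sketch: findall() returns every non-overlapping match as a list of strings, or a list of tuples when the pattern contains more than one group. The sample text is made up, and the email pattern is deliberately simplified rather than a complete address validator:

```python
import re

text = "Contact alice@example.com or bob@example.org by 2023-06-30"

# Without groups, findall returns the full matches
emails = re.findall(r'[\w.]+@[\w.]+\.\w+', text)
print(emails)  # ['alice@example.com', 'bob@example.org']

# With two groups, findall returns tuples of the captured parts
pairs = re.findall(r'([\w.]+)@([\w.]+\.\w+)', text)
print(pairs)  # [('alice', 'example.com'), ('bob', 'example.org')]
```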
Advanced Text Processing
Data Cleaning
def clean_phone_number(text):
    # [^\d] matches any character that is not a digit; remove them all
    return re.sub(r'[^\d]', '', text)

phone = "+1 (555) 123-4567"
cleaned = clean_phone_number(phone)
print(cleaned)  # 15551234567
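Another common cleaning step is collapsing irregular whitespace. A minimal sketch; normalize_whitespace is a hypothetical helper name:

```python
import re

def normalize_whitespace(text):
    # Replace each run of whitespace (spaces, tabs, newlines) with one space,
    # then trim leading and trailing spaces
    return re.sub(r'\s+', ' ', text).strip()

messy = "  Hello,\t\tworld!\n  How   are you?  "
print(normalize_whitespace(messy))  # Hello, world! How are you?
```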
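Performance Considerations
- Compile patterns with re.compile() when the same pattern is applied many times
- Keep patterns simple; nested or ambiguous quantifiers can cause excessive backtracking
- Process large texts incrementally, for example line by line or with re.finditer(), instead of building large intermediate lists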
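The advice on compiled patterns and large texts can be sketched with re.compile() and finditer(): the compiled pattern object exposes the same methods as the module (sub(), split(), findall(), finditer()), and finditer() yields matches one at a time instead of building a full list. Python also caches recently used patterns internally, so the main gain is clarity and avoiding repeated cache lookups. The ERR-nnn code format and sample lines are illustrative assumptions:

```python
import re

# Compile once, reuse many times (ERR-nnn is an assumed example format)
ERROR_CODE = re.compile(r'ERR-(\d{3})')

lines = [
    "INFO start",
    "ERR-404 missing page",
    "ERR-500 server fault",
]

codes = []
for line in lines:
    # finditer yields match objects lazily, one at a time
    for match in ERROR_CODE.finditer(line):
        codes.append(match.group(1))

print(codes)  # ['404', '500']
```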
LabEx Practical Applications
- Log file processing
- Data normalization
- Web scraping
- Configuration file parsing
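As an illustration of log file processing, here is a sketch that parses lines in an assumed "YYYY-MM-DD HH:MM:SS LEVEL message" format; the format and sample lines are hypothetical, not taken from any particular tool. Lines that do not fit the format are skipped:

```python
import re

# Hypothetical log format: "YYYY-MM-DD HH:MM:SS LEVEL message"
LOG_LINE = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<level>[A-Z]+) (?P<message>.*)'
)

sample_log = """2023-06-15 10:00:01 INFO Service started
2023-06-15 10:05:42 ERROR Disk full
malformed line
2023-06-15 10:06:00 WARNING Retrying write"""

entries = []
for line in sample_log.splitlines():
    match = LOG_LINE.fullmatch(line)
    if match:  # Skip lines that do not fit the format
        entries.append(match.groupdict())

errors = [e['message'] for e in entries if e['level'] == 'ERROR']
print(len(entries), errors)  # 3 ['Disk full']
```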
Best Practices
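- Validate input before manipulating it
- Use non-capturing groups (?:...) when you need grouping but not the captured text
- Test patterns against edge cases, such as empty strings and inputs with no match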
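The effect of non-capturing groups is easiest to see with re.findall(): with a capturing group it returns only the group's text, while (?:...) groups the alternatives without capturing, so the whole match is returned. A short sketch with made-up sample text:

```python
import re

text = "cats and dogs and cats"

# Capturing group: findall returns only the captured part
captured = re.findall(r'(cat|dog)s', text)
print(captured)  # ['cat', 'dog', 'cat']

# Non-capturing group: findall returns the whole match
whole = re.findall(r'(?:cat|dog)s', text)
print(whole)  # ['cats', 'dogs', 'cats']
```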
With re.sub(), re.split(), and re.findall(), you can handle most everyday text-processing tasks in Python, from cleaning data to parsing logs.