Practical Applications
Data Processing Techniques
## Cleaning and transforming raw data
raw_data = [' apple ', ' banana ', 'cherry ', ' date']
cleaned_data = list(map(str.strip, raw_data))
## Result: ['apple', 'banana', 'cherry', 'date']
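Real inputs often need more than whitespace stripping. As a minimal sketch, assuming we also want lowercase values and no blank entries, `map` and `filter` can be chained. The `messy_data` list and the `clean_entry` helper are illustrative names, not part of the example above.

```python
# Hypothetical input: mixed case, stray whitespace, and blank entries.
messy_data = ['  Apple ', 'BANANA', '   ', 'Cherry\n', '']

def clean_entry(entry):
    """Strip surrounding whitespace and normalize case."""
    return entry.strip().lower()

# map() cleans each entry lazily; filter(None, ...) drops the empty strings.
cleaned_entries = list(filter(None, map(clean_entry, messy_data)))
print(cleaned_entries)  # ['apple', 'banana', 'cherry']
```

Passing `None` as the first argument to `filter` keeps only truthy items, which is a compact way to discard strings that became empty after stripping.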
Scientific Computing
import numpy as np
def normalize(values):
    return (values - np.min(values)) / (np.max(values) - np.min(values))
data = [10, 20, 30, 40, 50]
normalized = normalize(np.array(data))
## Scales data to 0-1 range
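The min-max formula above divides by zero when every value is identical. The following is a rough standard-library sketch of the same scaling with a guard for that case. The name `min_max_scale` and the choice to return zeros for constant input are assumptions made for illustration.

```python
def min_max_scale(values):
    """Scale values to the 0-1 range using only the standard library."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # Assumption: constant input maps to all zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return list(map(lambda v: (v - lowest) / span, values))

print(min_max_scale([10, 20, 30, 40, 50]))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(min_max_scale([7, 7, 7]))             # [0.0, 0.0, 0.0]
```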
graph TD
A[Raw Data] --> B[Clean]
B --> C[Transform]
C --> D[Analyze]
D --> E[Visualize]
Machine Learning Preprocessing
| Stage | Transformation | Purpose |
| --- | --- | --- |
| Cleaning | Remove duplicates | Data quality |
| Encoding | Convert categorical data | Numerical representation |
| Normalization | Scale features | Model performance |
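The three stages in the table can be sketched in plain Python as follows. The `records` data and the integer encoding scheme are hypothetical. A real pipeline would more likely use a dedicated library, so treat this as an illustration of the order of operations only.

```python
# Hypothetical records: (color, size) pairs containing one duplicate.
records = [('red', 10), ('blue', 20), ('red', 10), ('green', 40)]

# Cleaning: drop duplicates while preserving first-seen order.
unique_records = list(dict.fromkeys(records))

# Encoding: map each category to an integer code (alphabetical order).
categories = sorted({color for color, _ in unique_records})
color_codes = {color: index for index, color in enumerate(categories)}

# Normalization: min-max scale the numeric feature to the 0-1 range.
sizes = [size for _, size in unique_records]
low, high = min(sizes), max(sizes)

prepared = [
    (color_codes[color], (size - low) / (high - low))
    for color, size in unique_records
]
print(prepared)
```

Each record ends up as an integer category code paired with a scaled size, with the duplicate `('red', 10)` entry removed before encoding.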
Feature Engineering
def extract_features(text):
    return {
        'length': len(text),
        'word_count': len(text.split())
    }
texts = ['hello world', 'python programming']
features = list(map(extract_features, texts))
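Feature dictionaries usually need to become fixed-order rows before they reach a model. Here is one possible sketch, assuming a hypothetical `feature_names` list that fixes the column order:

```python
def extract_features(text):
    """Same features as above: character length and word count."""
    return {
        'length': len(text),
        'word_count': len(text.split())
    }

texts = ['hello world', 'python programming']

# Assumed column order for the resulting rows.
feature_names = ['length', 'word_count']

rows = [
    [feature[name] for name in feature_names]
    for feature in map(extract_features, texts)
]
print(rows)  # [[11, 2], [18, 2]]
```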
Web Data Processing
import json
def process_user_data(user):
    return {
        'name': user['name'].upper(),
        'active': user['status'] == 'active'
    }
users = [
    {'name': 'john', 'status': 'active'},
    {'name': 'jane', 'status': 'inactive'}
]
processed_users = list(map(process_user_data, users))
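Web data typically arrives as a JSON string rather than as Python dictionaries, and fields may be missing. A minimal sketch of that case follows. The `payload` string and the `process_user_record` helper are made up for illustration and do not come from a real API.

```python
import json

# Hypothetical payload standing in for an API response body.
payload = '[{"name": "john", "status": "active"}, {"name": "jane"}]'

def process_user_record(user):
    """Tolerate missing keys by using dict.get with defaults."""
    return {
        'name': user.get('name', '').upper(),
        'active': user.get('status') == 'active',
    }

# json.loads turns the string into a list of dicts that map() can consume.
processed = list(map(process_user_record, json.loads(payload)))
print(json.dumps(processed))
```

Using `dict.get` instead of indexing means a record without a `status` field is treated as inactive rather than raising `KeyError`.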
Advanced Application Patterns
Functional Error Handling
def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None
numbers = [10, 20, 0, 40, 50]
results = list(map(lambda x: safe_divide(100, x), numbers))
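After a mapping like this, the `None` placeholders usually have to be separated from the valid results. One way to do that is sketched below. The names `valid_results` and `failed_inputs` are illustrative.

```python
def safe_divide(a, b):
    """Return a / b, or None when b is zero."""
    try:
        return a / b
    except ZeroDivisionError:
        return None

numbers = [10, 20, 0, 40, 50]
results = list(map(lambda x: safe_divide(100, x), numbers))

# Compare against None explicitly: a legitimate result of 0.0 is falsy
# and would be dropped by a plain truthiness check.
valid_results = [r for r in results if r is not None]
failed_inputs = [n for n, r in zip(numbers, results) if r is None]

print(valid_results)  # [10.0, 5.0, 2.5, 2.0]
print(failed_inputs)  # [0]
```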
Parallel Processing
graph LR
A[Input Data] --> B[Split]
B --> C[Parallel Transformation]
C --> D[Aggregate Results]
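from concurrent.futures import ProcessPoolExecutor
def heavy_computation(x):
    return x ** 2
## The __main__ guard is needed on platforms that start workers by spawning (e.g. Windows)
if __name__ == '__main__':
    data = [1, 2, 3, 4, 5]
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(heavy_computation, data))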
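- Use generator expressions to avoid building intermediate lists
- Leverage built-in functions such as map, filter, and sum instead of hand-written loops
- Consider lazy evaluation when only part of the result is needed
- Profile transformation code before optimizing it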
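The points above can be sketched together as follows. The `square` function and the data size are arbitrary choices for illustration, and the timings will differ between machines, so no expected timing output is shown.

```python
import timeit

data = range(10_000)

def square(x):
    return x * x

# Lazy evaluation: map() computes items only when they are requested.
lazy_squares = map(square, data)
first_three = [next(lazy_squares) for _ in range(3)]
print(first_three)  # [0, 1, 4]

# Generator expression feeding a built-in: no intermediate list is built.
total = sum(x * x for x in data)

# Profiling: time two alternatives instead of guessing which is faster.
map_time = timeit.timeit(lambda: list(map(square, data)), number=5)
genexp_time = timeit.timeit(lambda: list(x * x for x in data), number=5)
print(f"map: {map_time:.4f}s, generator expression: {genexp_time:.4f}s")
```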
LabEx Insights
LabEx recommends practicing these transformation techniques across various domains to develop robust data manipulation skills. Experiment with different approaches to find the most efficient solution for your specific use case.