Data Processing Scenarios
graph TD
A[Raw Data] --> B[Transformation]
B --> C[Processed Data]
C --> D[Analysis/Visualization]
1. Text Data Transformation
String Case Conversion
## Converting text case
names = ['alice', 'bob', 'charlie']
capitalized_names = list(map(str.title, names))
print(capitalized_names) ## ['Alice', 'Bob', 'Charlie']
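One caveat with `str.title` is that it capitalizes the letter after any non-letter character, including apostrophes. The sketch below compares it with `string.capwords` from the standard library, which splits on whitespace only. The sample phrases are made up for illustration.

```python
import string

# Hypothetical sample data containing apostrophes
phrases = ["they're here", "o'neil family"]

# str.title treats the apostrophe as a word boundary
with_title = list(map(str.title, phrases))

# string.capwords capitalizes only the first letter of each whitespace-separated word
with_capwords = list(map(string.capwords, phrases))

print(with_title)     # ["They'Re Here", "O'Neil Family"]
print(with_capwords)  # ["They're Here", "O'neil Family"]
```

Neither function handles every name correctly, so which one fits depends on the data.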
Text Cleaning
## Removing whitespace
texts = [' hello ', ' world ', ' python ']
cleaned_texts = list(map(str.strip, texts))
print(cleaned_texts) ## ['hello', 'world', 'python']
2. Numeric Data Manipulation
## Complex numeric operations
numbers = [1, 2, 3, 4, 5]
transformed = list(map(lambda x: x**2 + 10, numbers))
print(transformed) ## [11, 14, 19, 26, 35]
Statistical Calculations
def normalize(value, min_val, max_val):
    return (value - min_val) / (max_val - min_val)
data = [10, 20, 30, 40, 50]
## Compute the bounds once instead of on every lambda call
min_val, max_val = min(data), max(data)
normalized = list(map(lambda x: normalize(x, min_val, max_val), data))
print(normalized) ## [0.0, 0.25, 0.5, 0.75, 1.0]
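The normalization above divides by `max_val - min_val`, which is zero when every value in the data is the same. A minimal sketch of one way to guard against that follows. The helper name `min_max_normalize` and the choice to return `0.0` for constant input are assumptions made for illustration, not part of the original example.

```python
def min_max_normalize(values):
    """Scale values to the range [0, 1] (hypothetical helper)."""
    values = list(values)
    if not values:
        return []
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption: constant input maps to 0.0 rather than raising an error
        return [0.0 for _ in values]
    return list(map(lambda v: (v - lo) / span, values))

print(min_max_normalize([10, 20, 30, 40, 50]))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(min_max_normalize([7, 7, 7]))             # [0.0, 0.0, 0.0]
```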
Dictionary Manipulation
## Transforming dictionary values
users = [
    {'name': 'alice', 'age': 30},
    {'name': 'bob', 'age': 25},
    {'name': 'charlie', 'age': 35}
]
## Extract and transform specific fields
names = list(map(lambda user: user['name'].upper(), users))
print(names) ## ['ALICE', 'BOB', 'CHARLIE']
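The example above extracts a single field. If the goal is to keep each record and change one value, `map()` can return new dictionaries instead. This is a sketch under the assumption that the original records should stay unmodified; the name `formatted_users` is illustrative.

```python
users = [
    {'name': 'alice', 'age': 30},
    {'name': 'bob', 'age': 25},
    {'name': 'charlie', 'age': 35}
]

# Build a new dict per user; {**u, ...} copies the fields and overrides 'name'
formatted_users = list(
    map(lambda u: {**u, 'name': u['name'].title()}, users)
)

print(formatted_users[0])  # {'name': 'Alice', 'age': 30}
print(users[0])            # {'name': 'alice', 'age': 30}
```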
| Technique | Use Case | Performance | Complexity |
| --- | --- | --- | --- |
| map() | Simple transformations | High | Low |
| List Comprehension | Readable, flexible | High | Moderate |
| Generator Expressions | Memory efficient | Moderate | High |
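To make the comparison concrete, here is the same squaring transformation written with each of the three techniques. This is an illustrative sketch, not a benchmark; the variable names are arbitrary.

```python
numbers = [1, 2, 3, 4, 5]

# map(): applies a function to each item; list() materializes the result
squares_map = list(map(lambda x: x ** 2, numbers))

# List comprehension: builds the full list immediately
squares_comp = [x ** 2 for x in numbers]

# Generator expression: produces values one at a time, on demand
squares_gen = (x ** 2 for x in numbers)

print(squares_map)        # [1, 4, 9, 16, 25]
print(squares_comp)       # [1, 4, 9, 16, 25]
print(list(squares_gen))  # [1, 4, 9, 16, 25]
```

A generator can be consumed only once, so calling `list(squares_gen)` a second time returns an empty list.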
4. Real-world Data Processing
## Complex data processing
transactions = [
    {'amount': 100, 'type': 'purchase'},
    {'amount': -50, 'type': 'refund'},
    {'amount': 200, 'type': 'purchase'}
]
## Filter and transform purchases
purchase_totals = list(
    map(lambda t: t['amount'],
        filter(lambda t: t['type'] == 'purchase', transactions))
)
print(purchase_totals) ## [100, 200]
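The same filter-then-transform step can also be written as a single list comprehension, which some readers find easier to follow than nested `map()` and `filter()` calls. A sketch of that alternative, with a sum added to show a typical next step; `purchase_amounts` and `total` are illustrative names.

```python
transactions = [
    {'amount': 100, 'type': 'purchase'},
    {'amount': -50, 'type': 'refund'},
    {'amount': 200, 'type': 'purchase'}
]

# Keep only purchases and extract their amounts in one expression
purchase_amounts = [t['amount'] for t in transactions if t['type'] == 'purchase']
total = sum(purchase_amounts)

print(purchase_amounts)  # [100, 200]
print(total)             # 300
```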
Lazy Evaluation with Generators
## Memory-efficient large dataset processing
def process_large_dataset(data):
    return (x**2 for x in data if x % 2 == 0)
## Works with minimal memory consumption
large_data = range(1_000_000)
processed = process_large_dataset(large_data)
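Creating the generator does no work by itself; values are computed only when something iterates over it. A short sketch of consuming part of the result with `itertools.islice` and then counting the rest without building a list:

```python
from itertools import islice

def process_large_dataset(data):
    # Lazily yield the squares of the even numbers
    return (x ** 2 for x in data if x % 2 == 0)

processed = process_large_dataset(range(1_000_000))

# Pull only the first five results; nothing beyond them is computed yet
first_five = list(islice(processed, 5))
print(first_five)  # [0, 4, 16, 36, 64]

# Iterate over the remaining items one at a time to count them
remaining_count = sum(1 for _ in processed)
print(remaining_count)  # 499995
```

The generator continues from where `islice` stopped, so the second pass sees only the items that have not yet been consumed.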
LabEx Pro Tip
In LabEx Python tutorials, always consider:
- Readability of transformation code
- Memory efficiency
- Performance requirements
Choose the right transformation technique based on your specific data processing needs.