Data Cleaning and Preprocessing
Removing Duplicates
## Remove duplicates from a list
raw_data = [1, 2, 2, 3, 3, 4, 5, 5]
cleaned_data = list(set(raw_data)) ## set() removes duplicates but does not guarantee the original order
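When the original order of the items matters, `set()` is not a safe choice, because sets are unordered. A minimal sketch of an order-preserving alternative, assuming the items are hashable; the sample values and the name `unique_in_order` are illustrative and not from the original:

```python
# Order-preserving de-duplication (sketch; assumes hashable items).
# dict keys keep insertion order in Python 3.7+, so the first
# occurrence of each value is the one that is retained.
raw_data = [3, 1, 3, 2, 1, 5, 2]
unique_in_order = list(dict.fromkeys(raw_data))
print(unique_in_order)  # [3, 1, 2, 5]
```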
Filtering Collections
## Filter list based on conditions
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_numbers = list(filter(lambda x: x % 2 == 0, numbers))
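The same filter can be sketched as a list comprehension, which many readers find easier to scan than a `lambda`. The helper `is_even` is a hypothetical name introduced here to show how a reusable predicate could be passed to `filter()`:

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Comprehension form: keep only the items that satisfy the condition
even_numbers = [x for x in numbers if x % 2 == 0]

# Hypothetical named predicate, handy when the condition is reused
def is_even(n):
    return n % 2 == 0

# filter() with the named predicate yields the same items
even_via_filter = list(filter(is_even, numbers))

print(even_numbers)  # [2, 4, 6, 8, 10]
```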
List Comprehensions
## Transform list elements
original_list = [1, 2, 3, 4, 5]
squared_list = [x**2 for x in original_list]
## Convert dictionary keys and values
student_scores = {'Alice': 85, 'Bob': 92, 'Charlie': 78}
uppercase_scores = {name.upper(): score for name, score in student_scores.items()}
## Flatten nested lists
nested_list = [[1, 2], [3, 4], [5, 6]]
flat_list = [item for sublist in nested_list for item in sublist]
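As an alternative to the nested comprehension, one level of nesting can also be flattened with `itertools.chain.from_iterable` from the standard library. This is a sketch for the one-level case only; deeper nesting would need a different approach:

```python
from itertools import chain

nested_list = [[1, 2], [3, 4], [5, 6]]

# chain.from_iterable lazily yields the items of each sublist in turn;
# list() materializes the result
flat_list = list(chain.from_iterable(nested_list))
print(flat_list)  # [1, 2, 3, 4, 5, 6]
```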
graph TD
A[Original Collection] --> B{Transformation Method}
B --> |Filtering| C[Filtered Collection]
B --> |Mapping| D[Mapped Collection]
B --> |Reduction| E[Reduced Collection]
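The three branches of the diagram can be sketched with built-ins and `functools.reduce`. The sample collection and the specific conditions are assumptions chosen for illustration:

```python
from functools import reduce

collection = [1, 2, 3, 4, 5, 6]

# Filtering: keep a subset of the items
filtered = list(filter(lambda x: x > 2, collection))

# Mapping: produce one output item per input item
mapped = list(map(lambda x: x * 10, collection))

# Reduction: fold all items into a single value (here, their sum)
reduced = reduce(lambda acc, x: acc + x, collection, 0)

print(filtered)  # [3, 4, 5, 6]
print(mapped)    # [10, 20, 30, 40, 50, 60]
print(reduced)   # 21
```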
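| Transformation Method | Time Complexity | Memory Efficiency |
|---|---|---|
| List Comprehension | O(n) | Moderate (builds the full list) |
| map() Function | O(n) | High (lazy iterator in Python 3) |
| filter() Function | O(n) | High (lazy iterator in Python 3) |
| Generator Expressions | O(n) | High (items produced on demand) |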
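The memory difference between an eager list and a lazy generator can be observed with `sys.getsizeof`. This is a rough sketch: `getsizeof` reports only the size of the container object itself, not the items it refers to, and the element count `n` is an arbitrary choice:

```python
import sys

n = 100_000

# Eager: the whole list is built and held in memory
squares_list = [x * x for x in range(n)]

# Lazy: values are produced one at a time when requested
squares_gen = (x * x for x in range(n))

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size > gen_size)  # True

# A generator can be consumed only once; here it feeds sum() directly
total = sum(squares_gen)
```

A generator is a good fit when the result is consumed once, as in `sum()`; a list is the better choice when the items must be indexed or iterated repeatedly.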
## Combining multiple transformations
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_numbers = filter(lambda x: x % 2 == 0, data) ## Filter even numbers
squared = map(lambda x: x**2, even_numbers) ## Square the numbers
transformed_data = list(squared) ## Convert to list
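The filter-then-square pipeline can be sketched in standard Python either as a single comprehension or as nested `map()` and `filter()` calls that are evaluated lazily. The name `lazy_pipeline` is illustrative:

```python
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Filter and square in one pass with a comprehension
transformed_data = [x**2 for x in data if x % 2 == 0]
print(transformed_data)  # [4, 16, 36, 64, 100]

# Lazy variant: nothing is computed until the iterator is consumed
lazy_pipeline = map(lambda x: x**2, filter(lambda x: x % 2 == 0, data))
lazy_result = list(lazy_pipeline)
```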
Practical Use Cases
## Transform data for analysis
sales_data = [
{'product': 'laptop', 'price': 1000},
{'product': 'phone', 'price': 500},
{'product': 'tablet', 'price': 300}
]
total_value = sum(item['price'] for item in sales_data)
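The same records support other common summaries. A minimal sketch, in which the threshold `min_price` and the derived names are assumptions added for illustration:

```python
sales_data = [
    {'product': 'laptop', 'price': 1000},
    {'product': 'phone', 'price': 500},
    {'product': 'tablet', 'price': 300},
]

# Hypothetical threshold for what counts as a premium product
min_price = 400

# Filter and project: names of products at or above the threshold
premium_products = [item['product'] for item in sales_data if item['price'] >= min_price]

# Reduce with a key function: the record with the highest price
most_expensive = max(sales_data, key=lambda item: item['price'])

# Aggregate: the mean price across all records
average_price = sum(item['price'] for item in sales_data) / len(sales_data)

print(premium_products)           # ['laptop', 'phone']
print(most_expensive['product'])  # laptop
print(average_price)              # 600.0
```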
At LabEx, we recommend mastering these transformation techniques to write more efficient and expressive Python code.