# Duplicate Removal Techniques
## Overview of Duplicate Removal Methods
```mermaid
graph TD
    A[Duplicate Removal Techniques] --> B[Using set()]
    A --> C[Using list comprehension]
    A --> D[Using dict.fromkeys()]
    A --> E[Using pandas]
```
## 1. Using set() Method
The simplest approach, though it does not preserve the original order:
```python
# Basic set() usage
original_list = [1, 2, 2, 3, 4, 4, 5]
unique_list = list(set(original_list))
print(unique_list)  # Output (order not guaranteed): [1, 2, 3, 4, 5]
```
## 2. List Comprehension Technique
Preserves insertion order and provides more control, at the cost of an O(n) list lookup per element:
```python
# List comprehension used for its append side effect (order-preserving)
original_list = [1, 2, 2, 3, 4, 4, 5]
unique_list = []
[unique_list.append(x) for x in original_list if x not in unique_list]
print(unique_list)  # Output: [1, 2, 3, 4, 5]
```
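Because `x not in unique_list` scans a list, the approach above is O(n²) overall. A common order-preserving refinement (a sketch, not part of the original tutorial) tracks seen elements in an auxiliary set, where membership tests average O(1):

```python
# Order-preserving dedup with an auxiliary set for fast membership tests
original_list = [1, 2, 2, 3, 4, 4, 5]
seen = set()
unique_list = []
for x in original_list:
    if x not in seen:  # O(1) average, vs O(n) for a list lookup
        seen.add(x)
        unique_list.append(x)
print(unique_list)  # Output: [1, 2, 3, 4, 5]
```

This trades a little extra memory for the `seen` set in exchange for linear overall time.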
## 3. dict.fromkeys() Method
Runs in O(n) while preserving insertion order (dictionaries keep insertion order since Python 3.7):
```python
# Using dict.fromkeys()
original_list = [1, 2, 2, 3, 4, 4, 5]
unique_list = list(dict.fromkeys(original_list))
print(unique_list)  # Output: [1, 2, 3, 4, 5]
```
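The same dict-based idea extends to deduplicating records by a chosen key. A hedged sketch with made-up sample data; note that later entries overwrite earlier ones, so the last record per `id` wins:

```python
# Deduplicate a list of dicts by their "id" field using a dict comprehension
records = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": "beta"},
    {"id": 1, "name": "alpha-updated"},
]
unique_records = list({r["id"]: r for r in records}.values())
print(unique_records)
# Output: [{'id': 1, 'name': 'alpha-updated'}, {'id': 2, 'name': 'beta'}]
```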
## Comparison of Techniques
| Method | Time Complexity | Order Preservation | Memory Efficiency |
| --- | --- | --- | --- |
| set() | O(n) | No | High |
| List Comprehension | O(n²) | Yes | Moderate |
| dict.fromkeys() | O(n) | Yes | High |
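The complexity figures above can be checked empirically. A rough timing sketch with `timeit` (absolute numbers will vary by machine):

```python
import timeit

# 2000 elements, 1000 unique values
data = list(range(1000)) * 2

t_set = timeit.timeit(lambda: list(set(data)), number=500)
t_dict = timeit.timeit(lambda: list(dict.fromkeys(data)), number=500)

print(f"set():           {t_set:.4f}s")
print(f"dict.fromkeys(): {t_dict:.4f}s")
```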
## Advanced Techniques for Complex Scenarios
### Handling Nested Lists
```python
# Removing duplicates from nested lists
# Inner lists are unhashable, so convert them to tuples before using set()
complex_list = [[1, 2], [2, 3], [1, 2], [4, 5]]
unique_complex = list(map(list, set(map(tuple, complex_list))))
print(unique_complex)  # Output (order not guaranteed): [[1, 2], [2, 3], [4, 5]]
```
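If the original order of the nested lists matters, the same tuple conversion can be combined with `dict.fromkeys()` instead of `set()` (a sketch, not part of the original example):

```python
# Order-preserving dedup of nested lists via dict.fromkeys()
complex_list = [[1, 2], [2, 3], [1, 2], [4, 5]]
unique_ordered = [list(t) for t in dict.fromkeys(map(tuple, complex_list))]
print(unique_ordered)  # Output: [[1, 2], [2, 3], [4, 5]]
```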
### Using Pandas for Large Datasets
```python
import pandas as pd

# Pandas duplicate removal (keeps the first occurrence of each value)
df = pd.DataFrame({'values': [1, 2, 2, 3, 4, 4, 5]})
unique_df = df.drop_duplicates()
print(unique_df['values'].tolist())  # Output: [1, 2, 3, 4, 5]
```
LabEx recommends choosing the right technique based on:
- Dataset size
- Memory constraints
- Order preservation requirements
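These criteria can be folded into a small convenience helper. The function name and signature below are illustrative, not part of LabEx's material:

```python
def remove_duplicates(items, preserve_order=True):
    """Return unique items; keep first-seen order when requested.

    Items must be hashable.
    """
    if preserve_order:
        return list(dict.fromkeys(items))  # O(n), order-preserving
    return list(set(items))                # O(n), order not guaranteed

print(remove_duplicates([3, 1, 3, 2, 1]))  # Output: [3, 1, 2]
```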