Optimization Methods
Median calculation can become computationally expensive for large datasets, especially when sorting is required. This section explores various optimization techniques to improve calculation speed.
Sorting-Based Optimization Strategies
Quick Select Algorithm
def quick_select_median(arr):
def partition(left, right, pivot_index):
pivot = arr[pivot_index]
## Swap pivot with last element
arr[pivot_index], arr[right] = arr[right], arr[pivot_index]
store_index = left
for i in range(left, right):
if arr[i] < pivot:
arr[store_index], arr[i] = arr[i], arr[store_index]
store_index += 1
arr[right], arr[store_index] = arr[store_index], arr[right]
return store_index
def select(left, right, k):
if left == right:
return arr[left]
pivot_index = (left + right) // 2
pivot_index = partition(left, right, pivot_index)
if k == pivot_index:
return arr[k]
elif k < pivot_index:
return select(left, pivot_index - 1, k)
else:
return select(pivot_index + 1, right, k)
n = len(arr)
return select(0, n - 1, n // 2)
Optimization Comparison
Method |
Time Complexity |
Space Complexity |
Pros |
Cons |
Sorting |
O(n log n) |
O(n) |
Simple |
Not efficient for large datasets |
Quick Select |
O(n) average |
O(1) |
Efficient |
Complex implementation |
Heap-based |
O(n log k) |
O(k) |
Good for streaming |
Requires additional space |
Memory-Efficient Approaches
import heapq
class MedianFinder:
def __init__(self):
self.small = [] ## max heap
self.large = [] ## min heap
def addNum(self, num):
## Always add to small heap first
heapq.heappush(self.small, -num)
## Ensure balance between heaps
if self.small and self.large and -self.small[0] > self.large[0]:
val = -heapq.heappop(self.small)
heapq.heappush(self.large, val)
## Balance heap sizes
if len(self.small) > len(self.large) + 1:
val = -heapq.heappop(self.small)
heapq.heappush(self.large, val)
if len(self.large) > len(self.small) + 1:
val = heapq.heappop(self.large)
heapq.heappush(self.small, -val)
def findMedian(self):
if len(self.small) == len(self.large):
return (-self.small[0] + self.large[0]) / 2.0
return -self.small[0] if len(self.small) > len(self.large) else self.large[0]
Optimization Flow
graph TD
A[Input Data] --> B{Dataset Size}
B -->|Small| C[Simple Sorting]
B -->|Large| D[Quick Select]
B -->|Streaming| E[Heap-based Method]
Practical Considerations
- Choose optimization method based on:
- Dataset size
- Memory constraints
- Computational resources
import timeit
def benchmark_median_methods():
data = list(range(10000))
## Benchmark different methods
sorting_time = timeit.timeit(lambda: sorted_median(data), number=100)
quick_select_time = timeit.timeit(lambda: quick_select_median(data), number=100)
print(f"Sorting Method: {sorting_time}")
print(f"Quick Select: {quick_select_time}")
LabEx recommends experimenting with different optimization techniques to find the most suitable approach for your specific use case.