Real-World Applications of Iterators
Now that you understand how to create and use iterators, let's explore some practical real-world applications where iterators can improve your code.
Lazy Evaluation with Generators
Generators are a special type of iterator created with functions that use the yield statement. They allow you to generate values on-the-fly, which can be more memory-efficient than creating a complete list.
Create a file named generator_example.py:
def squared_numbers(n):
"""Generate squares of numbers from 1 to n."""
for i in range(1, n + 1):
yield i * i
## Create a generator for squares of numbers 1 to 10
squares = squared_numbers(10)
## squares is a generator object (a type of iterator)
print(f"Type of squares: {type(squares)}")
## Use next() to get values from the generator
print("\nGetting values with next():")
print(next(squares)) ## 1
print(next(squares)) ## 4
print(next(squares)) ## 9
## Use a for loop to get the remaining values
print("\nGetting remaining values with a for loop:")
for square in squares:
print(square)
## The generator is now exhausted, so this won't print anything
print("\nTrying to get more values (generator is exhausted):")
for square in squares:
print(square)
## Create a new generator and convert all values to a list at once
all_squares = list(squared_numbers(10))
print(f"\nAll squares as a list: {all_squares}")
Run the code:
python3 ~/project/generator_example.py
You'll see:
Type of squares: <class 'generator'>
Getting values with next():
1
4
9
Getting remaining values with a for loop:
16
25
36
49
64
81
100
Trying to get more values (generator is exhausted):
All squares as a list: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Generators are a more concise way to create iterators for simple cases. They automatically implement the iterator protocol for you.
Processing Large Data Sets
Iterators are perfect for processing large data sets because they allow you to work with one element at a time. Let's create an example that simulates processing a large dataset of temperatures:
Create a file named data_processing.py:
import random
import time
def temperature_data_generator(days, start_temp=15.0, max_variation=5.0):
"""Generate simulated hourly temperature data for a number of days."""
hours_per_day = 24
total_hours = days * hours_per_day
current_temp = start_temp
for hour in range(total_hours):
## Simulate temperature variations
day_progress = (hour % hours_per_day) / hours_per_day ## 0.0 to 1.0 through the day
## Temperature is generally cooler at night, warmer during day
time_factor = -max_variation/2 * (
-2 * day_progress + 1 if day_progress < 0.5
else 2 * day_progress - 1
)
## Add some randomness
random_factor = random.uniform(-1.0, 1.0)
current_temp += time_factor + random_factor
current_temp = max(0, min(40, current_temp)) ## Keep between 0-40°C
yield (hour // hours_per_day, hour % hours_per_day, round(current_temp, 1))
def process_temperature_data():
"""Process a large set of temperature data using an iterator."""
print("Processing hourly temperature data for 30 days...")
print("-" * 50)
## Create our data generator
data_iterator = temperature_data_generator(days=30)
## Track some statistics
total_readings = 0
temp_sum = 0
min_temp = float('inf')
max_temp = float('-inf')
## Process the data one reading at a time
start_time = time.time()
for day, hour, temp in data_iterator:
## Update statistics
total_readings += 1
temp_sum += temp
min_temp = min(min_temp, temp)
max_temp = max(max_temp, temp)
## Just for demonstration, print a reading every 24 hours
if hour == 12: ## Noon each day
print(f"Day {day+1}, 12:00 PM: {temp}°C")
processing_time = time.time() - start_time
## Calculate final statistics
avg_temp = temp_sum / total_readings if total_readings > 0 else 0
print("-" * 50)
print(f"Processed {total_readings} temperature readings in {processing_time:.3f} seconds")
print(f"Average temperature: {avg_temp:.1f}°C")
print(f"Temperature range: {min_temp:.1f}°C to {max_temp:.1f}°C")
## Run the temperature data processing
process_temperature_data()
Run the code:
python3 ~/project/data_processing.py
You'll see output similar to (the exact temperatures will vary due to randomness):
Processing hourly temperature data for 30 days...
--------------------------------------------------
Day 1, 12:00 PM: 17.5°C
Day 2, 12:00 PM: 18.1°C
Day 3, 12:00 PM: 17.3°C
...
Day 30, 12:00 PM: 19.7°C
--------------------------------------------------
Processed 720 temperature readings in 0.012 seconds
Average temperature: 18.2°C
Temperature range: 12.3°C to 24.7°C
In this example, we're using an iterator to process a simulated dataset of 720 temperature readings (24 hours × 30 days) without having to store all the data in memory at once. The iterator generates each reading on demand, making the code more memory-efficient.
Building a Data Pipeline with Iterators
Iterators can be chained together to create data processing pipelines. Let's build a simple pipeline that:
- Generates numbers
- Filters out odd numbers
- Squares the remaining even numbers
- Limits the output to a specific number of results
Create a file named data_pipeline.py:
def generate_numbers(start, end):
"""Generate numbers in the given range."""
print(f"Starting generator from {start} to {end}")
for i in range(start, end + 1):
print(f"Generating: {i}")
yield i
def filter_even(numbers):
"""Filter for even numbers only."""
for num in numbers:
if num % 2 == 0:
print(f"Filtering: {num} is even")
yield num
else:
print(f"Filtering: {num} is odd (skipped)")
def square_numbers(numbers):
"""Square each number."""
for num in numbers:
squared = num ** 2
print(f"Squaring: {num} → {squared}")
yield squared
def limit_results(iterable, max_results):
"""Limit the number of results."""
count = 0
for item in iterable:
if count < max_results:
print(f"Limiting: keeping item #{count+1}")
yield item
count += 1
else:
print(f"Limiting: reached maximum of {max_results} items")
break
## Create our data pipeline
print("Creating data pipeline...\n")
pipeline = (
limit_results(
square_numbers(
filter_even(
generate_numbers(1, 10)
)
),
3 ## Limit to 3 results
)
)
## Execute the pipeline by iterating through it
print("\nExecuting pipeline and collecting results:")
print("-" * 50)
results = list(pipeline)
print("-" * 50)
print(f"\nFinal results: {results}")
Run the code:
python3 ~/project/data_pipeline.py
You'll see:
Creating data pipeline...
Executing pipeline and collecting results:
--------------------------------------------------
Starting generator from 1 to 10
Generating: 1
Filtering: 1 is odd (skipped)
Generating: 2
Filtering: 2 is even
Squaring: 2 → 4
Limiting: keeping item #1
Generating: 3
Filtering: 3 is odd (skipped)
Generating: 4
Filtering: 4 is even
Squaring: 4 → 16
Limiting: keeping item #2
Generating: 5
Filtering: 5 is odd (skipped)
Generating: 6
Filtering: 6 is even
Squaring: 6 → 36
Limiting: keeping item #3
Generating: 7
Filtering: 7 is odd (skipped)
Generating: 8
Filtering: 8 is even
Squaring: 8 → 64
Limiting: reached maximum of 3 items
--------------------------------------------------
Final results: [4, 16, 36]
This pipeline example shows how iterators can be connected together to form a data processing workflow. Each stage of the pipeline processes one item at a time, passing it to the next stage. The pipeline doesn't process any data until we actually consume the results (in this case, by converting it to a list).
The key advantage is that no intermediate lists are created between the pipeline stages, making this approach memory-efficient even for large datasets.