Introduction
In this project, you will learn how to implement a function that divides a dataset into mini-batches, a technique commonly used in deep learning training.
🎯 Tasks
In this project, you will learn:
- How to implement the data_pipeline function to divide a dataset into mini-batches
- How to test the data_pipeline function to ensure it works as expected
🏆 Achievements
After completing this project, you will be able to:
- Divide a dataset into mini-batches using the data_pipeline function
- Verify the functionality of the data_pipeline function through testing
Implement Mini-Batches
In this step, you will learn how to implement the data_pipeline function to divide a dataset into mini-batches.
Open the data_pipeline.py file in your text editor.
Implement the data_pipeline function according to the requirements:
- The function should take two parameters: data (a list of lists containing integers) and batch_size (an integer representing the size of each mini-batch).
- The function should return a generator that yields batches of the input data, where each batch contains batch_size lists of integers.
- If fewer than batch_size samples remain, the function should yield all the remaining samples as a final, smaller batch.
Here's the completed data_pipeline function:
from typing import Generator, List

def data_pipeline(data: List[List[int]], batch_size: int) -> Generator[List[List[int]], None, None]:
    """
    This function takes a list of lists containing integers and divides it into smaller 'batches' of a specified size.
    It returns a generator that yields these batches sequentially.

    Parameters:
        data (List[List[int]]): The input dataset, a list of lists containing integers.
        batch_size (int): The size of each batch, i.e., the number of lists of integers to include in each batch.

    Returns:
        Generator[List[List[int]], None, None]: A generator yielding batches of the input data, with each batch containing 'batch_size' lists of integers.
    """
    # Step through the data in strides of batch_size.
    for i in range(0, len(data), batch_size):
        # Slicing past the end of a list is safe in Python, so the final
        # batch automatically contains all remaining samples when fewer
        # than batch_size are left.
        batch_data = data[i : i + batch_size]
        yield batch_data
Save the data_pipeline.py file.
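Note that data_pipeline returns a generator: batches are produced lazily, one at a time, and the generator can be consumed only once. The snippet below is a minimal sketch (for example, run in a Python REPL with the function already defined) illustrating this behavior:

batches = data_pipeline([[1, 2], [3, 4], [5, 6]], batch_size=2)
print(next(batches))  # [[1, 2], [3, 4]]
print(next(batches))  # [[5, 6]] -- the final, smaller batch
# A further next(batches) would raise StopIteration: the generator is exhausted.

If you are on Python 3.12 or newer, the standard library's itertools.batched provides the same chunking behavior, yielding tuples instead of lists.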
Test the Mini-Batches
In this step, you will test the data_pipeline function to ensure it works as expected.
Open the data_pipeline.py file in your text editor.
Add the following code at the end of the file to test the data_pipeline function:
if __name__ == "__main__":
    data = [[1, 2], [1, 3], [3, 5], [2, 1], [3, 3]]
    batch_size = 2
    batch_data = data_pipeline(data, batch_size)
    for batch in batch_data:
        print(f"{batch=}")
Save the data_pipeline.py file.
Run the data_pipeline.py file in your terminal:
python data_pipeline.py
The output should be:
batch=[[1, 2], [1, 3]]
batch=[[3, 5], [2, 1]]
batch=[[3, 3]]
This output confirms that the data_pipeline function works as expected: the five samples are divided into mini-batches of size 2, and the final batch holds the single leftover sample.
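As an optional extension beyond the print-based check above, you could verify the behavior with assertions so the script fails loudly if the function ever changes. This is a minimal sketch, not part of the original lab; it assumes the same data_pipeline function is in scope:

data = [[1, 2], [1, 3], [3, 5], [2, 1], [3, 3]]
batches = list(data_pipeline(data, batch_size=2))
assert batches == [
    [[1, 2], [1, 3]],
    [[3, 5], [2, 1]],
    [[3, 3]],  # final batch is smaller than batch_size
]

# Edge case: a batch_size larger than the dataset yields a single batch containing everything.
assert list(data_pipeline(data, batch_size=10)) == [data]
print("All assertions passed.")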
Summary
Congratulations! You have completed this project. You can practice more labs in LabEx to improve your skills.