Divide Dataset Into Mini-Batches for Deep Learning Training

Introduction

In this project, you will implement a function that divides a dataset into mini-batches, a common technique in deep learning training.

🎯 Tasks

In this project, you will learn:

How to implement the data_pipeline function to divide a dataset into mini-batches
How to test the data_pipeline function to ensure it works as expected

🏆 Achievements

After completing this project, you will be able to:

Divide a dataset into mini-batches using the data_pipeline function
Verify the functionality of the data_pipeline function through testing

Implement Mini-Batches

In this step, you will learn how to implement the data_pipeline function to divide a dataset into mini-batches.

Open the data_pipeline.py file in your text editor.

Implement the data_pipeline function according to the requirements:

The function should take two parameters: data (a list of lists containing integers) and batch_size (an integer representing the size of each mini-batch).
The function should return a generator that yields batches of the input data in their original order, where each batch contains up to batch_size lists of integers.
If fewer than batch_size samples remain, the final batch should contain all of the remaining samples.
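
The requirements above imply that the number of batches is the ceiling of len(data) / batch_size, and that only the last batch can be short. The following is a minimal sketch of that arithmetic. The helper name expected_batch_sizes is hypothetical and is not part of the project files; it only illustrates the sizes you should expect to see.

```python
import math
from typing import List


def expected_batch_sizes(num_samples: int, batch_size: int) -> List[int]:
    """Hypothetical helper: list the size of each mini-batch."""
    if batch_size <= 0:
        # Assumption: a non-positive batch size is treated as an error here.
        raise ValueError("batch_size must be a positive integer")
    # Number of batches is the ceiling of num_samples / batch_size.
    num_batches = math.ceil(num_samples / batch_size)
    sizes = [batch_size] * num_batches
    # A non-zero remainder means the last batch is smaller than batch_size.
    remainder = num_samples % batch_size
    if remainder:
        sizes[-1] = remainder
    return sizes


print(expected_batch_sizes(5, 2))  # [2, 2, 1]
print(expected_batch_sizes(4, 2))  # [2, 2]
```

With five samples and a batch size of 2, this gives two full batches followed by one batch holding the single remaining sample.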

Here's the completed data_pipeline function:

from typing import Generator, List

def data_pipeline(data: List[List[int]], batch_size: int) -> Generator[List[List[int]], None, None]:
    """
    This function takes a list of lists containing integers and divides it into smaller 'batches' of a specified size.
    It returns a generator that yields these batches sequentially.

    Parameters:
    data (List[List[int]]): The input dataset, a list of lists containing integers.
    batch_size (int): The size of each batch, i.e., the number of lists of integers to include in each batch.

    Returns:
    Generator[List[List[int]], None, None]: A generator yielding batches of the input data with each batch containing 'batch_size' lists of integers.
    """
    for i in range(0, len(data), batch_size):
        batch_data = data[i : i + batch_size]
        yield batch_data

Save the data_pipeline.py file.
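
Because data_pipeline uses yield, calling it does not build all batches up front; each batch is produced only when the caller asks for it. The sketch below illustrates this with the built-in next function. The function body is repeated (without type hints) only so the snippet runs on its own, and the variable names are illustrative.

```python
def data_pipeline(data, batch_size):
    # Same slicing logic as the project function, repeated so this sketch is self-contained.
    for i in range(0, len(data), batch_size):
        yield data[i : i + batch_size]


batches = data_pipeline([[1, 2], [1, 3], [3, 5]], 2)

first = next(batches)           # first two samples
second = next(batches)          # the one remaining sample
leftover = next(batches, None)  # generator is exhausted, so the default is returned

print(first)     # [[1, 2], [1, 3]]
print(second)    # [[3, 5]]
print(leftover)  # None
```

A generator can only be consumed once. To iterate over the batches again, for example in a second training epoch, call data_pipeline again.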

Test the Mini-Batches

In this step, you will test the data_pipeline function to ensure it works as expected.

Open the data_pipeline.py file in your text editor.

Add the following code at the end of the file to test the data_pipeline function:

if __name__ == "__main__":
    data = [[1, 2], [1, 3], [3, 5], [2, 1], [3, 3]]
    batch_size = 2
    batch_data = data_pipeline(data, batch_size)
    for batch in batch_data:
        print(f"{batch=}")

Save the data_pipeline.py file.

Run the data_pipeline.py file in your terminal (the f"{batch=}" syntax used above requires Python 3.8 or later):

python data_pipeline.py

The output should be:

batch=[[1, 2], [1, 3]]
batch=[[3, 5], [2, 1]]
batch=[[3, 3]]

This output confirms that the data_pipeline function works as expected: the five samples are split into two mini-batches of size 2, and the final batch holds the single remaining sample.
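
Reading printed output works for a quick check, but assertions make the same verification repeatable. The sketch below is one possible set of checks, not the lab's official grader. It repeats the function so that it runs on its own, and the edge cases (empty input, batch size larger than the dataset) are extra assumptions about desirable behavior that go beyond the stated requirements.

```python
def data_pipeline(data, batch_size):
    # Same slicing logic as the project function, repeated so this sketch is self-contained.
    for i in range(0, len(data), batch_size):
        yield data[i : i + batch_size]


data = [[1, 2], [1, 3], [3, 5], [2, 1], [3, 3]]
batches = list(data_pipeline(data, 2))

# Every batch has batch_size samples except possibly the last one.
assert [len(batch) for batch in batches] == [2, 2, 1]

# Concatenating the batches reproduces the dataset in its original order.
flattened = [sample for batch in batches for sample in batch]
assert flattened == data

# Assumed edge cases: empty input yields no batches,
# and an oversized batch_size yields a single batch with everything.
assert list(data_pipeline([], 2)) == []
assert list(data_pipeline(data, 10)) == [data]

print("all checks passed")
```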

Summary

Congratulations! You have completed this project. You implemented data_pipeline as a generator that divides a dataset into mini-batches and verified that the final batch contains any remaining samples.