Divide Dataset Into Mini-Batches


Introduction

In this project, you will learn how to implement a function to divide a dataset into mini-batches, which is a common technique used in deep learning training.

🎯 Tasks

In this project, you will learn:

  • How to implement the data_pipeline function to divide a dataset into mini-batches
  • How to test the data_pipeline function to ensure it works as expected

🏆 Achievements

After completing this project, you will be able to:

  • Divide a dataset into mini-batches using the data_pipeline function
  • Verify the functionality of the data_pipeline function through testing

Skills Graph

Skills covered in this lab: Machine Learning (Mini-batch) and Python (Conditional Statements, For Loops, Lists, Generators).

Implement Mini-Batches

In this step, you will learn how to implement the data_pipeline function to divide a dataset into mini-batches.

Open the data_pipeline.py file in your text editor.

Implement the data_pipeline function according to the requirements:

  • The function should take two parameters: data (a list of lists containing integers) and batch_size (an integer representing the size of each mini-batch).
  • The function should return a generator that yields batches of the input data in their original order, where each full batch contains batch_size lists of integers.
  • If fewer than batch_size samples remain at the end, the function should yield all of them as a final, smaller batch.
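The requirements above determine how many batches to expect and how large each one is. The following is a minimal sketch of that arithmetic; expected_batch_sizes is a hypothetical helper introduced only for illustration and is not part of the lab's data_pipeline.py file.

```python
import math
from typing import List


def expected_batch_sizes(n_samples: int, batch_size: int) -> List[int]:
    """Hypothetical helper: sizes of the batches for a dataset of n_samples items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    # Number of batches is the sample count divided by batch_size, rounded up.
    n_batches = math.ceil(n_samples / batch_size)
    # Every batch is full except possibly the last, which holds the remainder.
    return [min(batch_size, n_samples - i * batch_size) for i in range(n_batches)]


print(expected_batch_sizes(5, 2))  # [2, 2, 1]
print(expected_batch_sizes(4, 2))  # [2, 2]
```

With 5 samples and a batch size of 2, this gives two full batches and one final batch holding the single remaining sample.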

Here's the completed data_pipeline function:

from typing import Generator, List

def data_pipeline(data: List[List[int]], batch_size: int) -> Generator[List[List[int]], None, None]:
    """
    This function takes a list of lists containing integers and divides it into smaller 'batches' of a specified size.
    It returns a generator that yields these batches sequentially.

    Parameters:
    data (List[List[int]]): The input dataset, a list of lists containing integers.
    batch_size (int): The size of each batch, i.e., the number of lists of integers to include in each batch.

    Returns:
    Generator[List[List[int]], None, None]: A generator yielding batches of the input data. Each batch contains 'batch_size' lists of integers, except the last, which may contain fewer.
    """
    for i in range(0, len(data), batch_size):
        batch_data = data[i : i + batch_size]
        yield batch_data

Save the data_pipeline.py file.
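Because data_pipeline uses yield, calling it does not slice anything right away: it returns a generator that produces one batch each time it is asked for the next item. The sketch below illustrates this with a compact copy of the function so the example runs on its own; the three-sample dataset is made up for the illustration.

```python
from typing import Generator, List


def data_pipeline(data: List[List[int]], batch_size: int) -> Generator[List[List[int]], None, None]:
    # Compact copy of the lab's function, repeated here so the sketch is self-contained.
    for i in range(0, len(data), batch_size):
        yield data[i : i + batch_size]


data = [[1, 2], [1, 3], [3, 5]]
batches = data_pipeline(data, 2)  # Creates the generator; no batch is built yet.

first = next(batches)   # Builds and returns the first batch.
second = next(batches)  # Builds and returns the final, smaller batch.
remaining = list(batches)  # The generator is now exhausted, so this is empty.

print(first)      # [[1, 2], [1, 3]]
print(second)     # [[3, 5]]
print(remaining)  # []
```

A generator can be consumed only once. To iterate over the batches again, for example in another training epoch, call data_pipeline again to get a fresh generator.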

Test the Mini Batches

In this step, you will test the data_pipeline function to ensure it works as expected.

Open the data_pipeline.py file in your text editor.

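Add the following code at the end of the file to test the data_pipeline function. The f"{batch=}" syntax prints both the variable name and its value, and requires Python 3.8 or later: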

if __name__ == "__main__":
    data = [[1, 2], [1, 3], [3, 5], [2, 1], [3, 3]]
    batch_size = 2
    batch_data = data_pipeline(data, batch_size)
    for batch in batch_data:
        print(f"{batch=}")

Save the data_pipeline.py file.

Run the data_pipeline.py file in your terminal:

python data_pipeline.py

The output should be:

batch=[[1, 2], [1, 3]]
batch=[[3, 5], [2, 1]]
batch=[[3, 3]]

This output confirms that the data_pipeline function works as expected: the five samples are split into two full mini-batches of size 2, and the one remaining sample is yielded as a final, smaller batch.
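Beyond reading the printed output, the same behavior can be checked with assertions. The following is a minimal sketch, not part of the lab's required code; check_batches is a hypothetical helper, and data_pipeline is repeated in compact form so the example runs on its own.

```python
from typing import Generator, List


def data_pipeline(data: List[List[int]], batch_size: int) -> Generator[List[List[int]], None, None]:
    # Compact copy of the lab's function, repeated here so the sketch is self-contained.
    for i in range(0, len(data), batch_size):
        yield data[i : i + batch_size]


def check_batches(data: List[List[int]], batch_size: int) -> int:
    """Hypothetical helper: assert the batching rules hold and return the batch count."""
    batches = list(data_pipeline(data, batch_size))
    # Every batch except the last must be completely full.
    assert all(len(batch) == batch_size for batch in batches[:-1])
    # The last batch must be non-empty and no larger than batch_size.
    if batches:
        assert 1 <= len(batches[-1]) <= batch_size
    # Joining the batches back together must reproduce the data in order.
    assert [sample for batch in batches for sample in batch] == data
    return len(batches)


data = [[1, 2], [1, 3], [3, 5], [2, 1], [3, 3]]
print(check_batches(data, 2))   # 3
print(check_batches(data, 5))   # 1
print(check_batches(data, 10))  # 1
print(check_batches([], 2))     # 0
```

These cases cover a dataset that does not divide evenly, one that fits exactly in a single batch, a batch size larger than the dataset, and an empty dataset.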

Summary

Congratulations! You have completed this project. You implemented the data_pipeline generator function to divide a dataset into mini-batches and tested it on a small sample dataset.
