Divide Dataset Into Mini-Batches


Introduction

In this project, you will implement a function that divides a dataset into mini-batches, a common technique in deep learning training.

🎯 Tasks

In this project, you will learn:

  • How to implement the data_pipeline function to divide a dataset into mini-batches
  • How to test the data_pipeline function to ensure it works as expected

🏆 Achievements

After completing this project, you will be able to:

  • Divide a dataset into mini-batches using the data_pipeline function
  • Verify the functionality of the data_pipeline function through testing


Implement Mini-Batches

In this step, you will learn how to implement the data_pipeline function to divide a dataset into mini-batches.

Open the data_pipeline.py file in your text editor.

Implement the data_pipeline function according to the requirements:

  • The function should take two parameters: data (a list of lists containing integers) and batch_size (an integer representing the size of each mini-batch).
  • The function should return a generator that yields batches of the input data in order, where each full batch contains batch_size lists of integers.
  • If fewer than batch_size samples remain at the end, the function should yield all of the remaining samples as a final, smaller batch.
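Since the requirements ask for a generator, a short reminder of how generators behave may help. The sketch below uses a toy function, count_up, which is a hypothetical name chosen for illustration and is not part of this project. It only shows that values are produced one at a time and that a generator can be consumed once.

```python
from typing import Generator


def count_up(limit: int) -> Generator[int, None, None]:
    """Hypothetical toy generator: yields 0, 1, ..., limit - 1 one value at a time."""
    for value in range(limit):
        yield value


gen = count_up(3)
print(next(gen))  # 0
print(next(gen))  # 1
print(list(gen))  # [2]  (only the values not yet consumed remain)
print(list(gen))  # []   (a generator can be consumed only once)
```

The same behavior applies to any generator of batches: each batch is built only when the loop asks for it, so the whole list of batches never has to exist in memory at once.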

Here's the completed data_pipeline function:

from typing import Generator, List

def data_pipeline(data: List[List[int]], batch_size: int) -> Generator[List[List[int]], None, None]:
    """
    This function takes a list of lists containing integers and divides it into smaller 'batches' of a specified size.
    It returns a generator that yields these batches sequentially.

    Parameters:
    data (List[List[int]]): The input dataset, a list of lists containing integers.
    batch_size (int): The size of each batch, i.e., the number of lists of integers to include in each batch.

    Returns:
    Generator[List[List[int]], None, None]: A generator yielding batches of the input data. Each batch contains 'batch_size' lists of integers, except possibly the last one, which holds whatever samples remain.
    """
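    # Step through the data in strides of batch_size. Slicing past the end of
    # a list is safe in Python, so the final batch simply holds the remainder.
    for i in range(0, len(data), batch_size):
        batch_data = data[i : i + batch_size]
        yield batch_data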

Save the data_pipeline.py file.
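To see why the range-and-slice loop produces a shorter final batch, it can help to list the slice bounds it walks through. The following is a minimal sketch, not part of the lab solution: batch_bounds is a hypothetical helper name, and the positive-batch_size check is an added assumption, since the lab's requirements do not say how invalid sizes should be handled.

```python
from typing import List, Tuple


def batch_bounds(n_samples: int, batch_size: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: list the (start, stop) slice bounds for each batch."""
    if batch_size <= 0:
        # Assumption for this sketch: reject non-positive sizes explicitly.
        # range() raises ValueError for a step of 0 and is empty for a negative step.
        raise ValueError("batch_size must be a positive integer")
    # min() mirrors what list slicing does implicitly: the stop index is
    # clamped to the length of the data.
    return [
        (start, min(start + batch_size, n_samples))
        for start in range(0, n_samples, batch_size)
    ]


print(batch_bounds(5, 2))  # [(0, 2), (2, 4), (4, 5)]
```

With five samples and a batch size of 2, the start indices are 0, 2, and 4, so the last slice covers only one sample.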


Test the Mini Batches

In this step, you will test the data_pipeline function to ensure it works as expected.

Open the data_pipeline.py file in your text editor.

Add the following code at the end of the file to test the data_pipeline function:

if __name__ == "__main__":
    data = [[1, 2], [1, 3], [3, 5], [2, 1], [3, 3]]
    batch_size = 2
    batch_data = data_pipeline(data, batch_size)
    for batch in batch_data:
        print(f"{batch=}")

Save the data_pipeline.py file.

Run the data_pipeline.py file in your terminal:

python data_pipeline.py

The output should be:

batch=[[1, 2], [1, 3]]
batch=[[3, 5], [2, 1]]
batch=[[3, 3]]

This output confirms that the data_pipeline function works as expected: the five samples are split into two full batches of size 2, followed by a final batch containing the one remaining sample.
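Printed output has to be checked by eye, so the same properties could also be checked with assertions. The sketch below is one possible way to do that, under some assumptions: check_batches is a hypothetical helper name, and data_pipeline is repeated here in condensed form only so the block runs on its own.

```python
import math
from typing import Generator, List


def data_pipeline(data: List[List[int]], batch_size: int) -> Generator[List[List[int]], None, None]:
    # Condensed copy of the function from the previous step.
    for i in range(0, len(data), batch_size):
        yield data[i : i + batch_size]


def check_batches(data: List[List[int]], batch_size: int) -> List[List[List[int]]]:
    """Hypothetical helper: assert basic batching properties and return the batches."""
    batches = list(data_pipeline(data, batch_size))
    # The number of batches is the sample count divided by batch_size, rounded up.
    assert len(batches) == math.ceil(len(data) / batch_size)
    # Every batch except possibly the last one is full.
    assert all(len(batch) == batch_size for batch in batches[:-1])
    # Concatenating the batches gives back the original data, in order.
    assert [sample for batch in batches for sample in batch] == data
    return batches


batches = check_batches([[1, 2], [1, 3], [3, 5], [2, 1], [3, 3]], 2)
print(len(batches))  # 3
```

If any assertion fails, Python raises an AssertionError, which makes a regression easier to notice than a change in printed text.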


Summary

Congratulations! You have completed this project. You can practice more labs in LabEx to improve your skills.
