# Introduction

When fitting model parameters in machine learning algorithms, Gradient Descent is one of the most commonly used methods. However, in Gradient Descent the gradient must be computed over the entire dataset at every update. If the dataset is large, each update becomes slow and computationally expensive.

As an alternative, Stochastic Gradient Descent (SGD) updates the parameters using only a single sample at a time. Although a single-sample update does not guarantee that the loss function moves towards the global optimum in every iteration, the updates point in roughly the right direction on average, so the final solution usually falls within an acceptable range of the global optimum.

Mini-batch Gradient Descent is a compromise between the two: it approximates the full dataset by using N samples to update the parameters. Each update is nearly as cheap as a single-sample update, while averaging over several samples gives a better estimate of the gradient direction and better approximates the distribution of the entire dataset.

In this challenge, we will implement a data pipeline function that splits a given dataset into mini-batches. The function should return all the samples in the dataset within one epoch. The goal is to compute gradients efficiently for machine learning algorithms such as Stochastic Gradient Descent by updating the gradient with a batch of samples at a time instead of the entire dataset.
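As a rough illustration of the kind of pipeline described above, here is a minimal sketch of a mini-batch generator. The function name `batch_iterator` and its parameters (`batch_size`, `shuffle`, `seed`) are assumptions for this example, not the interface required by the challenge.

```python
import numpy as np

def batch_iterator(X, y, batch_size=32, shuffle=True, seed=None):
    """Yield mini-batches that together cover every sample exactly once (one epoch).

    Note: this is an illustrative sketch, not the challenge's required solution.
    """
    n_samples = X.shape[0]
    indices = np.arange(n_samples)
    if shuffle:
        # Shuffle the sample order so each epoch sees the data in a different order.
        rng = np.random.default_rng(seed)
        rng.shuffle(indices)
    for start in range(0, n_samples, batch_size):
        # The last batch may be smaller than batch_size if n_samples
        # is not divisible by it; all samples are still returned once.
        batch_idx = indices[start:start + batch_size]
        yield X[batch_idx], y[batch_idx]

# Example usage on a toy dataset: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
for X_batch, y_batch in batch_iterator(X, y, batch_size=4, seed=0):
    print(X_batch.shape, y_batch.shape)
```

In a training loop, each yielded `(X_batch, y_batch)` pair would be used to compute one gradient step, so a full pass over the generator corresponds to one epoch.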