Optimizing Gradient Descent for Global Optimization (Challenge)

# Introduction

Gradient descent is an iterative method for finding the minimum value of a loss function. By applying it repeatedly, we can minimize the loss and obtain the corresponding model parameter values. The update rule computes the new weight $w_{t+1}$ by subtracting the current gradient $\frac{\partial f}{\partial w_t}$, scaled by the learning rate $\alpha$, from the current weight $w_t$:

$$w_{t+1}=w_t - \alpha \frac{\partial f}{\partial w_t}$$

To start the algorithm, we initialize a starting point $w_0$ and then update the parameter step by step. The following animation demonstrates finding the minimum value of the function $f(w)=w^2$, with the starting point $w_0=-10$ and the learning rate $\alpha=1$.

<video width="100%" src="./assets/gd1.mp4" autoplay loop muted></video>

In this challenge, we explore gradient descent and its shortcomings. Although the method works well on convex losses, it can get trapped in a local optimum and fail to reach the global optimum. The goal of this lab is to optimize the gradient descent method so that it can skip local optima and find the global optimum efficiently.
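To make the update rule and the local-optimum problem concrete, here is a minimal Python sketch (not part of the lab's provided code). The non-convex function $f(w)=w^4-3w^2+w$, the step counts, and the learning rates below are illustrative assumptions, not values taken from the lab.

```python
def gradient_descent(grad, w0, lr, steps):
    """Run plain gradient descent and return the final w and the trajectory."""
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - lr * grad(w)  # w_{t+1} = w_t - alpha * df/dw_t
        history.append(w)
    return w, history

# Convex case: f(w) = w^2, so df/dw = 2w. A rate of 0.1 is used here so the
# iterates shrink smoothly toward 0 (with lr = 1.0 the update becomes
# w <- w - 2w = -w, which simply oscillates between -10 and 10).
w_final, _ = gradient_descent(grad=lambda w: 2 * w, w0=-10.0, lr=0.1, steps=50)
print(f"f(w) = w^2            -> final w = {w_final:.6f}")   # close to 0

# Non-convex case (illustrative): f(w) = w^4 - 3w^2 + w has a local minimum
# near w = 1.13 and its global minimum near w = -1.30. Started at w0 = 2.0,
# plain gradient descent settles into the local minimum and never escapes.
grad_f = lambda w: 4 * w**3 - 6 * w + 1
w_final, _ = gradient_descent(grad=grad_f, w0=2.0, lr=0.01, steps=500)
print(f"f(w) = w^4 - 3w^2 + w -> final w = {w_final:.6f}")   # stuck near 1.13
```

Running the sketch, the first call converges to the global minimum at $w=0$, while the second stalls at the local minimum; that second behavior is exactly the shortcoming this challenge asks you to address.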

