Gradient descent is an optimization tool used in many machine learning algorithms. However, it is essentially a greedy algorithm: each step follows only the local slope, so it can easily get stuck in a local optimum. This article draws on the momentum method from [1], which helps gradient descent escape local optima.
For a detailed explanation of various optimization methods, see [2].
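
To make the idea concrete, here is a minimal sketch of gradient descent with a momentum term. It assumes the classical "heavy ball" update, v = beta * v - lr * grad(x) followed by x = x + v, which may differ in detail from the formulation in [1]. The function name `momentum_descent` and the parameters `lr`, `beta`, and `steps` are illustrative choices, not taken from the references.

```python
def momentum_descent(grad, x0, lr=0.1, beta=0.9, steps=500):
    """Minimize a 1-D function given its gradient `grad`.

    Assumed update rule (classical momentum):
        v <- beta * v - lr * grad(x)
        x <- x + v
    Setting beta=0 gives plain gradient descent.
    """
    x = float(x0)
    v = 0.0  # velocity: a decaying sum of past gradient steps
    for _ in range(steps):
        v = beta * v - lr * grad(x)
        x = x + v
    return x


def quadratic_grad(x):
    # Gradient of the hypothetical test function f(x) = (x - 3)^2
    return 2.0 * (x - 3.0)


plain = momentum_descent(quadratic_grad, x0=0.0, beta=0.0)
with_momentum = momentum_descent(quadratic_grad, x0=0.0, beta=0.9)
```

On this convex example both runs approach the minimizer x = 3. On a non-convex function, the accumulated velocity can carry the iterate through a shallow local minimum where plain gradient descent would stop, although this is not guaranteed and depends on `lr` and `beta`.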
