Gradient descent is used in many machine learning algorithms; it is a general-purpose optimization tool. But gradient descent is essentially a greedy algorithm, so it easily gets stuck in a local optimum. The momentum method described in reference [1] helps gradient descent jump out of local optima.
For a detailed explanation of the various optimization methods, see [2].
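To make the idea concrete, here is a minimal sketch of the momentum update as formulated in [1]: v_t = γ·v_{t-1} + η·∇f(θ), then θ ← θ − v_t, where γ is the momentum coefficient and η the learning rate. The code below is my own illustration, not taken from [1]; the toy objective f(x) = x⁴ − 2x² + 0.3x and all parameter values are chosen just for demonstration. This function has a shallow minimum near x ≈ 0.96 and a deeper one near x ≈ −1.04.

```python
import numpy as np

def momentum_descent(grad, theta0, lr=0.01, gamma=0.95, steps=1000):
    """Gradient descent with classical momentum:
    v <- gamma * v + lr * grad(theta);  theta <- theta - v."""
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = gamma * v + lr * grad(theta)  # exponentially decaying sum of past gradients
        theta = theta - v                 # move along the accumulated velocity
    return theta

# Toy 1-D objective: f(x) = x^4 - 2x^2 + 0.3x
# gradient: f'(x) = 4x^3 - 4x + 0.3
# shallow local minimum near x ~= 0.96, deeper minimum near x ~= -1.04
grad_f = lambda x: 4 * x**3 - 4 * x + 0.3

start = np.array([1.5])
print(momentum_descent(grad_f, start, gamma=0.0))   # plain gradient descent: settles near 0.96
print(momentum_descent(grad_f, start, gamma=0.95))  # with momentum: velocity carries it toward -1.04
```

On this toy function, plain gradient descent (γ = 0) stops at the first minimum it rolls into, while with γ = 0.95 the accumulated velocity carries the iterate over the barrier into the deeper basin. That is the "jump out of the local optimum" effect mentioned above; whether it happens in practice depends on the learning rate, the momentum coefficient, and the shape of the loss surface.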
———— I'm the elegant divider; going to wash up and sleep ————

Refs:
[1] http://ruder.io/optimizing-gradient-descent/index.html#momentum
[2] https://zhuanlan.zhihu.com/p/22252270