Gradient descent is the optimization workhorse behind many machine learning algorithms. It is, however, essentially a greedy algorithm and easily gets trapped in a local optimum. The momentum method described in reference [1] helps gradient descent jump out of such local optima.
For a detailed explanation of the various optimization methods, see [2].
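To make the idea concrete, here is a minimal sketch of the momentum update from [1], v = γ·v + η·∇f(θ); θ = θ − v, compared with plain gradient descent. The test function, learning rate, and momentum coefficient below are illustrative choices of mine, not taken from the references.

```python
# Momentum update (notation from [1]):
#   v <- gamma * v + eta * grad(theta)
#   theta <- theta - v
# f(theta) = theta**4 - 3*theta**2 + theta has a shallow local minimum
# near theta ~ 1.13 and a deeper global minimum near theta ~ -1.30.

def grad(theta):
    # Derivative of the illustrative objective above.
    return 4 * theta**3 - 6 * theta + 1

def plain_gd(theta, eta=0.01, steps=200):
    for _ in range(steps):
        theta = theta - eta * grad(theta)
    return theta

def momentum_gd(theta, eta=0.01, gamma=0.9, steps=200):
    v = 0.0
    for _ in range(steps):
        v = gamma * v + eta * grad(theta)  # accumulate velocity
        theta = theta - v
    return theta

# Starting on the right-hand slope at theta = 2.0:
print(plain_gd(2.0))     # greedy descent settles in the shallow minimum (~1.13)
print(momentum_gd(2.0))  # accumulated velocity carries it over into the deeper one (~-1.30)
```

The accumulated velocity acts like a heavy ball rolling downhill: it builds up speed on the long slope, coasts through the shallow basin, and only settles once friction (the factor γ < 1) has damped it out in the deeper minimum.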
———————- I am the elegant dividing line; time to wash up and sleep —————–

Ref:
[1] http://ruder.io/optimizing-gradient-descent/index.html#momentum
[2] https://zhuanlan.zhihu.com/p/22252270