Outline
* What's Gradient
* What does it mean
* How to Search
* AutoGrad
What's Gradient
* Derivative: the rate of change expressed abstractly, with no particular direction attached
* Partial derivative: the rate of change along one specific coordinate axis
* Gradient: a vector that collects all of the partial derivatives
\[ \nabla{f} = \left(\frac{\partial{f}}{\partial{x_1}}, \frac{\partial{f}}{\partial{x_2}}, \cdots, \frac{\partial{f}}{\partial{x_n}}\right) \]
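As a concrete example of this definition, take \(f(x_1, x_2) = x_1^2 + x_2^2\); each component is just an ordinary partial derivative, so

\[ \nabla{f} = \left(\frac{\partial{f}}{\partial{x_1}}, \frac{\partial{f}}{\partial{x_2}}\right) = (2x_1, 2x_2) \]

At the point \((1, 2)\) this evaluates to \((2, 4)\).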
What does it mean?
* The direction of each arrow shows the direction of the gradient
* The length (norm) of each arrow shows how fast the function value increases in that direction
How to search
* Search in the direction opposite to the gradient, i.e. the gradient-descent direction
For instance
\[ \theta_{t+1}=\theta_t-\alpha_t\nabla{f(\theta_t)} \]
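A minimal sketch of this update rule, using \(f(\theta) = \theta^2\) (so \(\nabla f(\theta) = 2\theta\)) and a fixed step size \(\alpha_t = 0.1\), both chosen purely for illustration:

    # Gradient descent on f(theta) = theta^2, whose gradient is 2 * theta
    theta = 5.0   # initial parameter theta_0
    alpha = 0.1   # fixed learning rate alpha_t

    for t in range(100):
        grad = 2 * theta              # nabla f(theta_t)
        theta = theta - alpha * grad  # theta_{t+1} = theta_t - alpha_t * grad

    print(theta)  # approaches 0, the minimizer of f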
AutoGrad
* with tf.GradientTape() as tape:
  * Build the computation graph
  * \(loss = f_\theta{(x)}\)
* [w_grad] = tape.gradient(loss, [w])

    import tensorflow as tf

    w = tf.constant(1.)
    x = tf.constant(2.)
    y = x * w  # built outside any tape, so it is not recorded

    with tf.GradientTape() as tape:
        tape.watch([w])
        y2 = x * w

    grad1 = tape.gradient(y, [w])
    grad1  # [None]: y was not built inside the tape's context

    with tf.GradientTape() as tape:
        tape.watch([w])
        y2 = x * w

    grad2 = tape.gradient(y2, [w])
    grad2  # [<tf.Tensor: id=30, shape=(), dtype=float32, numpy=2.0>]

    try:
        grad2 = tape.gradient(y2, [w])
    except Exception as e:
        print(e)
    # GradientTape.gradient can only be called once on non-persistent tapes.
* To keep the gradient available so it can be queried more than once, make the tape persistent:

    with tf.GradientTape(persistent=True) as tape:
        tape.watch([w])
        y2 = x * w

    grad2 = tape.gradient(y2, [w])
    grad2  # [<tf.Tensor: id=35, shape=(), dtype=float32, numpy=2.0>]

    grad2 = tape.gradient(y2, [w])  # a persistent tape can be queried again
    grad2  # [<tf.Tensor: id=39, shape=(), dtype=float32, numpy=2.0>]
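As a side note, if `w` is created as a `tf.Variable` instead of a `tf.constant`, the tape watches it automatically and `tape.watch` can be dropped; a persistent tape can also be released explicitly once it is no longer needed. A minimal sketch of this variant:

    import tensorflow as tf

    w = tf.Variable(1.)   # trainable variables are watched automatically
    x = tf.constant(2.)

    with tf.GradientTape(persistent=True) as tape:
        y2 = x * w        # no tape.watch needed here

    grad = tape.gradient(y2, [w])  # [<tf.Tensor: ... numpy=2.0>]
    del tape                       # release the resources held by the persistent tape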
\(2^{nd}\)-order
* \(y = xw + b\)
  * \(\frac{\partial{y}}{\partial{w}} = x\)
* \(\frac{\partial^2{y}}{\partial{w^2}} = \frac{\partial{y'}}{\partial{w}} = \frac{\partial{x}}{\partial{w}} = None\)
    # example inputs, added so the snippet runs on its own
    w = tf.Variable(1.)
    b = tf.Variable(2.)
    x = tf.constant(3.)

    with tf.GradientTape() as t1:
        with tf.GradientTape() as t2:
            y = x * w + b
        dy_dw, dy_db = t2.gradient(y, [w, b])
    d2y_dw2 = t1.gradient(dy_dw, w)
    # dy_dw = x = 3.0, dy_db = 1.0, d2y_dw2 is None because y is linear in w
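For contrast, with a function that is nonlinear in `w`, say \(y = xw^2\) (chosen only for illustration), the inner tape gives \(\frac{\partial{y}}{\partial{w}} = 2xw\) and the outer tape gives \(\frac{\partial^2{y}}{\partial{w^2}} = 2x\) instead of None:

    import tensorflow as tf

    w = tf.Variable(1.)
    x = tf.constant(3.)

    with tf.GradientTape() as t1:
        with tf.GradientTape() as t2:
            y = x * w ** 2
        dy_dw = t2.gradient(y, w)    # 2 * x * w = 6.0
    d2y_dw2 = t1.gradient(dy_dw, w)  # 2 * x = 6.0

    print(dy_dw.numpy(), d2y_dw2.numpy())  # 6.0 6.0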