$$f'(a)=\lim_{h\to 0}\frac{f(a+h)-f(a)}{h}$$
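This limit can be checked numerically by taking a small but finite step $h$; a minimal sketch (the helper name is illustrative):

```python
# Forward-difference approximation of f'(a) = lim_{h->0} (f(a+h) - f(a)) / h.
def numerical_derivative(f, a, h=1e-6):
    """Approximate f'(a) with a forward difference using a small step h."""
    return (f(a + h) - f(a)) / h

# Example: f(x) = x^2, so f'(3) = 6.
approx = numerical_derivative(lambda x: x * x, 3.0)
print(approx)  # close to 6
```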

* First-order derivative, i.e., the gradient:

$$\nabla f(X)=\frac{\partial f(X)}{\partial X}=\begin{bmatrix}\frac{\partial f(X)}{\partial x_1}\\ \frac{\partial f(X)}{\partial x_2}\\ \vdots\\ \frac{\partial f(X)}{\partial x_n}\end{bmatrix}$$

* Second-order derivative, the Hessian matrix:
$$H(X)=\nabla^2 f(X)=\begin{bmatrix}\frac{\partial^2 f(X)}{\partial x_1^2} & \frac{\partial^2 f(X)}{\partial x_1\partial x_2} & \cdots & \frac{\partial^2 f(X)}{\partial x_1\partial x_n}\\ \frac{\partial^2 f(X)}{\partial x_2\partial x_1} & \frac{\partial^2 f(X)}{\partial x_2^2} & \cdots & \frac{\partial^2 f(X)}{\partial x_2\partial x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial^2 f(X)}{\partial x_n\partial x_1} & \frac{\partial^2 f(X)}{\partial x_n\partial x_2} & \cdots & \frac{\partial^2 f(X)}{\partial x_n^2}\end{bmatrix}$$
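Both objects can be approximated by finite differences, which is a handy sanity check against hand-derived formulas. A sketch, with illustrative helper names:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at vector x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def numerical_hessian(f, x, h=1e-4):
    """Hessian via central differences of the numerical gradient."""
    n = x.size
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros_like(x)
        e[j] = h
        H[:, j] = (numerical_gradient(f, x + e) - numerical_gradient(f, x - e)) / (2 * h)
    return H

# f(x) = x0^2 + 3*x0*x1: gradient = [2*x0 + 3*x1, 3*x0], Hessian = [[2, 3], [3, 0]].
f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]
x0 = np.array([1.0, 2.0])
print(numerical_gradient(f, x0))  # approx [8, 3]
print(numerical_hessian(f, x0))   # approx [[2, 3], [3, 0]] -- symmetric, as expected
```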

For a univariate function, the $n$-th order Taylor expansion around $x_k$ is

$$f(x_k+\delta)\approx f(x_k)+f'(x_k)\delta+\frac{1}{2}f''(x_k)\delta^2+\cdots+\frac{1}{n!}f^{(n)}(x_k)\delta^n$$
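Truncating after the quadratic term gives an approximation whose error shrinks like $\delta^3$. A quick numerical sketch using $f=\exp$ (so all derivatives equal $e^{x_k}$):

```python
import math

# Second-order Taylor approximation of f at x_k, evaluated at x_k + delta:
# f(x_k + delta) ~= f(x_k) + f'(x_k)*delta + 0.5*f''(x_k)*delta^2
def taylor2(f_xk, df_xk, d2f_xk, delta):
    return f_xk + df_xk * delta + 0.5 * d2f_xk * delta ** 2

# For f = exp at x_k = 0: f = f' = f'' = 1. Step delta = 0.1:
xk, delta = 0.0, 0.1
approx = taylor2(math.exp(xk), math.exp(xk), math.exp(xk), delta)
exact = math.exp(xk + delta)
print(approx, exact)  # 1.105 vs 1.10517...; the gap is O(delta^3)
```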

For a multivariate function, the second-order expansion uses the gradient and the Hessian:

$$f(x_k+\delta)\approx f(x_k)+\nabla^T f(x_k)\,\delta+\frac{1}{2}\delta^T \nabla^2 f(x_k)\,\delta$$

For a step $\delta$ of fixed length, $\nabla^T f(x_k)\,\delta=\|\nabla f(x_k)\|\cdot\|\delta\|\cos\theta$, which is maximized at $\cos\theta=1$, i.e., when $\delta$ points in the same direction as the gradient. The gradient is therefore the direction of steepest ascent, and its negative the direction of steepest descent.
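This steepest-ascent claim is easy to sanity-check: among many random unit-length steps, none beats the step aligned with the gradient. A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
grad = np.array([3.0, 4.0])  # gradient at x_k; its norm is 5

# Compare grad . delta over many random unit-length directions delta
# against the unit direction aligned with the gradient itself.
best_random = max(
    grad @ (d / np.linalg.norm(d))
    for d in rng.normal(size=(1000, 2))
)
aligned = grad @ (grad / np.linalg.norm(grad))  # equals ||grad|| = 5

print(best_random, aligned)  # best_random never exceeds aligned
```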

(1) Derivatives with respect to a scalar

* Derivative of a scalar $y$ with respect to a scalar $x$:
$$\frac{\partial y}{\partial x}$$
* Derivative of a vector $\mathbf{y}$ with respect to a scalar $x$:

$$\frac{\partial \mathbf{y}}{\partial x}=\begin{bmatrix}\frac{\partial y_1}{\partial x}\\ \frac{\partial y_2}{\partial x}\\ \vdots\\ \frac{\partial y_n}{\partial x}\end{bmatrix}$$
* Derivative of a matrix $Y$ with respect to a scalar $x$:

$$\frac{\partial Y}{\partial x}=\begin{bmatrix}\frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x}\\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_{n1}}{\partial x} & \frac{\partial y_{n2}}{\partial x} & \cdots & \frac{\partial y_{nn}}{\partial x}\end{bmatrix}$$
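All of these scalar-denominator cases are simply elementwise derivatives, so one finite-difference routine covers them. A sketch (the helper name is illustrative):

```python
import numpy as np

# For a matrix-valued Y(x) of a scalar x, dY/dx is the elementwise derivative.
def matrix_derivative(Y, x, h=1e-6):
    """Central-difference derivative of a matrix-valued function Y at scalar x."""
    return (Y(x + h) - Y(x - h)) / (2 * h)

# Y(x) = [[x^2, sin x], [e^x, 1]]  =>  dY/dx = [[2x, cos x], [e^x, 0]]
Y = lambda x: np.array([[x ** 2, np.sin(x)], [np.exp(x), 1.0]])
print(matrix_derivative(Y, 0.0))  # approx [[0, 1], [1, 0]]
```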
(2) Derivatives with respect to a vector

* Derivative of a scalar $y$ with respect to a vector $\mathbf{x}$:

$$\frac{\partial y}{\partial \mathbf{x}}=\begin{bmatrix}\frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \cdots & \frac{\partial y}{\partial x_n}\end{bmatrix}$$
* Derivative of a vector $\mathbf{y}$ with respect to a vector $\mathbf{x}$ (the Jacobian matrix):

$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\begin{bmatrix}\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n}\\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_n}{\partial x_1} & \frac{\partial y_n}{\partial x_2} & \cdots & \frac{\partial y_n}{\partial x_n}\end{bmatrix}$$

* Derivative of a matrix $Y$ with respect to a vector $\mathbf{x}$:

$$\frac{\partial Y}{\partial \mathbf{x}}=\begin{bmatrix}\frac{\partial y_{11}}{\partial x_1} & \frac{\partial y_{12}}{\partial x_2} & \cdots & \frac{\partial y_{1n}}{\partial x_n}\\ \frac{\partial y_{21}}{\partial x_1} & \frac{\partial y_{22}}{\partial x_2} & \cdots & \frac{\partial y_{2n}}{\partial x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_{n1}}{\partial x_1} & \frac{\partial y_{n2}}{\partial x_2} & \cdots & \frac{\partial y_{nn}}{\partial x_n}\end{bmatrix}$$
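The vector-by-vector case above can be checked numerically against a hand-derived Jacobian; a sketch with an illustrative helper:

```python
import numpy as np

def numerical_jacobian(f, x, h=1e-6):
    """Jacobian J with J[i, j] = d f_i / d x_j, by central differences."""
    fx = np.asarray(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (np.asarray(f(x + e)) - np.asarray(f(x - e))) / (2 * h)
    return J

# f(x) = (x0*x1, x0 + x1^2): J = [[x1, x0], [1, 2*x1]].
fvec = lambda x: np.array([x[0] * x[1], x[0] + x[1] ** 2])
print(numerical_jacobian(fvec, np.array([2.0, 3.0])))  # approx [[3, 2], [1, 6]]
```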
(3) Derivatives with respect to a matrix

* Derivative of a scalar $y$ with respect to a matrix $X$:

$$\frac{\partial y}{\partial X}=\begin{bmatrix}\frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{n1}}\\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{n2}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y}{\partial x_{1n}} & \frac{\partial y}{\partial x_{2n}} & \cdots & \frac{\partial y}{\partial x_{nn}}\end{bmatrix}$$
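Note the layout above places $\partial y/\partial x_{ji}$ at entry $(i,j)$. Under that convention, $y=\operatorname{tr}(AX)$ has derivative $A$ itself (since $\partial\operatorname{tr}(AX)/\partial x_{ij}=a_{ji}$). A numerical sketch assuming that layout (the helper name is illustrative):

```python
import numpy as np

def scalar_by_matrix_grad(f, X, h=1e-6):
    """Gradient of scalar f(X); entry (i, j) = d f / d x_{ji} (transposed layout)."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[j, i] = h  # perturb x_{ji} to fill entry (i, j)
            G[i, j] = (f(X + E) - f(X - E)) / (2 * h)
    return G

# y = tr(A X): d y / d x_{ij} = a_{ji}, so in this layout the result is A.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
X = np.array([[0.5, -1.0], [2.0, 0.0]])
print(scalar_by_matrix_grad(lambda M: np.trace(A @ M), X))  # approx A
```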
