Deep Learning / Machine Learning Basic Mathematics Notes (Part 2): Gradients and Derivatives, Matrix Calculus, Taylor Expansion, etc.
Derivatives and gradients

Derivative: the derivative of a univariate function at a point describes the rate of change of the function near that point.
Gradient: the generalization of the derivative to a multivariate function is the gradient.
* First derivative, the gradient: $\nabla f(x) = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \cdots, \frac{\partial f}{\partial x_n}\right]^T$
* Second derivative, the Hessian matrix: $\nabla^2 f(x)$, with entries $[\nabla^2 f(x)]_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$

In the univariate case, the first and second derivatives are usually written $f'(x)$ and $f''(x)$.
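To make the two definitions above concrete, here is a minimal NumPy sketch that approximates the gradient and the Hessian by finite differences; the helper names `numerical_gradient` and `numerical_hessian` and the test function are my own choices, not from the original post.

```python
import numpy as np

def f(x):
    # Test function f(x1, x2) = x1^2 + 3*x2^2:
    # gradient = [2*x1, 6*x2], Hessian = [[2, 0], [0, 6]]
    return x[0] ** 2 + 3 * x[1] ** 2

def numerical_gradient(f, x, h=1e-5):
    """Central-difference approximation of the gradient vector."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def numerical_hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian matrix."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

x = np.array([1.0, 2.0])
print(numerical_gradient(f, x))  # close to [2., 12.]
print(numerical_hessian(f, x))   # close to [[2., 0.], [0., 6.]]
```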
Taylor expansion

Taylor expansion of a univariate function:

$f(x) = f(x_k) + f'(x_k)(x - x_k) + \frac{f''(x_k)}{2!}(x - x_k)^2 + \cdots$

Taylor expansion of a multivariate function (only the first three terms):

$f(x) \approx f(x_k) + \nabla^T f(x_k)(x - x_k) + \frac{1}{2}(x - x_k)^T \nabla^2 f(x_k)(x - x_k)$
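The second-order expansion can be checked numerically; the sketch below uses a hand-derived gradient and Hessian for an example function of my own choosing (not from the original post) and confirms the approximation error is tiny for a small step.

```python
import numpy as np

def f(x):
    # Example function f(x1, x2) = exp(x1) + x1 * x2^2
    return np.exp(x[0]) + x[0] * x[1] ** 2

xk = np.array([0.5, 1.0])
# Hand-derived first and second derivatives at xk:
grad = np.array([np.exp(xk[0]) + xk[1] ** 2,   # ∂f/∂x1
                 2 * xk[0] * xk[1]])           # ∂f/∂x2
hess = np.array([[np.exp(xk[0]), 2 * xk[1]],
                 [2 * xk[1],     2 * xk[0]]])

delta = np.array([0.01, -0.02])
# First three terms of the multivariate Taylor expansion:
taylor2 = f(xk) + grad @ delta + 0.5 * delta @ hess @ delta
print(abs(f(xk + delta) - taylor2))  # O(||delta||^3), i.e. very small
```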
A point $x_k$ with $\nabla f(x_k) = 0$ is called a stationary point. For a univariate function, such a point must be a local extremum, either a local maximum or a local minimum; if $f$ is convex, it is the global minimum. Convex functions are briefly introduced in the next section.
For a multivariate function, if $\nabla^2 f(x_k) \succ 0$ is positive definite (i.e. all eigenvalues are positive), then the third term of the expansion above is positive, so $x_k$ is a strict local minimum (conversely, if $\nabla^2 f(x_k) \prec 0$ is negative definite, $x_k$ is a strict local maximum). The more complicated case is when the Hessian has both positive and negative eigenvalues: the test is then inconclusive, and $x_k$ is a saddle point, a local minimum along some directions and a local maximum along others. Saddle points are one of the core difficulties in training neural networks; I will cover them in a later post. For now, back to basics.
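The eigenvalue test for stationary points can be sketched as a small helper; the function name `classify_stationary_point` and the tolerance parameter are my own choices for illustration.

```python
import numpy as np

def classify_stationary_point(H, tol=1e-10):
    """Classify a stationary point from the eigenvalues of its Hessian H."""
    eig = np.linalg.eigvalsh(H)         # Hessians are symmetric
    if np.all(eig > tol):
        return "strict local minimum"   # positive definite
    if np.all(eig < -tol):
        return "strict local maximum"   # negative definite
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle point"           # mixed-sign eigenvalues
    return "indeterminate"              # some eigenvalue is (near) zero

# f(x, y) = x^2 - y^2 has a stationary point at the origin with
# Hessian diag(2, -2): one positive and one negative eigenvalue.
print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, -2.0]])))  # saddle point
```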
Taylor expansion is at the core of many mathematical problems; let's expand on one of them here:
Question: why do optimization methods move along the gradient direction? Why is the gradient direction the direction in which the function changes fastest?

From the first two terms of the Taylor expansion, $f(x_k + \delta) \approx f(x_k) + \nabla^T f(x_k)\,\delta$, so $f(x_k + \delta) - f(x_k) \approx \nabla^T f(x_k)\,\delta$. When $\delta$ is a vector whose norm is fixed but whose direction is free, $\nabla^T f(x_k)\,\delta = \|\nabla f(x_k)\| \cdot \|\delta\| \cos\theta$, which is maximized when $\cos\theta = 1$, i.e. when $\delta$ points along the gradient direction (or the negative gradient direction, for the fastest decrease). For minimization this gives gradient descent: take $\delta$ along the negative gradient so that $f(x)$ decreases fastest.
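A minimal gradient-descent sketch following the argument above; the step size, iteration count, and example quadratic are my own illustrative choices.

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step along the negative gradient, the steepest-descent direction."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x1 - 3)^2 + (x2 + 1)^2, whose gradient is 2*(x - [3, -1]).
grad = lambda x: 2 * (x - np.array([3.0, -1.0]))
print(gradient_descent(grad, [0.0, 0.0]))  # converges to [3., -1.]
```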
Summary of matrix derivatives
(1) Derivatives with respect to a scalar

* Scalar with respect to a scalar $x$: the ordinary derivative $\frac{\partial y}{\partial x}$.
* Vector with respect to a scalar $x$: the derivative of the vector $y = [y_1, y_2, \cdots, y_n]^T$ with respect to the scalar $x$ differentiates each element of $y$ with respect to $x$, and can be written as

$\frac{\partial y}{\partial x} = \left[\frac{\partial y_1}{\partial x}, \frac{\partial y_2}{\partial x}, \cdots, \frac{\partial y_n}{\partial x}\right]^T$

* Matrix with respect to a scalar $x$: analogous to the vector case, i.e. each element of the matrix is differentiated with respect to the scalar $x$.
(2) Derivatives with respect to a vector

* Scalar with respect to a vector $x$: the derivative of a scalar $y$ with respect to the vector $x = [x_1, x_2, \cdots, x_n]^T$ can be written as

$\frac{\partial y}{\partial x} = \left[\frac{\partial y}{\partial x_1}\ \frac{\partial y}{\partial x_2}\ \cdots\ \frac{\partial y}{\partial x_n}\right]$
* Vector with respect to a vector $x$: the derivative of a vector function (i.e. a vector whose elements are functions) $y = [y_1, y_2, \cdots, y_n]^T$ with respect to $x = [x_1, x_2, \cdots, x_n]^T$ is

$\frac{\partial y}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_n}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_n} \end{bmatrix}$

The resulting matrix $\frac{\partial y}{\partial x}$ is called the Jacobian matrix.
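A numerical Jacobian can be sketched in a few lines; the helper name `numerical_jacobian` and the linear example (whose Jacobian is exactly the matrix $A$) are my own illustrative choices.

```python
import numpy as np

def numerical_jacobian(f, x, h=1e-6):
    """J[i, j] = dy_i / dx_j, approximated by forward differences."""
    y0 = f(x)
    J = np.zeros((len(y0), len(x)))
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (f(x + e) - y0) / h
    return J

# For the linear map y = A x, the Jacobian is exactly A.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
f = lambda x: A @ x
print(numerical_jacobian(f, np.array([1.0, 1.0])))  # close to A
```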
* Matrix with respect to a vector: the derivative of a matrix $Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1n} \\ y_{21} & y_{22} & \cdots & y_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nn} \end{bmatrix}$ with respect to the vector $x = [x_1, x_2, \cdots, x_n]^T$ is the most complicated case in matrix calculus: each element $y_{ij}$ is differentiated with respect to the vector $x$.
(3) Derivatives with respect to a matrix

In general, only the derivative of a scalar with respect to a matrix is considered. The derivative of a scalar $y$ with respect to a matrix $X$ is a gradient matrix, which can be written as

$\frac{\partial y}{\partial X} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \cdots & \frac{\partial y}{\partial x_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{m1}} & \cdots & \frac{\partial y}{\partial x_{mn}} \end{bmatrix}$

The following figure lists common matrix-derivative forms used in machine learning, for reference.
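One standard identity of this kind is that for $y = a^T X b$ the gradient matrix is $\frac{\partial y}{\partial X} = a b^T$; the sketch below (with my own helper `numerical_matrix_gradient`) verifies it numerically.

```python
import numpy as np

def numerical_matrix_gradient(f, X, h=1e-6):
    """G[i, j] = dy / dX[i, j], by central differences on each entry."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E) - f(X - E)) / (2 * h)
    return G

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0, 0.5])
X = np.arange(6.0).reshape(2, 3)
f = lambda X: a @ X @ b                  # scalar y = a^T X b
print(np.allclose(numerical_matrix_gradient(f, X), np.outer(a, b), atol=1e-4))  # True
```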
The next post will cover the basic concepts of the Hessian matrix and convex functions. To be continued.
References:
* Jacobian matrix and Hessian matrix
* Matrix derivatives in linear algebra for machine learning: http://blog.csdn.net/u010976453/article/details/54381248
* Newton's method and the Hessian matrix: http://blog.csdn.net/linolzhang/article/details/60151623