Variance, Covariance, Correlation Coefficient, and Covariance Matrix: Summary and Python Examples
* Variance measures how far a random variable X deviates from its mathematical expectation E(X).
* The mathematical expectation of the squared deviation X − E(X) is called the variance. Formula: $D(X) = E[(X - E(X))^2]$
Variance is always non-negative. When the possible values of the random variable are concentrated near the expectation, the variance is small; when they are spread out, the variance is large. The variance therefore indicates how dispersed the distribution of the random variable is.
Python code example:
    import numpy as np

    X = np.array([1, 2, 3])
    print(np.var(X))  # about 0.6667
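To connect the formula with the library call, here is a minimal sketch that computes the variance directly from the definition and compares it with `np.var`. It assumes the three values are equally likely. The variable names are illustrative only.

```python
import numpy as np

X = np.array([1, 2, 3])

# Variance straight from the definition D(X) = E[(X - E(X))^2],
# treating the three values as equally likely.
mean = X.mean()
var_by_definition = np.mean((X - mean) ** 2)

# np.var divides by N by default (ddof=0), which matches the definition above.
var_population = np.var(X)

# ddof=1 divides by N - 1 instead, giving the unbiased sample variance.
var_sample = np.var(X, ddof=1)

print(var_by_definition, var_population, var_sample)
```

Note that `np.var` defaults to dividing by N, while `np.cov` defaults to dividing by N - 1, so the two are only directly comparable when the same `ddof` is used.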
* Covariance describes how two random variables X and Y vary together.
* Formula: $\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
Summary: if the covariance is positive, X and Y change in the same direction, and the larger the covariance, the stronger the tendency to move together. If the covariance is negative, X and Y change in opposite directions, and the larger its absolute value, the stronger the tendency to move in opposite directions.
How to understand "same direction" and "opposite direction" above:
1) If you get bigger and I get bigger too, the two variables change in the same direction, so the covariance is positive.
2) If you get bigger and I get smaller, the two variables change in opposite directions, so the covariance is negative.
3) Numerically, the larger the covariance, the more strongly the two variables move in the same direction; the more negative it is, the more strongly they move in opposite directions.
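The sign behaviour described above can be checked on a small made-up data set. This is a sketch only: the arrays and the helper `covariance` are hypothetical, and the helper uses the population form (dividing by N) to match the formula.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y_same = np.array([2.0, 4.0, 6.0, 8.0])     # grows when x grows
y_reverse = np.array([8.0, 6.0, 4.0, 2.0])  # shrinks when x grows


def covariance(a, b):
    """Population covariance: E[(a - mean(a)) * (b - mean(b))]."""
    return np.mean((a - a.mean()) * (b - b.mean()))


cov_same = covariance(x, y_same)        # positive: same direction
cov_reverse = covariance(x, y_reverse)  # negative: opposite direction

# np.cov returns a 2x2 matrix; the off-diagonal entry is Cov(x, y).
# ddof=0 makes it divide by N, like the helper above.
cov_same_numpy = np.cov(x, y_same, ddof=0)[0, 1]

print(cov_same, cov_reverse, cov_same_numpy)
```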
* The correlation coefficient is the covariance of X and Y divided by the product of the standard deviation of X and the standard deviation of Y. Formula: $\rho = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$
* The correlation coefficient can also be regarded as a covariance: a special one that removes the effect of the two variables' units, that is, a standardized covariance.
Since it is a special kind of covariance:
1) It also reflects whether two variables change in the same direction or in opposite directions: positive for the same direction, negative for opposite directions.
2) Because it is standardized, it has a more important property: it removes the influence of each variable's magnitude of variation and reflects only how similarly the two variables change per unit of change.
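A short sketch of what "standardized" means in practice: changing the unit of one variable rescales the covariance but leaves the correlation coefficient untouched. The height and weight values are invented for illustration, and `cov` and `corr` are hypothetical helpers written from the formulas above.

```python
import numpy as np

# Made-up illustrative data, not real measurements.
height_m = np.array([1.60, 1.70, 1.75, 1.82, 1.90])
weight_kg = np.array([55.0, 65.0, 68.0, 80.0, 88.0])


def cov(a, b):
    """Population covariance E[(a - mean(a)) * (b - mean(b))]."""
    return np.mean((a - a.mean()) * (b - b.mean()))


def corr(a, b):
    """Correlation coefficient: covariance divided by both standard deviations."""
    return cov(a, b) / (a.std() * b.std())


height_cm = height_m * 100  # same heights, different unit

cov_m, cov_cm = cov(height_m, weight_kg), cov(height_cm, weight_kg)
rho_m, rho_cm = corr(height_m, weight_kg), corr(height_cm, weight_kg)
rho_numpy = np.corrcoef(height_m, weight_kg)[0, 1]

print(cov_m, cov_cm)            # covariance is 100 times larger in cm
print(rho_m, rho_cm, rho_numpy)  # correlation is the same in both units
```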
For two random variables:
1) When their correlation coefficient is 1, the two variables change with maximal positive similarity: if you double, I double; if you halve, I halve. This is perfect positive correlation (plotting X and Y on the horizontal and vertical axes gives a straight line with positive slope, so X and Y are linearly related).
2) As the correlation coefficient decreases, the similarity of their changes gets smaller. When the correlation coefficient is 0, the two variables show no linear similarity, that is, they are uncorrelated. (Independent variables are always uncorrelated, but a correlation of 0 alone does not guarantee independence.)
3) When the correlation coefficient keeps decreasing and drops below 0, the two variables begin to show reverse similarity, and this reverse similarity grows as the coefficient decreases further.
4) When the correlation coefficient is −1, the reverse similarity of the two variables is maximal: if you double, I halve; if you halve, I double. This is perfect negative correlation (plotting X and Y on the horizontal and vertical axes gives a straight line with negative slope, so X and Y are also linearly related).
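The extreme cases can be reproduced with `np.corrcoef` on hand-picked data. This sketch uses hypothetical linear relations `3x + 1` and `-0.5x + 4`, plus a parabola to show that a coefficient of 0 only rules out a linear relationship: here y is fully determined by x and the coefficient is still 0.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# Straight line with positive slope: perfect positive correlation.
r_positive = np.corrcoef(x, 3 * x + 1)[0, 1]

# Straight line with negative slope: perfect negative correlation.
r_negative = np.corrcoef(x, -0.5 * x + 4)[0, 1]

# y = x^2 depends entirely on x, but not linearly, so the coefficient is 0.
r_parabola = np.corrcoef(x, x ** 2)[0, 1]

print(r_positive, r_negative, r_parabola)
```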
* Covariance only handles the two-dimensional case, that is, the relationship between two random variables.
* With more dimensions, several covariances have to be computed, and the covariance matrix collects them.
* Each entry of the covariance matrix is the covariance of the two random variables given by its subscripts (that is, how those two variables vary together).
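For example, for three random variables X, Y, Z, the covariance matrix is:

$$
C = \begin{pmatrix}
\mathrm{Cov}(X, X) & \mathrm{Cov}(X, Y) & \mathrm{Cov}(X, Z) \\
\mathrm{Cov}(Y, X) & \mathrm{Cov}(Y, Y) & \mathrm{Cov}(Y, Z) \\
\mathrm{Cov}(Z, X) & \mathrm{Cov}(Z, Y) & \mathrm{Cov}(Z, Z)
\end{pmatrix}
$$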
The covariance matrix is symmetric, because Cov(X, Y) = Cov(Y, X), and its diagonal holds the variance of each dimension, because Cov(X, X) = D(X).
Python code example:
    import numpy as np

    # Each row is one random variable; each column is one observation of all the variables.
    X = np.array([[-2.1, -1, 4.3],
                  [3, 1.1, 0.12],
                  [3, 1.1, 0.12]])

    print(np.cov(X))
    # [[11.71, -4.286, -4.286],
    #  [-4.286, 2.14413333, 2.14413333],
    #  [-4.286, 2.14413333, 2.14413333]]
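The two properties of the covariance matrix, symmetry and variances on the diagonal, can be verified on the same data. This is a sketch. Note that `np.cov` divides by N - 1 by default, so the diagonal is compared against `np.var` with `ddof=1`. The names `C` and `entry_01` are illustrative.

```python
import numpy as np

X = np.array([[-2.1, -1, 4.3],
              [3, 1.1, 0.12],
              [3, 1.1, 0.12]])

C = np.cov(X)  # rows are variables; normalises by N - 1 by default

# Symmetric: C[i, j] equals C[j, i].
is_symmetric = np.allclose(C, C.T)

# Diagonal entries are the sample variances of each row.
diag_matches_variance = np.allclose(np.diag(C), np.var(X, axis=1, ddof=1))

# One off-diagonal entry recomputed by hand from the covariance formula.
n = X.shape[1]
entry_01 = np.sum((X[0] - X[0].mean()) * (X[1] - X[1].mean())) / (n - 1)

print(is_symmetric, diag_matches_variance, entry_01, C[0, 1])
```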