Variance, covariance, correlation coefficient and covariance matrix: summary and Python examples
* Variance measures how far a random variable X deviates from its mathematical expectation E(X).
* The variance of X is the mathematical expectation of the squared deviation X − E(X). The formula is: $D(X) = E\big[(X - E(X))^2\big]$
Variance is always non-negative. When the possible values of a random variable are concentrated near its mathematical expectation, the variance is small; otherwise the variance is large. The variance therefore tells us how dispersed the distribution of a random variable is.
Python code example:

    import numpy as np

    X = np.array([1, 2, 3])
    print(np.var(X))  # 0.6666666666666666 (np.var divides by N by default)
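To connect the formula $D(X) = E\big[(X - E(X))^2\big]$ with `np.var`, here is a minimal sketch that computes the variance straight from the definition. It assumes, for illustration, that the three values are equally likely outcomes, so E(X) is their plain mean. It also shows the `ddof` parameter: `np.var` divides by N by default, while `ddof=1` gives the sample variance that divides by N − 1.

```python
import numpy as np

X = np.array([1, 2, 3])

# Assume each value is an equally likely outcome, so E(X) is the plain mean
expectation = X.mean()

# D(X) = E[(X - E(X))^2]: the average of the squared deviations
variance_by_definition = np.mean((X - expectation) ** 2)

print(variance_by_definition)  # 0.6666666666666666
print(np.var(X))               # same value: np.var divides by N (ddof=0)
print(np.var(X, ddof=1))       # 1.0: sample variance divides by N - 1
```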
* Covariance describes how two random variables X and Y vary together, i.e. how they are correlated.
* The formula is: $\operatorname{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)]$, where $\mu_X = E(X)$ and $\mu_Y = E(Y)$.
Summary: if the covariance is positive, X and Y change in the same direction, and the larger the covariance, the stronger this co-movement; if the covariance is negative, X and Y change in opposite directions, and the more negative the covariance, the stronger the inverse relationship.
How to understand "same direction" and "opposite direction" above:
1) When you get bigger, I also get bigger: the two variables change in the same direction, and the covariance is positive.
2) When you get bigger, I get smaller: the two variables change in opposite directions, and the covariance is negative.
3) Numerically, the larger the covariance, the more strongly the two variables change in the same direction, and vice versa. This comparison is only meaningful for variables on comparable scales, because multiplying X or Y by a constant multiplies the covariance by the same constant.
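A minimal sketch of $\operatorname{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)]$ on small hand-picked arrays (illustrative data, not from the original), again treating every pair of values as equally likely. It shows the sign for same-direction and opposite-direction changes, and that rescaling Y rescales the covariance even though the relationship itself is unchanged. The helper name `covariance` is hypothetical.

```python
import numpy as np

def covariance(x, y):
    """Hypothetical helper: population covariance E[(X - mu_x)(Y - mu_y)], dividing by N."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((x - x.mean()) * (y - y.mean()))

X = np.array([1, 2, 3])
Y_same = np.array([2, 4, 6])      # grows when X grows
Y_opposite = np.array([6, 4, 2])  # shrinks when X grows

print(covariance(X, Y_same))       # positive
print(covariance(X, Y_opposite))   # negative
print(covariance(X, 10 * Y_same))  # 10x larger, although the relationship is identical

# np.cov divides by N - 1 by default; bias=True makes it divide by N like the helper
print(np.cov(X, Y_same, bias=True)[0, 1])
```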
* The correlation coefficient is the covariance of X and Y divided by the product of the standard deviation of X and the standard deviation of Y. The formula is: $\rho = \dfrac{\operatorname{Cov}(X,Y)}{\sigma_X \sigma_Y}$
* The correlation coefficient can also be viewed as a special covariance: the covariance of the two variables after standardization, which removes their units (dimensions).
Since it is a special kind of covariance:
1) It also shows whether the two variables change in the same direction or in opposite directions: it is positive for the same direction and negative for opposite directions.
2) Because it is a standardized covariance, it has a more important property: it removes the influence of how large each variable's changes are, and reflects only how similarly the two variables change per unit of change.
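A minimal sketch of $\rho = \operatorname{Cov}(X,Y) / (\sigma_X \sigma_Y)$ on made-up data. Because the 1/N factors cancel, using the population covariance together with population standard deviations gives the same value as `np.corrcoef`. Rescaling either variable changes the covariance but leaves ρ unchanged, which is the removal of units described above. The helper name `correlation` is hypothetical.

```python
import numpy as np

def correlation(x, y):
    """Hypothetical helper: rho = Cov(X, Y) / (sigma_x * sigma_y), all with the 1/N convention."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())  # np.std also divides by N by default

X = np.array([1.0, 2.0, 4.0, 7.0])
Y = np.array([2.0, 3.0, 3.5, 8.0])

rho = correlation(X, Y)
print(rho)
print(np.corrcoef(X, Y)[0, 1])    # NumPy's built-in gives the same value
print(correlation(X, 100 * Y))    # unchanged: rescaling Y does not affect rho
print(correlation(0.001 * X, Y))  # unchanged as well
```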
For two random variables:
1) When their correlation coefficient is 1, the two variables have the greatest positive similarity as they change: when you go up, I go up in proportion; when you go down, I go down in proportion. This is perfect positive correlation (with X on the horizontal axis and Y on the vertical axis, all points lie on a line with positive slope, so X and Y are linearly related).
2) As the correlation coefficient decreases, the similarity between the two variables' changes weakens. When the correlation coefficient is 0, the two variables have no linear relationship, i.e. they are uncorrelated. This does not by itself mean they are independent: independence implies zero correlation, but zero correlation does not imply independence.
3) As the correlation coefficient keeps decreasing below 0, the two variables begin to show inverse similarity, which grows stronger the further the coefficient falls.
4) When the correlation coefficient is −1, the inverse similarity is the greatest: when you go up, I go down in proportion; when you go down, I go up in proportion. This is perfect negative correlation (with X on the horizontal axis and Y on the vertical axis, all points lie on a line with negative slope, so X and Y are again linearly related).
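A minimal sketch checking these boundary cases with `np.corrcoef` on made-up data: exact linear relations with positive or negative slope give ρ = 1 and ρ = −1, while Y = X² on values symmetric around 0 gives ρ = 0 even though Y is completely determined by X. This is why ρ = 0 should be read as "linearly uncorrelated" rather than "independent".

```python
import numpy as np

X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

rho_pos = np.corrcoef(X, 3 * X + 1)[0, 1]     # line with positive slope
rho_neg = np.corrcoef(X, -0.5 * X + 4)[0, 1]  # line with negative slope
rho_sq = np.corrcoef(X, X ** 2)[0, 1]         # nonlinear, yet fully dependent on X

print(rho_pos)  # 1.0 up to floating-point rounding
print(rho_neg)  # -1.0 up to floating-point rounding
print(rho_sq)   # 0.0 up to floating-point rounding
```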
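* Covariance only handles two dimensions, i.e. the degree of correlation between two random variables.
* With more dimensions, a covariance has to be computed for every pair of variables, which is why the covariance matrix is used.
* Each entry of the covariance matrix is the covariance (i.e. the degree of correlation) of the two random variables indexed by its row and column.
For example, for three random variables X, Y, Z the covariance matrix is:

$$
C = \begin{pmatrix}
\operatorname{Cov}(X,X) & \operatorname{Cov}(X,Y) & \operatorname{Cov}(X,Z) \\
\operatorname{Cov}(Y,X) & \operatorname{Cov}(Y,Y) & \operatorname{Cov}(Y,Z) \\
\operatorname{Cov}(Z,X) & \operatorname{Cov}(Z,Y) & \operatorname{Cov}(Z,Z)
\end{pmatrix}
$$

As can be seen, the covariance matrix is symmetric, because $\operatorname{Cov}(X,Y) = \operatorname{Cov}(Y,X)$, and its diagonal holds the variance of each dimension, because $\operatorname{Cov}(X,X) = D(X)$.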
Python code example:

    import numpy as np

    # Each row is one random variable, each column one observation of it
    # (the second and third variables are identical here, so their rows/columns in the result coincide)
    X = np.array([[-2.1, -1, 4.3],
                  [3, 1.1, 0.12],
                  [3, 1.1, 0.12]])

    # np.cov treats rows as variables by default and divides by N - 1 (sample covariance)
    print(np.cov(X))
    # approximately:
    # [[ 11.71    -4.286       -4.286     ]
    #  [ -4.286    2.14413333   2.14413333]
    #  [ -4.286    2.14413333   2.14413333]]
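To tie the matrix back to the pairwise definition, here is a minimal sketch that fills a 3×3 matrix entry by entry with the sample covariance (dividing by N − 1, which is `np.cov`'s default) and compares it with `np.cov` on the same data. It also checks the two properties stated above: symmetry, and variances on the diagonal (`np.var` needs `ddof=1` here to match).

```python
import numpy as np

# Same data as above: each row is one random variable, each column one observation
X = np.array([[-2.1, -1.0, 4.3],
              [3.0, 1.1, 0.12],
              [3.0, 1.1, 0.12]])

n_vars, n_obs = X.shape
centered = X - X.mean(axis=1, keepdims=True)  # subtract each variable's own mean

# Entry (i, j) is the sample covariance of variable i and variable j
C = np.empty((n_vars, n_vars))
for i in range(n_vars):
    for j in range(n_vars):
        C[i, j] = np.sum(centered[i] * centered[j]) / (n_obs - 1)

print(np.allclose(C, np.cov(X)))                           # True: matches NumPy
print(np.allclose(C, C.T))                                 # True: symmetric
print(np.allclose(np.diag(C), np.var(X, axis=1, ddof=1)))  # True: diagonal holds the variances
```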