Principal component analysis (PCA) is a common method of data dimensionality reduction. Its goal is to convert high-dimensional data into low-dimensional data while losing as little "information" as possible, thereby reducing the amount of computation.

The essence of PCA is to find a set of projection directions that maximize the variance of the data along those directions, with the directions orthogonal to each other.
This amounts to finding a new orthogonal basis and computing the variance of the original data projected onto each basis vector: the larger the variance, the more information the corresponding basis vector carries. As will be shown later, the larger an eigenvalue of the covariance matrix of the original data, the larger the corresponding projection variance, and the more information is captured by the corresponding eigenvector. Conversely, if an eigenvalue is small, the data carries very little information when projected onto that eigenvector, so the directions with small eigenvalues can be discarded, achieving dimensionality reduction.
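The claim that "projection variance equals the eigenvalue" can be checked numerically. The sketch below uses a small random toy dataset (the numbers are illustrative, not from the text) and plain NumPy:

```python
import numpy as np

# Toy data: 100 samples, 3 features, with very different variances per direction
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.0, 0.1]])

# Center the data, then form the sample covariance matrix
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(X) - 1)

# Eigendecomposition of the covariance matrix, sorted by descending eigenvalue
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The variance of the data projected onto each eigenvector equals its eigenvalue
proj = Xc @ eigvecs
print(np.allclose(proj.var(axis=0, ddof=1), eigvals))  # True
```

Dropping the columns of `proj` that correspond to the small eigenvalues is exactly the dimensionality reduction described above.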

All PCA does is find, in order, a set of mutually orthogonal axes in the original space: the first axis is the direction of maximum variance; the second axis maximizes the variance within the subspace orthogonal to the first; the third maximizes the variance within the subspace orthogonal to the first two; and so on. In an N-dimensional space we can find N such axes. Taking the first r of them to approximate the space compresses it from N dimensions to r dimensions, and this choice of r axes minimizes the data loss caused by the compression.

The key question is therefore: how do we find new projection directions so that the original data loses the least "information content"?

1. Measuring a sample's "information content"

A sample's "information content" refers to the variance of the sample's projection in a feature direction. The larger the variance, the more the samples differ along that feature, and the more important the feature is.
As the illustration in Machine Learning in Action shows, in a classification problem, the larger the variance of the samples, the easier it is to distinguish samples of different categories.

The picture shows data from 3 categories. Clearly, the larger the variance, the easier it is to separate points of different categories. The samples have a large projection variance on the X axis and a small projection variance on the Y axis, but the direction of largest variance is the upward-sloping direction in the middle (the red line in the figure). If the samples are mapped onto that direction, one-dimensional data suffices for classification: compared with the two-dimensional original data, the dimensionality has essentially been reduced by one.

When the original data has more dimensions, we first find the direction of largest variance after the transformation, then choose the direction orthogonal to it, which has the next-largest variance, and so on, until we have transformed as many new features as there were original ones, or we stop after the first N features (these first N features contain most of the information in the data). In short, PCA is a dimensionality reduction process: it maps the data onto new features, each of which is a linear combination of the original features.

2. Calculation process (originally shown as screenshots, since formulas are hard to insert)

In the implementation of PCA, performing singular value decomposition on the covariance matrix yields the S matrix (the eigenvalue matrix).
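The screenshotted formulas did not survive; the standard PCA computation they described can be sketched as follows (a reconstruction under the usual conventions, not the original figure), where $m$ is the number of samples and $k$ the number of retained components:

```latex
\begin{aligned}
\bar{x} &= \frac{1}{m}\sum_{i=1}^{m} x^{(i)}
  && \text{(mean of the samples)} \\
\Sigma &= \frac{1}{m-1}\sum_{i=1}^{m} \bigl(x^{(i)}-\bar{x}\bigr)\bigl(x^{(i)}-\bar{x}\bigr)^{\top}
  && \text{(covariance matrix)} \\
\Sigma &= U S U^{\top}
  && \text{(eigen/singular value decomposition, } S \text{ diagonal)} \\
z^{(i)} &= U_{1:k}^{\top}\,\bigl(x^{(i)}-\bar{x}\bigr)
  && \text{(project onto the top } k \text{ eigenvectors)}
\end{aligned}
```

The diagonal entries of $S$ are the eigenvalues; sorting them in descending order and keeping the first $k$ columns of $U$ gives the projection directions.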

Practical use

The code below does PCA using the PCA method encapsulated in sklearn. The PCA method takes a parameter n_components: if it is set to an integer, then n_components=k retains k components;
if it is set to a decimal, it indicates the fraction of information to retain after dimensionality reduction.
from sklearn.decomposition import PCA
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[10001, 2, 55], [16020, 4, 11], [12008, 6, 33], [13131, 8, 22]])

# feature normalization (feature scaling)
X_scaler = StandardScaler()
x = X_scaler.fit_transform(x)

# PCA: ensure the data retains 90% of its information after dimension reduction
pca = PCA(n_components=0.9)
pca.fit(x)
pca.transform(x)
So when doing PCA in practice, we don't have to choose k ourselves; we simply set n_components to a float.

Summary

The choice of the number of principal components k in PCA is a data compression problem. Usually we simply set the n_components parameter of sklearn's PCA method to a float, which solves the choice of k indirectly.
But sometimes we reduce dimensions just to observe the data (visualization); in that case we choose k as 2 or 3.
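As a minimal sketch of the visualization use case, the following reduces the 4-feature iris dataset (used here as an assumed example; any dataset would do) to 2 components suitable for a scatter plot:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Reduce the 4-dimensional iris data to 2 dimensions for plotting
X = load_iris().data
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)
print(X2.shape)  # (150, 2)
```

The two columns of `X2` can then be used directly as x/y coordinates in a scatter plot.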
Parameter description:

n_components:
    Meaning: the number n of principal components PCA should retain, i.e., the number of features kept.
    Type: int or string, default None, meaning all components are retained.
    Assigned an int, e.g. n_components=1, it reduces the raw data to one dimension.
    Assigned the string 'mle', the number of features n is selected automatically (via maximum-likelihood estimation); assigned a float between 0 and 1, n is chosen so that the required percentage of variance is retained.
copy:
    Type: bool, True or False, default True.
    Meaning: whether the algorithm copies the original training data before running. If True, the original training data is unchanged after running PCA, because the computation is done on a copy; if False, the values of the original training data may change, because the dimensionality reduction is computed on the original data in place.
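The default behavior can be checked directly; a small sketch (toy array, illustrative values only):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]])
X_before = X.copy()

# With copy=True (the default), fitting works on an internal copy,
# so the caller's array is left untouched
PCA(n_components=1, copy=True).fit(X)
print(np.array_equal(X, X_before))  # True
```

With copy=False, sklearn is allowed to center the data in place, so the input array may be modified after fitting; use it only when memory is tight and the original values are no longer needed.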

whiten:

    Type: bool, default False.

    Meaning: whitening, which rescales each component to have the same variance. For more on "whitening", see the UFLDL tutorial:
    <http://deeplearning.stanford.edu/wiki/index.php/%E7%99%BD%E5%8C%96>
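The effect of whitening can be verified on the transformed data: each retained component ends up with unit variance. A sketch using the iris dataset as an assumed example:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# With whiten=True the projected components are rescaled to unit variance
Xw = PCA(n_components=2, whiten=True).fit_transform(X)
print(np.allclose(Xw.var(axis=0, ddof=1), 1.0))  # True
```

Without whitening, the first component would have a much larger variance than the second (equal to the corresponding eigenvalue); whitening removes that scale difference.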

2. PCA object attributes

components_: the principal components with the largest variance, one row per retained component.
explained_variance_ratio_: the percentage of variance explained by each of the n retained components.
n_components_: the number n of components retained.
mean_: the per-feature empirical mean estimated from the training data.
noise_variance_: the estimated noise variance of the discarded components.
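These attributes are only available after fitting; a quick sketch of their shapes and meanings (again using the iris dataset as an assumed example):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

pca = PCA(n_components=2).fit(load_iris().data)

print(pca.components_.shape)           # (2, 4): one row per retained component
print(pca.explained_variance_ratio_)   # fraction of total variance per component
print(pca.n_components_)               # 2
print(pca.mean_)                       # per-feature mean used for centering
print(pca.noise_variance_)             # average variance of the dropped components
```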

3. PCA object methods

* fit(X, y=None)
fit() is the generic training method in scikit-learn; every algorithm that needs training has a fit() method, and it performs the "training" step. Because PCA is an unsupervised learning algorithm, y is naturally None here.
fit(X) trains the PCA model with the data X.
Return value: the object fit was called on. For example, pca.fit(X) trains the pca object with X.

* fit_transform(X): trains the PCA model with X and returns the dimension-reduced data at the same time.
newX = pca.fit_transform(X), where newX is the data after dimensionality reduction.

* inverse_transform(): converts dimension-reduced data back to the original space, X = pca.inverse_transform(newX).

* transform(X): converts the data X to its dimension-reduced representation. Once the model is trained, transform can be used to reduce the dimension of new input data.
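The typical pattern is to fit on training data and then apply the same projection to unseen data; a sketch, using a split of the iris dataset as an assumed stand-in for train/new data:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2).fit(X[:100])  # "train" on the first 100 samples

# transform() reuses the fitted mean and components on unseen rows
y = pca.transform(X[100:])
print(y.shape)  # (50, 2)
```

This matters in practice: the test data must be projected with the components learned from the training data, not refit on its own.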

3. Python implementation

The following demonstrates dimensionality reduction with sklearn's PCA method.
Import:

from sklearn.decomposition import PCA
The function is called as follows, where n_components=2 reduces the data to 2 dimensions.
pca = PCA(n_components=2)

For example, the following code performs a PCA dimensionality reduction:
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2)
print(pca)
pca.fit(X)
print(pca.explained_variance_ratio_)

The output is as follows:
PCA(copy=True, n_components=2, whiten=False)
[ 0.99244291  0.00755711]

As another example, load the boston housing dataset, which has 13 features in total, and reduce it to two features:
import numpy as np
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; requires an older version
from sklearn.decomposition import PCA

d = load_boston()
x = d.data
y = d.target
print(x[:10])
print('shape:', x.shape)

# Dimensionality reduction
pca = PCA(n_components=2)
newData = pca.fit_transform(x)
print('Data after dimension reduction:')
print(newData[:4])
print('shape:', newData.shape)

The output is as follows; the data is reduced to 2 dimensions.
[[  6.32000000e-03   1.80000000e+01   2.31000000e+00   0.00000000e+00
    5.38000000e-01   6.57500000e+00   6.52000000e+01   4.09000000e+00
    1.00000000e+00   2.96000000e+02   1.53000000e+01   3.96900000e+02
    4.98000000e+00]
 [  2.73100000e-02   0.00000000e+00   7.07000000e+00   0.00000000e+00
    4.69000000e-01   6.42100000e+00   7.89000000e+01   4.96710000e+00
    2.00000000e+00   2.42000000e+02   1.78000000e+01   3.96900000e+02
    9.14000000e+00]
 [  2.72900000e-02   0.00000000e+00   7.07000000e+00   0.00000000e+00
    4.69000000e-01   7.18500000e+00   6.11000000e+01   4.96710000e+00
    2.00000000e+00   2.42000000e+02   1.78000000e+01   3.92830000e+02
    4.03000000e+00]
 [  3.23700000e-02   0.00000000e+00   2.18000000e+00   0.00000000e+00
    4.58000000e-01   6.99800000e+00   4.58000000e+01   6.06220000e+00
    3.00000000e+00   2.22000000e+02   1.87000000e+01   3.94630000e+02
    2.94000000e+00]]
shape: (506, 13)
Data after dimension reduction:
[[-119.81821283    5.56072403]
 [-168.88993091  -10.11419701]
 [-169.31150637  -14.07855395]
 [-190.2305986   -18.29993274]]
shape: (506, 2)