Principal component analysis (PCA) is a common method of data dimensionality reduction. Its purpose is to
convert high-dimensional data into low-dimensional data while losing as little “information” as possible, thereby reducing the amount of computation.

  The essence of PCA is to find a set of projection directions that maximize the variance of the data projected onto them, where these directions are mutually orthogonal.
This is really the process of finding a new orthogonal basis and computing the variance of the original data projected onto each basis vector: the greater the variance, the more information that basis vector carries. It can be shown that the larger an eigenvalue of the covariance matrix of the original data, the larger the variance along the corresponding eigenvector, and hence the more information projected onto that eigenvector. Conversely, a small eigenvalue means the data carries very little information along the corresponding eigenvector, so the directions belonging to small eigenvalues can be dropped, which achieves the dimensionality reduction.
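The eigenvalue claim above can be checked numerically. The following is a minimal NumPy sketch on synthetic data (the data and variable names are illustrative assumptions, not from the original post): it eigendecomposes the covariance matrix and confirms that the variance of the data projected onto each eigenvector equals the corresponding eigenvalue.

```python
import numpy as np

# Synthetic data: 200 samples, 3 correlated features (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 1.0, 0.5],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])
Xc = X - X.mean(axis=0)                 # center each feature
C = np.cov(Xc, rowvar=False)            # 3x3 covariance matrix (n-1 denominator)

# eigh handles symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]       # re-sort from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Variance of the data projected onto each eigenvector (same n-1 denominator)
proj_var = np.var(Xc @ eigvecs, axis=0, ddof=1)
print(eigvals)
print(proj_var)   # matches eigvals: larger eigenvalue -> larger projected variance
```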

  All the work of PCA is to find, in order, a set of mutually orthogonal coordinate axes in the original space.
The first axis is the one that maximizes the variance; the second axis maximizes the variance within the subspace orthogonal to the first axis; the third axis maximizes the variance within the subspace orthogonal to the first two axes; and so on. In an N-dimensional space we can find N such axes, and we take the first r of them to approximate the space, compressing it from N dimensions to r dimensions. The r axes chosen this way minimize the loss of data caused by the compression.
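To make “minimize the loss of data” concrete, here is a hedged NumPy sketch on synthetic data, assuming the loss is measured as squared reconstruction error (an illustrative choice). Keeping the first r eigenvector axes loses exactly the sum of the discarded eigenvalues, and a random set of r orthonormal axes loses at least as much. `reconstruction_error` is a hypothetical helper written for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])  # synthetic data
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

def reconstruction_error(Xc, W):
    """Squared residual (n-1 denominator) after projecting onto the columns of W."""
    Z = Xc @ W          # coordinates in the r-dimensional subspace
    X_hat = Z @ W.T     # mapped back to the original N-dimensional space
    return np.sum((Xc - X_hat) ** 2) / (Xc.shape[0] - 1)

r = 2
err_pca = reconstruction_error(Xc, eigvecs[:, :r])
print(err_pca, eigvals[r:].sum())   # equal: loss = sum of the dropped eigenvalues

# Any other r orthonormal axes (here a random pair) lose at least as much
Q, _ = np.linalg.qr(rng.normal(size=(5, r)))
print(reconstruction_error(Xc, Q) >= err_pca)
```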

   Therefore, the key question is: how do we find new projection directions that minimize the loss of “information content” in the original data?

1. Measuring the “information content” of samples

   The “information content” of the samples refers to the variance of their projection onto a feature direction. The greater the variance, the more the samples differ along that feature, and so the more important that feature is.
Take the illustration from 《Machine Learning in Action》: in a classification problem, the greater the variance of the samples, the easier it is to distinguish samples of different classes.

   The figure shows data from 3 classes. Clearly, the greater the variance, the easier it is to separate points of different classes. The projection variance of the samples on the X axis is large, while on the Y axis it is small. The direction with the largest variance is the diagonal pointing upward through the middle (the red line in the figure). If the samples are projected onto this diagonal direction, one dimension is enough to classify them, which reduces the original two-dimensional data by one dimension.
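The figure itself is not available here, so the sketch below uses synthetic 2-D points stretched along the diagonal as a stand-in (an assumption). It scans unit directions in 0.1-degree steps, computes the variance of the projected points with the hypothetical helper `projected_variance`, and shows that the direction of maximum variance is the diagonal, which is also the top eigenvector of the covariance matrix.

```python
import numpy as np

# Synthetic 2-D points spread along the diagonal (stand-in for the missing figure)
rng = np.random.default_rng(2)
t = rng.normal(size=300)
X = np.column_stack([t + 0.2 * rng.normal(size=300),
                     t + 0.2 * rng.normal(size=300)])
Xc = X - X.mean(axis=0)

def projected_variance(Xc, theta):
    """Sample variance of the points projected onto the unit vector at angle theta."""
    w = np.array([np.cos(theta), np.sin(theta)])
    return np.var(Xc @ w, ddof=1)

thetas = np.linspace(0.0, np.pi, 1801)   # 0.1-degree steps over half a turn
variances = np.array([projected_variance(Xc, th) for th in thetas])
best = thetas[np.argmax(variances)]

print(np.degrees(best))                  # roughly 45: the diagonal direction
print(projected_variance(Xc, 0.0))       # variance along the X axis alone
print(variances.max())                   # noticeably larger along the diagonal
```

A half turn is enough because a direction and its opposite give the same projected variance.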

When the original data has more dimensions, first find the direction of largest variance after the transformation, then choose a direction orthogonal to the first one, which is the direction of the next-largest variance, and so on, until as many new features as original features have been obtained, or stop after the first N features (these first N features contain most of the information). In short, PCA is a dimensionality reduction process that maps data onto new features, each of which is a linear combination of the original features.
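The one-direction-at-a-time procedure can be sketched with power iteration plus deflation. This is a swapped-in textbook technique chosen for illustration, not how scikit-learn computes PCA (scikit-learn uses SVD/eigendecomposition-based solvers); `sequential_pca` is a hypothetical name, and the sketch assumes the leading eigenvalues are distinct so that power iteration converges.

```python
import numpy as np

def sequential_pca(X, k, n_iter=500, seed=0):
    """Find k orthogonal max-variance directions one at a time (hypothetical helper).

    Each direction comes from power iteration on the covariance matrix; its
    variance is then removed (deflation) so the next search only sees the
    variance left in directions orthogonal to those already chosen.
    """
    Xc = X - X.mean(axis=0)
    C = np.cov(Xc, rowvar=False)
    rng = np.random.default_rng(seed)
    directions = []
    for _ in range(k):
        w = rng.normal(size=C.shape[0])
        for _ in range(n_iter):
            w = C @ w
            w /= np.linalg.norm(w)
        variance = w @ C @ w                  # variance of the data along w
        directions.append(w)
        C = C - variance * np.outer(w, w)     # deflate: remove this direction
    W = np.column_stack(directions)
    # New features are linear combinations of the original (centered) features
    return Xc @ W, W

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 4)) * np.array([5.0, 3.0, 1.0, 0.5])
Z, W = sequential_pca(X, k=2)
print(Z.shape)                 # (400, 2)
print(np.round(W.T @ W, 6))    # approximately the identity: the directions are orthogonal
```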

2. Calculation process (the formulas were hard to typeset, so the original post gave them as a screenshot)

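In the PCA computation, performing singular value decomposition (SVD) on the covariance matrix yields the S matrix (the eigenvalue matrix). Because the covariance matrix is symmetric and positive semidefinite, its singular values coincide with its eigenvalues.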
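The calculation steps can be sketched as: center the data, form the covariance matrix, take its SVD, and project onto the first k columns of U. The NumPy sketch below (synthetic data, illustrative only) follows those steps and compares the result with scikit-learn's `PCA`; the two projections agree up to the sign of each column, since the sign of an eigenvector is arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 4)) * np.array([4.0, 2.0, 1.0, 0.5])   # synthetic data

Xc = X - X.mean(axis=0)              # step 1: center each feature
C = Xc.T @ Xc / (X.shape[0] - 1)     # step 2: covariance matrix
U, S, Vt = np.linalg.svd(C)          # step 3: SVD; S holds the eigenvalues of C
k = 2
Z = Xc @ U[:, :k]                    # step 4: project onto the first k columns of U

pca = PCA(n_components=k).fit(X)
print(S[:k], pca.explained_variance_)                      # same values
print(np.allclose(np.abs(Z), np.abs(pca.transform(X))))   # same up to column signs
```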

Practical use

Use the PCA class provided by sklearn; the PCA code is as follows. For the PCA parameter n_components: if it is set to an integer k, then k components are kept.
If it is set to a decimal between 0 and 1, it specifies the fraction of the information (variance) to retain after dimensionality reduction.
from sklearn.decomposition import PCA
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[10001, 2, 55], [16020, 4, 11], [12008, 6, 33], [13131, 8, 22]])
# feature normalization (feature scaling)
X_scaler = StandardScaler()
x = X_scaler.fit_transform(x)
# PCA: keep enough components to retain 90% of the information (variance)
pca = PCA(n_components=0.9)
# the model must be fitted before it can transform, so use fit_transform
x_reduced = pca.fit_transform(x)
So when using PCA in practice, we don't have to choose k ourselves; we can set n_components directly to a float.
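How a float `n_components` turns into a concrete k can be illustrated by recomputing it from the cumulative explained-variance ratio. This is a sketch on synthetic data; the rule used here (the smallest k whose cumulative ratio exceeds 0.9) reflects our reading of scikit-learn's behavior and may differ in edge cases where a cumulative value equals the threshold exactly.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))   # synthetic correlated data
X = StandardScaler().fit_transform(X)

# Fit with all components to see the full variance spectrum
ratios = PCA().fit(X).explained_variance_ratio_
cumulative = np.cumsum(ratios)
# Smallest k whose cumulative explained-variance ratio exceeds 0.9
k_manual = int(np.searchsorted(cumulative, 0.9, side="right") + 1)

pca = PCA(n_components=0.9).fit(X)
print(cumulative.round(3))
print(k_manual, pca.n_components_)   # same k
```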

Summary

Choosing the number of principal components k in PCA is a data compression problem. Usually we set the n_components parameter of sklearn's PCA directly to a float, which solves the choice of k indirectly.
But sometimes we reduce dimensions just to look at the data (visualization); in that case k is usually chosen as 2 or 3.
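For the visualization case, here is a minimal sketch with k = 2 on the Iris dataset that ships with scikit-learn (the choice of dataset is ours, not the original post's):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)          # 150 samples, 4 features
X2 = PCA(n_components=2).fit_transform(X)  # reduce to 2 dimensions for plotting
print(X2.shape)                            # (150, 2)
```

The two columns of `X2` can then be drawn as a 2-D scatter plot colored by `y`, for example with matplotlib's `plt.scatter(X2[:, 0], X2[:, 1], c=y)`.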
Parameter description:

n_components:
  Meaning: the number n of principal components to keep in the PCA algorithm, i.e., the number of features retained.
  Type: int, float, or string; default None, in which case all components are kept.
  If assigned an int, e.g. n_components=1, the original data is reduced to one dimension.
  If assigned a float between 0 and 1, e.g. n_components=0.9, the number of components n is chosen automatically so that the required percentage of variance is retained.
  If assigned a string, e.g. n_components='mle', the number of components n is chosen automatically using Minka's maximum-likelihood estimate (MLE) of the data's dimensionality.

copy:
  Type: bool, True or False; default True.
  Meaning: whether to copy the original training data when running the algorithm. If True, the original training data is unchanged after PCA runs, because the computation is done on a copy; if False, the original training data may be changed after PCA runs, because the dimensionality-reduction computation is done on the original data itself.

whiten:
  Type: bool; default False.
  Meaning: whitening, which makes every retained component have the same (unit) variance. For more on whitening, see the UFLDL tutorial.
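The effect of `whiten` can be checked with a short sketch on synthetic data with very different feature scales (an illustrative assumption): without whitening the transformed components keep very different variances; with `whiten=True` each component has unit sample variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 3)) * np.array([10.0, 2.0, 0.5])   # very different scales

Z_plain = PCA(n_components=3).fit_transform(X)
Z_white = PCA(n_components=3, whiten=True).fit_transform(X)

print(np.var(Z_plain, axis=0, ddof=1))   # decreasing, very different magnitudes
print(np.var(Z_white, axis=0, ddof=1))   # all 1.0 after whitening
```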

2. PCA object attributes

components_: the principal axes (the directions of largest variance), one row per retained component, sorted by explained variance.
explained_variance_ratio_: the percentage of the total variance explained by each of the n retained components.
n_components_: the number n of components retained.
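A short sketch inspecting these attributes on synthetic data (illustrative only):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
X = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 5))   # synthetic data

pca = PCA(n_components=3).fit(X)
print(pca.components_.shape)          # (3, 5): one row per principal axis
print(pca.explained_variance_ratio_)  # non-increasing, sums to at most 1
print(pca.n_components_)              # 3
# The principal axes are orthonormal
print(np.allclose(pca.components_ @ pca.components_.T, np.eye(3)))   # True
```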

3. PCA object methods

* fit(X, y=None)
fit() is a general method in scikit-learn: every algorithm that needs training has a fit() method, which performs the algorithm's “training” step. Because PCA is an unsupervised learning algorithm, y is not used here and is simply left as None.
fit(X) trains the PCA model on the data X.
Return value: the object on which fit was called. For example, pca.fit(X) trains the pca object on X and returns pca itself.

* fit_transform(X): trains the PCA model on X and returns the dimension-reduced data at the same time.
newX = pca.fit_transform(X), where newX is the data after dimensionality reduction.

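* inverse_transform(): converts dimension-reduced data back to the original feature space, X = pca.inverse_transform(newX). Unless all components were kept, the result only approximates the original data, because the information in the discarded components is lost.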

* transform(X): converts data X into dimension-reduced data. Once the model has been trained, new input data can be reduced with the transform method.
In addition, there are methods such as get_covariance(), get_precision(), get_params(deep=True), and score(X, y=None), which can be looked up when needed.
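The methods above can be exercised together in a short sketch on synthetic data (illustrative only). It checks that `fit` returns the estimator itself, that `fit_transform` matches `fit` followed by `transform`, that `transform` handles new data, and that `inverse_transform` is exact only when all components are kept.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(8)
X = rng.normal(size=(40, 4))
X_new = rng.normal(size=(5, 4))       # "new" data arriving after training

pca = PCA(n_components=2)
returned = pca.fit(X)                 # fit returns the estimator itself
print(returned is pca)                # True

newX = pca.fit_transform(X)           # train and reduce in one call
print(np.allclose(newX, pca.transform(X)))   # True: same as transform after fit
print(pca.transform(X_new).shape)     # (5, 2): reduce unseen data with the trained model

# Back to 4-D, but only approximately, since 2 of the 4 components were dropped
X_back = pca.inverse_transform(newX)

# With all components kept, the round trip recovers the original data
full = PCA(n_components=4).fit(X)
print(np.allclose(full.inverse_transform(full.transform(X)), X))   # True
```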

3. Python implementation

  Here is how to use PCA in sklearn for dimensionality reduction.
Import it as follows:

from sklearn.decomposition import PCA

Create the model as follows, where n_components=2 means the data is reduced to 2 dimensions:
pca = PCA(n_components=2)

For example, the following code performs PCA dimensionality reduction:
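import numpy as np
from sklearn.decomposition import PCA
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2)
pca.fit(X)  # the model must be fitted before explained_variance_ratio_ exists
print(pca)
print(pca.explained_variance_ratio_)

The output is as follows (how the PCA object is printed depends on the scikit-learn version; newer versions show only non-default parameters):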
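PCA(copy=True, n_components=2, whiten=False)
[ 0.99244291  0.00755711]

Another example loads the Boston house-price dataset, which has 13 features in total, and reduces it to two features. Note that load_boston was removed in scikit-learn 1.2, so this example requires an older version: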
# Load the dataset
from sklearn.datasets import load_boston
d = load_boston()
x = d.data    # feature matrix
y = d.target  # house prices (not used by PCA)
print(x[:4])
print('shape:', x.shape)
# Dimensionality reduction
import numpy as np
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
newData = pca.fit_transform(x)
print('Data after dimension reduction:')
print(newData[:4])
print('shape:', newData.shape)

The output is as follows; the data is reduced to 2 dimensions.
[[  6.32000000e-03   1.80000000e+01   2.31000000e+00   0.00000000e+00
    5.38000000e-01   6.57500000e+00   6.52000000e+01   4.09000000e+00
    1.00000000e+00   2.96000000e+02   1.53000000e+01   3.96900000e+02
    4.98000000e+00]
 [  2.73100000e-02   0.00000000e+00   7.07000000e+00   0.00000000e+00
    4.69000000e-01   6.42100000e+00   7.89000000e+01   4.96710000e+00
    2.00000000e+00   2.42000000e+02   1.78000000e+01   3.96900000e+02
    9.14000000e+00]
 [  2.72900000e-02   0.00000000e+00   7.07000000e+00   0.00000000e+00
    4.69000000e-01   7.18500000e+00   6.11000000e+01   4.96710000e+00
    2.00000000e+00   2.42000000e+02   1.78000000e+01   3.92830000e+02
    4.03000000e+00]
 [  3.23700000e-02   0.00000000e+00   2.18000000e+00   0.00000000e+00
    4.58000000e-01   6.99800000e+00   4.58000000e+01   6.06220000e+00
    3.00000000e+00   2.22000000e+02   1.87000000e+01   3.94630000e+02
    2.94000000e+00]]
shape: (506, 13)
Data after dimension reduction:
[[-119.81821283    5.56072403]
 [-168.88993091  -10.11419701]
 [-169.31150637  -14.07855395]
 [-190.2305986   -18.29993274]]
shape: (506, 2)
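Because `load_boston` is no longer available in current scikit-learn, here is a sketch of the same workflow on the bundled wine dataset (178 samples, 13 features), swapped in for illustration. It also shows why feature scaling matters: on raw data the first component is dominated by the feature with the largest numeric scale, while after standardization the variance is spread more evenly across components.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)    # 178 samples, 13 features, ships with scikit-learn

# PCA on the raw features
raw = PCA(n_components=2).fit(X)

# Standardize first, then PCA (the same idea as the StandardScaler example earlier)
scaled = make_pipeline(StandardScaler(), PCA(n_components=2)).fit(X)
newData = scaled.transform(X)

print(newData.shape)                                         # (178, 2)
print(raw.explained_variance_ratio_)                         # first ratio close to 1
print(scaled.named_steps["pca"].explained_variance_ratio_)   # much more balanced
```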