Principal component analysis (PCA) is a common method of data dimensionality reduction. Its purpose is to convert high-dimensional data into low-dimensional data while losing as little "information" as possible, which reduces the amount of computation.

  The essence of PCA is to find a set of projection directions that maximize the variance of the data projected onto them, with these directions mutually orthogonal.
This amounts to finding a new orthogonal basis and computing the variance of the original data projected onto each basis vector. The larger the variance, the more information that basis vector carries. As shown later, the larger an eigenvalue of the covariance matrix of the (centered) original data, the larger the variance of the data along the corresponding eigenvector, and so the more information is projected onto that eigenvector. Conversely, a small eigenvalue means that very little information is projected onto its eigenvector, so the directions belonging to small eigenvalues can be dropped. This is how dimensionality reduction is achieved.
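To make the eigenvalue/variance claim concrete, here is a small NumPy sketch on hypothetical random data (the data, the seed and the variable names are our own choices, not from the original post). For each eigenvector of the covariance matrix, it compares the variance of the data projected onto that eigenvector with the corresponding eigenvalue.

```python
import numpy as np

# Hypothetical toy data: 200 samples, 3 features, the first two strongly correlated
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([
    base,
    0.5 * base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(scale=0.2, size=(200, 1)),
])

Xc = X - X.mean(axis=0)                 # center each feature
C = np.cov(Xc, rowvar=False)            # 3x3 covariance matrix (divides by n - 1)
eigvals, eigvecs = np.linalg.eigh(C)    # eigh: ascending eigenvalues of a symmetric matrix

for lam, v in zip(eigvals, eigvecs.T):
    proj_var = np.var(Xc @ v, ddof=1)   # variance of the data projected onto eigenvector v
    print(f"eigenvalue={lam:.4f}  projected variance={proj_var:.4f}")
```

The two columns agree, so dropping the eigenvectors with the smallest eigenvalues discards the directions along which the data varies least.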

  All that PCA does is find, one after another, a set of mutually orthogonal coordinate axes in the original space.
The first axis maximizes the variance. The second axis maximizes the variance within the subspace orthogonal to the first axis. The third axis maximizes the variance within the subspace orthogonal to the first two, and so on. In an N-dimensional space we can find N such axes. Keeping only the first r of them approximates the space, compressing it from N dimensions to r dimensions, and choosing these r axes makes the loss of data caused by the compression as small as possible.
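The "keep the first r axes" idea can be checked numerically. The sketch below (hypothetical random data; r = 2 and the variable names are our own choices) sorts the eigenvectors by decreasing variance, projects onto the first r, maps back to the original space, and compares the mean squared reconstruction error with the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))   # hypothetical correlated 5-D data
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]           # sort axes by descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

r = 2
W = eigvecs[:, :r]                          # first r orthogonal axes (N x r)
Z = Xc @ W                                  # compressed data: 100 x r
X_rec = Z @ W.T                             # map back into the original N-dimensional space

# Squared reconstruction error divided by (n - 1) equals the sum of the discarded eigenvalues
err = np.sum((Xc - X_rec) ** 2) / (len(Xc) - 1)
print(err, eigvals[r:].sum())
```

Because the discarded eigenvalues are the smallest ones, no other choice of r orthogonal axes gives a smaller error.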

   Therefore, the key question is: how do we find new projection directions that lose the least "information" from the original data?

1. Measuring the "information content" of samples

   The "information content" of samples refers to the variance of their projection onto a feature direction. The larger the variance, the greater the differences between samples along that feature, and so the more important that feature is.
Take the illustration from Machine Learning in Action (《机器学习实战》): in a classification problem, the larger the variance of the samples, the easier it is to tell samples of different categories apart.

   The figure contains 3 categories of data. Clearly, the larger the variance, the easier it is to separate points of different categories. The projection of the samples onto the X axis has a large variance, while their projection onto the Y axis has a small variance. The direction of largest variance is the diagonal running up through the middle (the red line in the figure). If the samples are mapped onto this diagonal, one-dimensional data is enough to classify them, a reduction of one dimension compared with the original two-dimensional data.
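The figure's data is not available, so this sketch uses hypothetical 2-D points spread along an upward diagonal as a stand-in. It compares the variance of the projections onto the X axis, the Y axis, and the principal (largest-variance) direction.

```python
import numpy as np

rng = np.random.default_rng(2)
t = rng.normal(size=300)
# Hypothetical 2-D points spread along an upward diagonal (stand-in for the figure's data)
X = np.column_stack([
    2.0 * t + 0.3 * rng.normal(size=300),
    1.0 * t + 0.3 * rng.normal(size=300),
])
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
w = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue

var_x = np.var(Xc[:, 0], ddof=1)        # projection onto the X axis
var_y = np.var(Xc[:, 1], ddof=1)        # projection onto the Y axis
var_w = np.var(Xc @ w, ddof=1)          # projection onto the principal direction
print(f"X axis: {var_x:.2f}, Y axis: {var_y:.2f}, principal direction: {var_w:.2f}, w = {w}")
```

The principal direction always has at least as much variance as either coordinate axis, which is why a single coordinate along it can keep most of the spread.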

When the original data has more dimensions, first find the direction of largest variance after transforming the data, then choose a direction orthogonal to the first that has the next largest variance, and so on. Continue until as many new features as original features have been produced, or keep only the first k of them (these first k features contain most of the information in the data). In short, PCA is a dimensionality-reduction process that maps data onto new features, each of which is a linear combination of the original features.

2. Computation process (formulas are hard to insert here, so the original gave them as screenshots)
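Since the screenshots are not reproduced here, the standard computation can be summarized as follows. This is a sketch in common textbook notation; the symbols m (number of samples), N (number of features), X, C, w_j, λ_j and W_r are our own choices. Center the data, form the covariance matrix, maximize the projected variance under a unit-norm constraint, and keep the top r eigenvectors.

```latex
\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
X = \begin{bmatrix} (x_1-\bar{x})^\top \\ \vdots \\ (x_m-\bar{x})^\top \end{bmatrix} \in \mathbb{R}^{m\times N}

C = \frac{1}{m-1}\, X^\top X

\max_{w}\ \operatorname{Var}(Xw) = w^\top C w \quad \text{s.t. } w^\top w = 1

\mathcal{L}(w,\lambda) = w^\top C w - \lambda\,(w^\top w - 1), \qquad
\frac{\partial \mathcal{L}}{\partial w} = 2Cw - 2\lambda w = 0 \;\Rightarrow\; Cw = \lambda w

\operatorname{Var}(Xw) = w^\top C w = \lambda\, w^\top w = \lambda

Z = X W_r, \qquad W_r = [\,w_1, \dots, w_r\,], \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N \ge 0

\text{retained ratio} = \frac{\sum_{j=1}^{r} \lambda_j}{\sum_{j=1}^{N} \lambda_j}
```

The last two lines of the derivation are the promised proof: the variance along an eigenvector equals its eigenvalue, so keeping the eigenvectors with the largest eigenvalues keeps the most variance.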

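In a PCA implementation, applying singular value decomposition (SVD) to the covariance matrix yields the S matrix (the eigenvalue matrix). Because the covariance matrix is symmetric and positive semi-definite, its singular values are exactly its eigenvalues, and its singular vectors are the corresponding eigenvectors.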
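A quick NumPy check of that statement, on hypothetical random data: the singular values S returned by `np.linalg.svd` for the covariance matrix match its eigenvalues from `np.linalg.eigvalsh` once both are sorted in descending order.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))   # hypothetical correlated data
C = np.cov(X, rowvar=False)                              # np.cov centers the data itself

U, S, Vt = np.linalg.svd(C)              # S: singular values in descending order
eigvals = np.linalg.eigvalsh(C)[::-1]    # eigenvalues, reversed into descending order
print(S)
print(eigvals)
```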

Practical use

Using the PCA class provided by sklearn, PCA can be done as follows. About the n_components parameter of PCA: if it is set to an integer, it is the target dimension k (n_components=k).
If it is set to a decimal between 0 and 1, it is the fraction of information (variance) to retain after dimensionality reduction.
from sklearn.decomposition import PCA
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[10001, 2, 55], [16020, 4, 11], [12008, 6, 33], [13131, 8, 22]])
# feature normalization (feature scaling)
X_scaler = StandardScaler()
x = X_scaler.fit_transform(x)
# PCA
pca = PCA(n_components=0.9)  # keep enough components to retain 90% of the information
x_reduced = pca.fit_transform(x)  # fit the model on x, then return the reduced data
So when using PCA in practice, we don't have to choose k ourselves; we can simply set n_components to a float.
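The code above standardizes the features before PCA. The sketch below shows why this matters, using scikit-learn's bundled wine data set as a stand-in (our choice, not from the original): its 13 features have very different scales. With the same n_components=0.9, the unscaled fit is dominated by the largest-scale feature, while the scaled fit spreads the variance over several components.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X = wine.data                      # 178 samples, 13 features on very different scales

raw = PCA(n_components=0.9).fit(X)
scaled = PCA(n_components=0.9).fit(StandardScaler().fit_transform(X))

# Without scaling, the feature with the largest variance dominates the first component
print("raw:    k =", raw.n_components_, raw.explained_variance_ratio_.round(3))
print("scaled: k =", scaled.n_components_, scaled.explained_variance_ratio_.round(3))
print("feature dominating raw PC1:",
      wine.feature_names[np.argmax(np.abs(raw.components_[0]))])
```

In other words, the k picked by a float n_components depends on the feature scaling, so it is worth deciding on scaling before relying on that automatic choice.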

Summary

Choosing the number of principal components k in PCA is a data-compression problem. Usually we simply set the n_components parameter of sklearn's PCA to a float, which solves the problem of choosing k indirectly.
But sometimes we reduce dimensionality just to look at the data (visualization). In that case k is chosen as 2 or 3.
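A minimal sketch of both ways of choosing k, using the iris data set bundled with scikit-learn (our choice of example data and of the 0.95 threshold). It picks k by hand from the cumulative `explained_variance_ratio_`, compares that with passing the float directly, and then fixes k = 2 for plotting.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Fit with all components, then pick the smallest k whose cumulative ratio reaches 95%
cumulative = np.cumsum(PCA().fit(X).explained_variance_ratio_)
k = int(np.argmax(cumulative >= 0.95) + 1)

# Passing the float directly lets sklearn make the same choice
auto = PCA(n_components=0.95).fit(X)
print(cumulative.round(4), k, auto.n_components_)

# For visualization, fix k at 2 and plot the 2-D projection instead
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)
```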
Parameter description:
n_components:
  Meaning: the number of principal components n to keep in the PCA algorithm, i.e. the number of features retained.
  Type: int, float, or string; default None, which keeps all components (min(n_samples, n_features)).
  If set to an int, e.g. n_components=1, the original data is reduced to one dimension.
  If set to a float between 0 and 1, enough components are kept to retain that fraction of the variance.
  If set to the string 'mle', i.e. n_components='mle', the number of features n is chosen automatically using Minka's MLE method.
copy:
  Type: bool, True or False; default True.
  Meaning: whether to copy the original training data when running the algorithm. If True, the original training data is left unchanged after running PCA, because the computation is carried out on a copy. If False, the original training data may be overwritten after running PCA, because the computation is carried out on the original data itself.


whiten:
  Type: bool; default False.
  Meaning: whitening, which rescales the output so that every feature has the same (unit) variance. For more on whitening, see the UFLDL tutorial.
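The parameters above can be tried out on the iris data set bundled with scikit-learn (our choice of example data). This sketch shows an integer n_components, n_components='mle' (paired here with svd_solver="full", the full SVD solver), the default copy=True leaving the input untouched, and whiten=True giving unit-variance components.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
X_before = X.copy()

# int: keep exactly that many components
n_int = PCA(n_components=1).fit(X).n_components_

# 'mle': Minka's MLE chooses the number of components
n_mle = PCA(n_components="mle", svd_solver="full").fit(X).n_components_

# copy=True (the default): fitting works on a copy, so X is left unchanged
PCA(n_components=2, copy=True).fit(X)
unchanged = np.array_equal(X, X_before)

# whiten=True: each retained component is rescaled to unit variance
Z = PCA(n_components=2, whiten=True).fit_transform(X)

print(n_int, n_mle, unchanged, Z.var(axis=0, ddof=1))
```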

2. PCA object attributes

components_: the principal axes, i.e. the directions of maximum variance, one per row, sorted by decreasing explained variance.
explained_variance_ratio_: the fraction of the total variance explained by each of the n retained components.
n_components_: the number of components n actually retained.
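A short sketch of these attributes on the iris data set (our choice of example data). It also uses `mean_`, a fitted attribute not listed above, to check that each new feature is a linear combination of the centered original features: `transform` gives the same result as multiplying by `components_.T`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2).fit(X)

print(pca.components_)                  # shape (2, 4): one principal axis per row
print(pca.explained_variance_ratio_)    # variance fraction captured by each kept component
print(pca.n_components_)                # 2

# Each new feature is a linear combination of the centered original features
Z_manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(Z_manual, pca.transform(X)))   # True
```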

3. PCA object methods

* fit(X, y=None)
fit() is a general method in scikit-learn: every algorithm that needs training has a fit() method, which performs the algorithm's "training" step. Because PCA is an unsupervised learning algorithm, y here is simply None.
fit(X) means training the PCA model on data X.
Return value: the object on which fit was called, i.e. the pca object, now trained on X.

* fit_transform(X): trains the PCA model on X and returns the dimension-reduced data at the same time.
newX = pca.fit_transform(X), where newX is the data after dimensionality reduction.

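* inverse_transform(): maps dimension-reduced data back to the original feature space, X = pca.inverse_transform(newX). The result equals the original data only if all components were kept; otherwise it is an approximation.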

* transform(X): converts data X into dimension-reduced data. Once the model is trained, new input data can also be reduced with transform.
In addition, there are other methods such as get_covariance(), get_precision(), get_params(deep=True) and score(X, y=None), which you can look up when needed.
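The methods above can be exercised together on the iris data set (our choice of example data). The sketch checks that fit() returns the estimator itself, that fit_transform matches fit followed by transform, that inverse_transform is only approximate with fewer components but exact with all of them, and that get_covariance() matches the sample covariance when every component is kept.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

pca = PCA(n_components=2)
returned = pca.fit(X)                          # fit() returns the estimator itself
newX = pca.transform(X)                        # reduce X with the trained model
newX2 = PCA(n_components=2).fit_transform(X)   # train and reduce in one call

# With only 2 of 4 components kept, inverse_transform gives an approximation
X_approx = pca.inverse_transform(newX)

# Keeping all components makes the round trip exact (up to floating point)
full = PCA().fit(X)
X_back = full.inverse_transform(full.transform(X))

print(returned is pca, np.allclose(newX, newX2), np.allclose(X_back, X))
print("max approximation error with 2 components:", np.abs(X - X_approx).max())
# get_covariance(): with all components kept it matches the sample covariance
print(np.allclose(full.get_covariance(), np.cov(X, rowvar=False)))
```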

3. Python implementation

  Below is an introduction to dimensionality reduction with the PCA class in sklearn.
Import it as follows:

from sklearn.decomposition import PCA
Create the model as follows, where n_components=2 reduces the data to 2 dimensions:
pca = PCA(n_components=2)
For example, the following code performs PCA dimensionality reduction:
import numpy as np
from sklearn.decomposition import PCA
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2)
pca.fit(X)  # train the model so that explained_variance_ratio_ is available
print(pca)
print(pca.explained_variance_ratio_)
The output is as follows:
PCA(copy=True, n_components=2, whiten=False)
[ 0.99244291  0.00755711]
Next, load the boston data set, which has 13 features in total, and reduce it to two features:
# Load the data set
# Note: load_boston was removed in scikit-learn 1.2, so this needs an older version
from sklearn.datasets import load_boston
d = load_boston()
x = d.data    # feature matrix, 506 x 13
y = d.target  # house prices (not used by PCA)
print(x[:10])
print('shape:', x.shape)
# Dimensionality reduction
import numpy as np
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
newData = pca.fit_transform(x)
print('Data after dimensionality reduction:')
print(newData[:4])
print('shape:', newData.shape)
The output is as follows; the data is reduced to 2 dimensions.
[[  6.32000000e-03   1.80000000e+01   2.31000000e+00   0.00000000e+00
    5.38000000e-01   6.57500000e+00   6.52000000e+01   4.09000000e+00
    1.00000000e+00   2.96000000e+02   1.53000000e+01   3.96900000e+02
    4.98000000e+00]
 [  2.73100000e-02   0.00000000e+00   7.07000000e+00   0.00000000e+00
    4.69000000e-01   6.42100000e+00   7.89000000e+01   4.96710000e+00
    2.00000000e+00   2.42000000e+02   1.78000000e+01   3.96900000e+02
    9.14000000e+00]
 [  2.72900000e-02   0.00000000e+00   7.07000000e+00   0.00000000e+00
    4.69000000e-01   7.18500000e+00   6.11000000e+01   4.96710000e+00
    2.00000000e+00   2.42000000e+02   1.78000000e+01   3.92830000e+02
    4.03000000e+00]
 [  3.23700000e-02   0.00000000e+00   2.18000000e+00   0.00000000e+00
    4.58000000e-01   6.99800000e+00   4.58000000e+01   6.06220000e+00
    3.00000000e+00   2.22000000e+02   1.87000000e+01   3.94630000e+02
    2.94000000e+00]]
shape: (506, 13)
Data after dimensionality reduction:
[[-119.81821283    5.56072403]
 [-168.88993091  -10.11419701]
 [-169.31150637  -14.07855395]
 [-190.2305986   -18.29993274]]
shape: (506, 2)