PCA (Principal Component Analysis) for Dimensionality Reduction, Reconstruction, and Face Recognition

This post uses PCA in MATLAB to perform dimensionality reduction, reconstruction, and face recognition on image data.

Reference article: http://blog.csdn.net/watkinsong/article/details/38536463

PCA as I see it:

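When data has too many dimensions, processing it is slow and laborious. So I wondered whether I could work with only some of the dimensions and still get nearly the same result as with all of them. Ta-da: that is where PCA comes in. Put simply, an image may have 2000 feature dimensions, while in practice only about 100 of them (or even fewer) account for most of the effect on the result. Strictly speaking, those 100 are not a subset of the original pixels but new directions (the principal components), each a linear combination of the original dimensions.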

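For example: to an emperor, the chief grand secretary of the cabinet > the second secretary > the third > the fourth >> all the other, anonymous officials. So from the emperor's point of view, the cabinet as a whole can be seen as supplying 60% of the useful governing advice, the entire civil-official class (which includes the cabinet) 75%, the military officials 20%, and the common people 5%. In other words, although the common people are numerous, they contribute very little governing advice, so their proposals can be selectively ignored. After them, the next candidates to ignore are the military officials, and then the remaining civil officials.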

In short: we keep only the feature dimensions with the greatest influence and discard those whose influence is too small.
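One common way to make "influence" precise is the cumulative contribution ratio, which is what the line `contribution = cumsum(E)./sum(E)` in the code computes. The notation here is my own assumption, not the original post's: write \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0 for the eigenvalues of the covariance matrix. The share of the total variance kept by the first k components is then:

```latex
\[
  r_k \;=\; \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{n} \lambda_i},
  \qquad 0 \le r_k \le 1, \qquad r_n = 1 .
\]
```

In the emperor analogy, r_k plays the role of the percentage of useful advice supplied by the k most influential advisers.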

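The PCA workflow is as follows:

1. Subtract the mean to center the data.

2. Compute the covariance matrix.

3. Compute the eigenvalues and eigenvectors, and select the leading ones.

4. Project the training set into the space spanned by the selected eigenvectors; this completes the dimensionality reduction.

5. Reconstruction: multiply the reduced test data by the transpose of the eigenvector matrix, then add back the mean that was removed during centering.

6. Recognition: pick one photo of each person as a registration record, subtract the mean, multiply by the reduction matrix (i.e., the eigenvectors), and store the resulting reduced coordinates of the registered set. Then go through the image library and process each photo the same way. Each photo is assigned to the person whose registered record has the smallest Euclidean distance to it in the reduced space.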
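Steps 1 to 5 can be tried on a small synthetic data set before touching any images. This is only a minimal sketch under my own assumptions: the data, the sizes `m`, `n`, `k`, and the variable names `basis`, `Xc`, `Xrec`, and `rel_err` are made up for illustration and do not come from the face code.

```matlab
% Sketch of steps 1-5 on synthetic data (all names here are illustrative).
rng(0);                             % fixed seed so the run is repeatable
m = 200; n = 5; k = 2;              % samples, original dims, kept dims
basis = orth(randn(n, k));          % hypothetical 2-D subspace inside R^5
X = randn(m, k) * diag([5 2]) * basis' + 0.01 * randn(m, n) + 3;

mu = mean(X, 1);                    % step 1: center the data
Xc = bsxfun(@minus, X, mu);
C  = (Xc' * Xc) / m;                % step 2: covariance matrix (n-by-n)
[U, S, ~] = svd(C);                 % step 3: eigenvectors, eigenvalues descending
Uk = U(:, 1:k);                     %         keep the k leading eigenvectors
Z  = Xc * Uk;                       % step 4: reduced data (m-by-k)
Xrec = bsxfun(@plus, Z * Uk', mu);  % step 5: reconstruct and add the mean back

rel_err = norm(Xrec - X, 'fro') / norm(X, 'fro');
fprintf('relative reconstruction error: %.4f\n', rel_err);
```

Because the synthetic data lies almost entirely in a 2-D subspace, keeping two components should reconstruct it with only a small relative error.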

The MATLAB code is as follows:

%% Read the images
clear; close all; clc
% m = 1680;  % number of samples
file_path = 'C:\Users\zyfls\Desktop\ML\第五章数据降维\数据\AR\AR\';  % image folder
out_dir = 'C:\Users\zyfls\Desktop\ML各种截图\5\乱七八糟PCA\';  % output folder
img_path_list = dir(strcat(file_path, '*.bmp'));  % all .bmp images in the folder
img_num = length(img_path_list);                  % total number of images

% Images 1-10 are held out as the test set; all others form the training set.
trainset = zeros(img_num - 10, 50 * 40);          % each image is 50 x 40
for i = 11:img_num
    image_name = img_path_list(i).name;           % image file name
    img = double(imread(strcat(file_path, image_name)));
    trainset(i - 10, :) = img(:)';                % one image per row
end

%% Before training PCA, do feature normalization
mu = mean(trainset);                              % per-pixel mean over the training images
trainset_norm = bsxfun(@minus, trainset, mu);     % subtract the mean
sigma = std(trainset_norm);                       % per-pixel standard deviation
sigma(sigma == 0) = 1;                            % avoid division by zero for constant pixels
trainset_norm = bsxfun(@rdivide, trainset_norm, sigma);  % divide element-wise by sigma

%% Save the mean face mu so we can take a look at it
imwrite(uint8(reshape(mu, 50, 40)), strcat(out_dir, 'meanface.bmp'));
fprintf('mean face saved.\n');

%% Compute the reduction matrix
X = trainset_norm;                                % just for convenience
[m, n] = size(X);
Cov = (1 / m) * (X' * X);                         % covariance matrix (n x n)
% Singular value decomposition: Cov = U*S*V'. S is a diagonal matrix of the
% same size as Cov, and U and V are orthogonal. (For an m x n input, U is
% m x m and V is n x n.) The singular values lie on the diagonal of S, are
% non-negative, and are sorted in descending order.
[U, S, V] = svd(Cov);
% SVD also works on non-square matrices; the commented-out eig version
% below handles the square case.
E = diag(S);
contribution = cumsum(E) ./ sum(E);               % cumulative contribution ratio
% [U, D] = eig(Cov);       % eigenvectors U and eigenvalues D, eigenvalues ascending
% U = (rot90(U))';         % reorder the eigenvectors by descending eigenvalue
% D = rot90(rot90(D));     % reorder the eigenvalues to descending
% E = diag(D);             % eigenvalue matrix -> eigenvalue vector
% ratio = 0;               % cumulative contribution ratio
% for k = 1:n
%     r = E(k) / sum(E);   % contribution ratio of the k-th principal component
%     ratio = ratio + r;   % cumulative contribution ratio
%     if (ratio >= 0.9)    % keep components up to a cumulative ratio of 90%
%         break;
%     end
% end
fprintf('compute cov done.\n');

% In face applications the eigenvectors in U are also called eigenfaces.
% Each eigenvector is one direction of the reduced space, and U can be
% used to map features into that space.
%% Show the first ten eigenfaces
for i = 1:10
    img = U(:, i);
    img = img - min(img);                         % shift so the minimum is 0
    img = img / max(abs(img));                    % scale to [0, 1]
    img = reshape(img, 50, 40);
    imwrite(img, strcat(out_dir, 'eigenface', int2str(i), '.bmp'));
end
fprintf('eigen face saved, paused.\n');
pause;

%% Dimension reduction
k = 100;                                          % reduce to 100 dimensions
test_raw = zeros(10, 50 * 40);
for i = 1:10                                      % the first ten images are the test set
    image_name = img_path_list(i).name;
    img = double(imread(strcat(file_path, image_name)));
    test_raw(i, :) = img(:)';
end
% The test set needs the same normalization as the training set
test = bsxfun(@rdivide, bsxfun(@minus, test_raw, mu), sigma);
Uk = U(:, 1:k);                                   % first k eigenvectors span the reduced space
Z = test * Uk;
fprintf('reduce done.\n');

%% Reconstruct the test set
% Undo the normalization in reverse order: multiply by sigma, then add
% the mean face back.
Xp = Z * Uk';
Face_re = zeros(size(test_raw));
for i = 1:10
    face = Xp(i, :) .* sigma + mu;
    Face_re(i, :) = face;
    imwrite(uint8(reshape(face, 50, 40)), strcat(out_dir, 'reconstructionface', int2str(i), '.bmp'));
end
recon_err = norm(Face_re - test_raw);             % matrix 2-norm of the reconstruction error
display(recon_err);
% Value reported in the original post: 1.9061e+04 (computed there against
% the mean-centered test set).

%% Reconstruct the training set
% Training subtracted the mean face and divided by the standard deviation,
% so reconstruction multiplies by the standard deviation first and then
% adds the mean face back.
trainset_re = trainset_norm * Uk;                 % reduction
trainset_re = trainset_re * Uk';                  % reconstruction
for i = 11:25
    train = trainset_re(i, :) .* sigma + mu;
    train = reshape(train, 50, 40);
    imwrite(uint8(train), strcat(out_dir, 'reconstruction', int2str(i), 'train.bmp'));
end

The code above performs the dimensionality reduction and reconstruction.
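The listing computes `contribution` but then fixes `k = 100` by hand. As a hedged alternative, k could be picked from the cumulative contribution ratio, in the spirit of the 90% threshold in the commented-out `eig` loop. The eigenvalues below are invented purely for illustration.

```matlab
% Hypothetical eigenvalues, already sorted in descending order
E = [50; 30; 12; 4; 3; 1];
contribution = cumsum(E) ./ sum(E);    % cumulative contribution ratio
k = find(contribution >= 0.9, 1);      % smallest k whose ratio reaches 90%
fprintf('keep %d components (%.0f%% of the variance)\n', k, 100 * contribution(k));
```

In the face code the same two lines would run on `E = diag(S)`, and `Uk = U(:, 1:k)` would follow unchanged.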

[Figures from the original post: the mean face, the eigenfaces, and the reconstructed images.]


The sections above cover dimensionality reduction and reconstruction.
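For reference, here is what reduction and reconstruction do to a single image, written with row vectors to match `Z = test * Uk` and `Xp = Z * Uk'` in the code. The symbols x, z, \hat{x}, and \lambda_i are my own notation, and the formulas ignore the extra division by the standard deviation that the listing applies:

```latex
\[
  z = (x - \mu)\, U_k, \qquad
  \hat{x} = z\, U_k^{\top} + \mu .
\]
% With C = \frac{1}{m} X_c^{\top} X_c built from the centered training rows x_j,
% the mean squared reconstruction error on the training set equals the sum of
% the discarded eigenvalues:
\[
  \frac{1}{m} \sum_{j=1}^{m} \lVert x_j - \hat{x}_j \rVert^2
  \;=\; \sum_{i=k+1}^{n} \lambda_i .
\]
```

This is why discarding the directions with the smallest eigenvalues costs the least: they are exactly the terms that end up in the error.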

Recognition:

The recognition code is as follows (it repeats the dimensionality reduction steps):

%% Read the images
clear; close all; clc
% m = 1680;  % number of samples
file_path = 'C:\Users\zyfls\Desktop\ML\第五章数据降维\数据\AR\AR\';  % image folder
out_dir = 'C:\Users\zyfls\Desktop\ML各种截图\5\Recognition\';  % output folder
img_path_list = dir(strcat(file_path, '*.bmp'));  % all .bmp images in the folder
img_num = length(img_path_list);                  % total number of images

% Every image except each person's last (14th) one goes into the training set.
trainset = zeros(img_num - floor(img_num / 14), 50 * 40);  % each image is 50 x 40
j = 1;
for i = 1:img_num
    if (mod(i, 14) == 0)                          % hold out each person's last image
        continue;
    end
    image_name = img_path_list(i).name;           % image file name
    img = double(imread(strcat(file_path, image_name)));
    trainset(j, :) = img(:)';
    j = j + 1;
end

%% Before training PCA, do feature normalization
mu = mean(trainset);                              % per-pixel mean over the training images
trainset_norm = bsxfun(@minus, trainset, mu);     % subtract the mean
sigma = std(trainset_norm);                       % per-pixel standard deviation
sigma(sigma == 0) = 1;                            % avoid division by zero for constant pixels
trainset_norm = bsxfun(@rdivide, trainset_norm, sigma);  % divide element-wise by sigma

%% Save the mean face mu so we can take a look at it
imwrite(uint8(reshape(mu, 50, 40)), strcat(out_dir, 'meanface.bmp'));

%% Compute the reduction matrix
X = trainset_norm;                                % just for convenience
[m, n] = size(X);
Cov = (1 / m) * (X' * X);                         % covariance matrix (n x n)
% SVD: Cov = U*S*V', with S diagonal and the same size as Cov, U and V
% orthogonal, and the singular values non-negative and in descending order.
[U, S, V] = svd(Cov);
E = diag(S);
contribution = cumsum(E) ./ sum(E);               % cumulative contribution ratio
fprintf('compute cov done.\n');

% The eigenvectors in U are the eigenfaces; each one is a direction of the
% reduced space, and U maps features into that space.
%% Show the first ten eigenfaces
for i = 1:10
    img = U(:, i);
    img = img - min(img);                         % shift so the minimum is 0
    img = img / max(abs(img));                    % scale to [0, 1]
    img = reshape(img, 50, 40);
    imwrite(img, strcat(out_dir, 'eigenface', int2str(i), '.bmp'));
end
fprintf('eigen face saved, paused.\n');
pause;

%% Register the first image of each person
k = 100;                                          % reduce to 100 dimensions
Uk = U(:, 1:k);                                   % first k eigenvectors span the reduced space
reg_idx = 1:14:img_num;                           % first image of each group of 14
regis = zeros(numel(reg_idx), 50 * 40);           % 120 people for 1680 images
for j = 1:numel(reg_idx)
    image_name = img_path_list(reg_idx(j)).name;
    img = double(imread(strcat(file_path, image_name)));
    regis(j, :) = img(:)';
end
regis = bsxfun(@rdivide, bsxfun(@minus, regis, mu), sigma);  % same normalization as training
Zregis = regis * Uk;                              % registered set in the reduced space

%% Classify every image by comparing it with the registered set
% Row j of Zregis is assumed to belong to the person whose ID is j, i.e.
% dir() lists the files sorted by their three-digit ID prefix.
success = 0;
mdist = zeros(numel(reg_idx), 1);
for i = 1:img_num
    image_name = img_path_list(i).name;
    img = double(imread(strcat(file_path, image_name)));
    z = ((img(:)' - mu) ./ sigma) * Uk;           % this image in the reduced space
    for j = 1:numel(reg_idx)
        mdist(j) = norm(z - Zregis(j, :));        % distance to each registered record
    end
    [~, I] = min(mdist);                          % position of the smallest distance
    predicted = sprintf('%03d', I);               % zero-padded three-digit ID
    name = image_name(1:3);                       % first three characters identify the person
    if strcmp(name, predicted)
        success = success + 1;
    end
end
suc_rate = success / img_num;
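PS: In my image library, the first three digits of each file name are the ID of the person, so the code uses them to check whether a classification is correct.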
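The matching step on its own is just a nearest-neighbour search in the reduced space. Here is a minimal sketch on made-up data: the registered vectors, the noise level, and the names `true_id` and `pred_id` are all assumptions for illustration and are not taken from the AR face set.

```matlab
% Nearest-neighbour matching in the reduced space, on made-up data.
rng(1);                                   % fixed seed so the run is repeatable
Zregis = [0 0 0; 10 0 0; 0 10 0; 0 0 10]; % one registered vector per person (hypothetical)
true_id = [2; 4; 1; 3; 2];                % ground truth for five probe images
k = size(Zregis, 2);
% Each probe is a slightly noisy copy of its person's registered vector
Ztest = Zregis(true_id, :) + 0.1 * randn(numel(true_id), k);

pred_id = zeros(size(true_id));
for i = 1:numel(true_id)
    diffs = bsxfun(@minus, Zregis, Ztest(i, :));  % difference to every registered row
    dists = sqrt(sum(diffs .^ 2, 2));             % Euclidean distances
    [~, pred_id(i)] = min(dists);                 % index of the closest registered person
end
suc_rate = mean(pred_id == true_id);
fprintf('success rate: %.2f\n', suc_rate);
```

Computing all distances for one probe with `bsxfun` replaces the inner `for j` loop of the listing; the result is the same, only the bookkeeping is shorter.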

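That completes PCA-based dimensionality reduction, reconstruction, and recognition, and with it my course project is finally done.

The next post will introduce recognition with SR (sparse dictionary).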
