Preface:
Let me be honest: when I first started learning about deep neural networks, they gave me a lot of trouble. I kept watching videos and reading written explanations, trying to memorize my way to understanding. After all, most of what happens inside is hidden from us; all we see are inputs and outputs, and the internal operation and working principle are left to the imagination.


This article explains the principle of CNNs (convolutional neural networks). I went back over the materials I had collected, worked through them in practice, and only wrote this up once I had formed my own understanding. I will try to be as detailed as possible; if anything is unclear, I hope you will point it out.
--------------------------------------------------------------------------------

One, How a machine reads an image


Let's start with a brain teaser: to draw a giant panda on a sheet of white paper, how many colors of brush do you need? Everyone knows the answer: just one black brush. Paint the black parts of the panda black, and an image of a giant panda appears.


The way we draw the panda is actually very close to a cross-stitch: within a given grid, you embroider with different colors and end up with a specific picture. The way a machine reads a picture is just this process in reverse. Starting from an existing picture, the machine reads the color of each grid cell (pixel), stores the color of each cell as a number, and obtains a very large numeric matrix. The picture's information is stored in this numeric matrix.


Each grid cell in the figure above represents one pixel, and the number in the cell is its color code, which ranges over [0, 255]. (Every color is composed of the three primaries red, green, and blue, and each channel is a number between 0 and 255.)
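To make this concrete, here is a tiny sketch of how such a grid of pixels is stored in code as a matrix of numbers (the pixel values below are made up for illustration):

import numpy as np

# A toy 4x4 grayscale "image": each cell is one pixel, and its value (0-255)
# is the color code described above
img = np.array([[  0,   0, 255, 255],
                [  0,  80, 160, 255],
                [ 80, 160, 255, 255],
                [255, 255, 255, 255]], dtype=np.uint8)
print(img.shape)   # (4, 4) -> 16 pixels, one number per pixel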
--------------------------------------------------------------------------------
On the basis of this large numeric matrix, the convolutional neural network does its recognition work:

The process of machine image recognition: the machine does not recognize a complex image all at once. Instead, the complete picture is divided into many small patches, the features in each small patch are extracted (that is, each small patch is recognized), and these local features are then combined to complete the recognition of the whole image.
--------------------------------------------------------------------------------

Two, The principle of convolutional neural networks

Using a CNN (convolutional neural network) for image recognition generally involves these steps:

* The convolution layers perform preliminary feature extraction
* The pooling layers extract the main features
* The fully connected layers combine the features from every part
* A classifier is generated to make the final prediction
1, How the convolution layer works

The job of the convolution layer is to extract the features in each small patch of the picture.

Suppose we have a 6*6 image in which every pixel stores part of the image's information. We also define a convolution kernel (essentially a set of weights) used to extract certain features from the image. The kernel is multiplied element-wise with the corresponding entries of the numeric matrix and the products are summed, giving one output value of the convolution layer.

(429 = 18*1 + 54*0 + 51*1 + 55*0 + 121*1 + 75*0 + 35*1 + 24*0 + 204*1)
The values of the convolution kernel carry no prior knowledge at first; they can be generated randomly and are then adjusted step by step during training.

Once every pixel has been covered at least once, the output of one convolution layer has been generated (the stride in the figure below is 1).
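That single multiply-and-add step can be written out directly. The 3x3 patch values below are taken from the worked example above; the kernel [[1,0,1],[0,1,0],[1,0,1]] is inferred from the multipliers in that sum:

import numpy as np

# One convolution step: element-wise multiply the patch by the kernel, then sum
patch = np.array([[18,  54,  51],
                  [55, 121,  75],
                  [35,  24, 204]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
print(np.sum(patch * kernel))   # 429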


At first the machine does not know which features the patch to be recognized contains. By comparing the output values produced by different convolution kernels, it determines which kernel best represents the features of the image. For example, if we need to recognize a certain feature in the image (say a curve), the corresponding kernel should produce a high output value on that kind of curve and a low output on other shapes (such as triangles).
The higher the output value of the convolution layer, the better the match, and the better that kernel characterizes the picture.

How the convolution layer works in practice:
For example, suppose we design the convolution kernel below; the curve it is meant to recognize is shown on the right:


Now let's use the kernel above to recognize this simplified image: a cartoon mouse.


When the machine scans the mouse's rear, the kernel and the numeric matrix of that region produce a large output: 6600.


Using the same kernel to scan the mouse's ears, the output is very small: 0.


We can therefore say that the current kernel encodes the feature "curve", and the match tells us that the mouse's rear is curved. We need kernels for other features to match the other parts of the mouse.
The job of the convolution layer is to try different kernels, determine which ones are useful for characterizing the image, and produce the output matrices obtained by convolving the image with those kernels.

2, How the pooling layer works

The input to the pooling layer is the output matrix of the convolution layer (the original data convolved with the corresponding kernels).
Purpose of the pooling layer:

* Reduce the number of training parameters by lowering the dimensionality of the feature maps output by the convolution layer
* Reduce overfitting by keeping only the most useful image information and passing on less noise
The two most common forms of pooling:

* Max pooling (max-pooling): the largest value in the selected region represents the whole region
* Mean pooling (mean-pooling): the average of the values in the selected region represents the whole region
The two pooling methods in action (the pooling stride is 2, so a region that has been selected is not selected again):

In a 4*4 numeric matrix, regions are selected with a 2*2 window. For example, the maximum value 4 of the upper-left region [1, 2, 3, 4] becomes its pooled output, while the mean 5/2 of the upper-right region [1, 2, 3, 4] becomes its pooled output.
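Here is a small sketch of both pooling methods on a 4*4 matrix. The two [1, 2, 3, 4] blocks in the top half match the example above; the values in the bottom half are made up just to fill out the matrix:

import numpy as np

def pool(mat, fn, size=2):
    # Slide a size*size window with stride = size and reduce each block with fn
    h, w = mat.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h, size):
        for j in range(0, w, size):
            out[i // size, j // size] = fn(mat[i:i+size, j:j+size])
    return out

m = np.array([[1,  2,  1,  2],
              [3,  4,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])
print(pool(m, np.max))    # max pooling:  [[4, 4], [10, 12]]
print(pool(m, np.mean))   # mean pooling: [[2.5, 2.5], [7.5, 9.5]]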

3, How the fully connected layer works

The convolution and pooling layers extract features and reduce the number of parameters compared to the original image. However, to generate the final output, we need fully connected layers, which produce a classifier with as many outputs as there are classes.


The fully connected layer works much like the neural networks seen before: the tensor output by the pooling layer is reshaped into vectors, multiplied by a weight matrix, a bias is added, the ReLU activation function is applied, and the parameters are then optimized by gradient descent.
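As a minimal illustration of that sequence of operations (the shapes follow the network built below; the weight and bias values here are random stand-ins, not trained parameters):

import numpy as np

pooled = np.random.rand(1, 7, 7, 64).astype(np.float32)   # output of the last pooling layer
flat = pooled.reshape(1, 7 * 7 * 64)                      # cut the tensor into a vector
W = np.random.rand(7 * 7 * 64, 1024).astype(np.float32)   # weight matrix
b = np.zeros(1024, dtype=np.float32)                      # bias
out = np.maximum(flat @ W + b, 0)                         # multiply, add bias, apply ReLU
print(out.shape)   # (1, 1024)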

Three, Code walkthrough of the convolutional neural network

1, Reading the data set and predefining the data
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Read the MNIST data set
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

# Predefine the input x and the true output y_ with placeholders
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
keep_prob = tf.placeholder(tf.float32)
x_image = tf.reshape(x, [-1, 28, 28, 1])
* MNIST is a classic data set for image recognition; each picture is 28*28, and the data set should be downloaded before use. MNIST data set download link:
https://pan.baidu.com/s/1d9ty82 <https://pan.baidu.com/s/1d9ty82> Password: jcam
* x and y_ are only placeholders for now; once the program reaches an instruction that feeds concrete values into x and y_, those values are substituted into the computation
* shape=[None, 784] is the size of the data: every picture in the MNIST data set is 28*28, so during computation the 28*28 two-dimensional data is flattened into a one-dimensional vector with 784 entries. None means that dimension is variable, i.e., the number of x, y_ samples selected (see the quick reshape check after this list)
* keep_prob controls how many neurons take part in the computation (described in detail below)
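As a quick sanity check (not part of the tutorial code), the reshape in the last code line can be verified on a dummy batch:

import numpy as np

# A hypothetical batch of 50 flattened MNIST images: reshape([-1, 28, 28, 1])
# turns each 784-entry vector back into a 28x28 single-channel image
batch = np.zeros((50, 784), dtype=np.float32)
print(batch.reshape(-1, 28, 28, 1).shape)   # (50, 28, 28, 1)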
2, Weight and bias functions
def weight_variable(shape):
    # Generate random initial weights
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
truncated_normal() function: draws random values from a normal distribution with mean 0 and standard deviation 0.1, discarding values more than two standard deviations from the mean.

3, Definitions of the convolution and pooling functions
def conv2d(x, W):
    # strides = [1, horizontal stride, vertical stride, 1]
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # strides = [1, horizontal stride, vertical stride, 1]
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
* The input x is the image information matrix; W holds the values of the convolution kernel
* In the conv2d() function, the first and the last entries of the strides parameter must be 1;
* the second entry is the step size by which the kernel moves to the right each time
* the third entry is the step size by which the kernel moves down each time
The explanation of the convolution layer above showed the animation for strides=[1, 1, 1, 1];
the animation below shows strides=[1, 2, 2, 1]: you can see the highlighted region move two cells to the right and two cells down each time.


We can see that the larger the stride of the convolution layer, the smaller the output image. To keep the output the same size as the original image, enough zeros are padded around the input image;
this is what the padding value "SAME" does (the zero border keeps more information and also preserves the original size of the image), as in the figure below:


The other option for padding is "VALID". The difference from "SAME" is that no zeros are padded around the boundary, so the output image is smaller than the original: new image size =
original size - kernel size + 1. (We generally choose padding="SAME".)
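A quick way to see the difference is to compare the static output shapes of the two padding modes (TensorFlow 1.x; the zero-valued kernel here is just a stand-in, since only the shapes matter):

import tensorflow as tf

img = tf.placeholder(tf.float32, [1, 28, 28, 1])   # one 28*28 single-channel image
k = tf.zeros([5, 5, 1, 1])                         # a 5*5 kernel, values irrelevant for shapes
same = tf.nn.conv2d(img, k, strides=[1, 1, 1, 1], padding='SAME')
valid = tf.nn.conv2d(img, k, strides=[1, 1, 1, 1], padding='VALID')
print(same.shape)    # (1, 28, 28, 1) -> same size as the input
print(valid.shape)   # (1, 24, 24, 1) -> 28 - 5 + 1 = 24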

The pooling function uses the traditional 2x2 max-pooling template with a pooling stride of 2, so a region that has been selected is not selected again.

4, First convolution + pooling
x_image = tf.reshape(x, [-1, 28, 28, 1])

# Network structure of convolution layer 1
# Kernel 1: patch = 5x5; in size 1; out size 32; ReLU activation for non-linearity
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)   # output size 28*28*32
h_pool1 = max_pool_2x2(h_conv1)                            # output size 14*14*32
* The picture set is monochrome, so the last size parameter in x_image is 1 (it would be 3 for color pictures)
* The kernel size here is 5*5, the number of input channels is 1, and the number of output channels is 32
* The kernel values play the role of weights and are obtained by generating random numbers
* Because the MNIST images are 28*28 and black-and-white, the exact image size is 28*28*1 (the 1 means the picture has only one color layer; color pictures have 3 layers, RGB). After the first convolution the number of output channels goes from 1 to 32, so the picture size becomes 28*28*32 (as if the picture were stretched upward)
* After the first pooling (pooling stride 2), the picture size is 14*14*32
5, Second convolution + pooling
# Network structure of convolution layer 2
# Kernel 2: patch = 5x5; in size 32; out size 64; ReLU activation for non-linearity
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)   # output size 14*14*64
h_pool2 = max_pool_2x2(h_conv2)                            # output size 7*7*64
* The kernel size here is also 5*5; the second layer takes 32 input channels and produces 64 output channels
* The output of the first convolution + pooling stage is 14*14*32; after the second convolution the size becomes 14*14*64
* After the second pooling (pooling stride 2), the final output size is 7*7*64 (see the quick shape check below)
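If you want to confirm these sizes yourself, you can print the static shapes after building the two stages above (not part of the original code; the ? is the variable batch dimension):

print(h_pool1.shape)   # (?, 14, 14, 32)
print(h_pool2.shape)   # (?, 7, 7, 64)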
6, Fully connected layers 1 and 2
# Fully connected layer 1
W_fc1 = weight_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Fully connected layer 2
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
prediction = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
* The input of the fully connected layers is the output of the second pooling, of size 7*7*64; fully connected layer 1 has 1024 neurons
* In tf.reshape(a, newshape), when an entry of newshape is -1 the function computes that dimension from the existing shape of the array
* keep_prob is used to reduce overfitting: only part of the neurons take part in adjusting the weights. Only when keep_prob = 1 do all neurons work (a short demo follows this list)
* Fully connected layer 2 has 10 neurons and acts as the generated classifier
* After fully connected layers 1 and 2, the predicted values are stored in prediction
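A small stand-alone demo of what keep_prob does in tf.nn.dropout (not part of the tutorial code): with keep_prob = 0.5 roughly half of the activations are zeroed and the survivors are scaled by 1/keep_prob, while keep_prob = 1 leaves everything unchanged.

import tensorflow as tf

sess = tf.InteractiveSession()
v = tf.ones([10])
print(sess.run(tf.nn.dropout(v, keep_prob=0.5)))   # e.g. [2. 0. 2. 0. 2. 2. 0. 2. 0. 2.] (random)
print(sess.run(tf.nn.dropout(v, keep_prob=1.0)))   # [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]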
7, Optimization with gradient descent; accuracy
# Cross-entropy cost function: the error between the predicted values and the true values
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=prediction))

# Gradient descent: the data set is large, so the AdamOptimizer is used
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)

# The results are stored in a list of booleans
correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(y_, 1))

# Accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
* Because the data set is large, the optimizer used here is AdamOptimizer with a learning rate of 1e-4
* tf.argmax(prediction, 1) returns the predicted label for each input x, while tf.argmax(y_, 1) is the correct label
* correct_prediction is a boolean array. To compute the classification accuracy, the booleans are cast to floating-point numbers (1 for right, 0 for wrong) and then averaged. For example, [True, False, True, True] becomes [1, 0, 1, 1], giving an accuracy of 0.75 (a tiny runnable version follows this list)
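Here is that computation written out as a minimal stand-alone snippet (not part of the tutorial code):

import tensorflow as tf

# Cast booleans to floats and average them, exactly as accuracy is computed above
sess = tf.InteractiveSession()
correct = tf.constant([True, False, True, True])
print(sess.run(tf.reduce_mean(tf.cast(correct, tf.float32))))   # 0.75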
8, Training, other notes, and saving the parameters
for i in range(1000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step", i, "training accuracy", train_accuracy)
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

'''
# Save the model parameters
saver.save(sess, './model.ckpt')
print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist.test.images,
                                                    y_: mnist.test.labels,
                                                    keep_prob: 1.0}))
'''
* Each batch is drawn from the MNIST data set; one batch contains 50 samples
* feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5} passes the values of batch[0] and batch[1] into x and y_
* keep_prob = 0.5 means only half of the neurons take part in the work
When training is complete, the program can save the learned parameters so they do not have to be trained again next time (a sketch of how to load them back follows below).
Special reminder: running this takes up a lot of memory, and the computer may freeze at the very end when the parameters are saved.
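For completeness, here is a hedged sketch of how the saved checkpoint could be loaded back in a later run. It assumes the same graph has already been rebuilt in that run and that './model.ckpt' actually exists; it is not part of the original code.

saver = tf.train.Saver()
with tf.Session() as sess:
    # Restore the trained weights instead of training again
    saver.restore(sess, './model.ckpt')
    print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist.test.images,
                                                        y_: mnist.test.labels,
                                                        keep_prob: 1.0}))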

Four, Source code and results
# -*- coding:utf-8 -*-
# -*- author:zzZ_CMing
# -*- 2018/01/24;14:14
# -*- python3.5
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

def weight_variable(shape):
    # Generate random initial weights (truncated_normal: values near mean 0, stddev = 0.1)
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    # strides = [1, horizontal stride, vertical stride, 1]
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # strides = [1, horizontal stride, vertical stride, 1]
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# Read the MNIST data set
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
sess = tf.InteractiveSession()

# Predefine the input x and the true output y_ with placeholders
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
keep_prob = tf.placeholder(tf.float32)
x_image = tf.reshape(x, [-1, 28, 28, 1])
#print(x_image.shape)  # [n_samples, 28, 28, 1]

# Convolution layer 1: patch = 5x5; in size 1; out size 32; ReLU activation
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)   # output size 28*28*32
h_pool1 = max_pool_2x2(h_conv1)                            # output size 14*14*32

# Convolution layer 2: patch = 5x5; in size 32; out size 64; ReLU activation
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)   # output size 14*14*64
h_pool2 = max_pool_2x2(h_conv2)                            # output size 7*7*64

# Fully connected layer 1
W_fc1 = weight_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])   # [n_samples,7,7,64] ->> [n_samples,7*7*64]
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)       # dropout to reduce overfitting

# Fully connected layer 2
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
prediction = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
#prediction = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

# Cross-entropy cost function: the error between the predicted values and the true values
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=prediction))
# Gradient descent: the data set is large, so the AdamOptimizer is used
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)
# The results are stored in a list of booleans
correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(y_, 1))
# Accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

saver = tf.train.Saver()   # defaults to saving all variables
sess.run(tf.global_variables_initializer())

for i in range(1000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step", i, "training accuracy", train_accuracy)
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

'''
# Save the model parameters
saver.save(sess, './model.ckpt')
print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
'''
The results are shown below:

By training step 700 the accuracy has already reached 98%, and the longer it trains, the higher the accuracy.

Special reminder: my computer has a fairly low configuration, so running this takes a long time, and it may even freeze when the parameters are saved. Please keep this in mind.
--------------------------------------------------------------------------------
Closing remarks: I too learned from the explanations of those who came before and slowly figured things out on my own. Learning is a process of filling your own gaps. I hope we can all keep at it, and I hope this article helps you, my friend.

Series recommendations:

[Supervised Learning] 1: Three methods for handwritten digit recognition with the KNN algorithm
<https://blog.csdn.net/zzz_cming/article/details/78938107>
--------------------------------------------------------------------------------
[Unsupervised Learning] 1: Introduction to the K-means algorithm, with code implementation
<https://blog.csdn.net/zzz_cming/article/details/79859490>
[Unsupervised Learning] 2: Introduction to the DBSCAN algorithm, with code implementation
<https://blog.csdn.net/zzz_cming/article/details/79863036>
[Unsupervised Learning] 3: The Density Peaks clustering algorithm (local density clustering)
<https://blog.csdn.net/zzz_cming/article/details/79889909>
--------------------------------------------------------------------------------
[Deep Learning] 1: The perceptron, and solving the XOR problem with a multilayer perceptron
<https://blog.csdn.net/zzz_cming/article/details/79031869>
[Deep Learning] 2: The principle of BP neural networks, and solving the XOR problem
<https://blog.csdn.net/zzz_cming/article/details/79118894>
[Deep Learning] 3: Recognizing the MNIST data set with a BP neural network
<https://blog.csdn.net/zzz_cming/article/details/79136928>
[Deep Learning] 4: Digit recognition with a BP neural network + sklearn
<https://blog.csdn.net/zzz_cming/article/details/79182103>
[Deep Learning] 5: The principle of CNN convolutional neural networks, MNIST data set recognition
<https://blog.csdn.net/zzz_cming/article/details/79192815>
[Deep Learning] 8: Recognizing the sklearn data set with a CNN convolutional neural network (source code included)
<https://blog.csdn.net/zzz_cming/article/details/79691459>
[Deep Learning] 6: The principle of RNN recurrent neural networks, MNIST data set recognition
<https://blog.csdn.net/zzz_cming/article/details/79235475>
[Deep Learning] 7: Introduction to Hopfield neural networks (DHNN)
<https://blog.csdn.net/zzz_cming/article/details/79289502>
--------------------------------------------------------------------------------
A brief introduction to the TensorFlow framework <https://blog.csdn.net/zzz_cming/article/details/79235469>
--------------------------------------------------------------------------------