Applications of Convolutional Neural Networks in Computer Vision (Part Two)
Before starting the second part on convolutional neural networks, let's review two points: what determines the number of channels in a convolution kernel, and what determines the number of convolution kernels in a layer? If these points are still unclear, it is worth going back over them first.
A typical layer in a convolutional network consists of three stages. In the first stage, the layer performs several convolutions in parallel to produce a set of linear activations. In the second stage, each linear activation is passed through a nonlinear activation function, such as the rectified linear activation function (ReLU). This stage is sometimes called the detector stage. In the third stage, a pooling function further adjusts the output of the layer.
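The three stages described above can be sketched in a few lines of NumPy. This is only a minimal illustration under simple assumptions: a single-channel input, stride-1 convolution without padding, and 2×2 max pooling with a stride of 2. The helper names (conv2d_valid, relu, max_pool, conv_layer) are made up for this example and do not come from any library.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Stage 1: slide the kernel over the image (stride 1, no padding)."""
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f_h, j:j + f_w] * kernel)
    return out

def relu(x):
    """Stage 2 (detector stage): rectified linear activation."""
    return np.maximum(x, 0.0)

def max_pool(x, f=2, s=2):
    """Stage 3: keep the largest value in each f x f window."""
    out_h = (x.shape[0] - f) // s + 1
    out_w = (x.shape[1] - f) // s + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * s:i * s + f, j * s:j * s + f].max()
    return out

def conv_layer(image, kernels):
    """Run all three stages once per kernel and stack the feature maps."""
    return np.stack([max_pool(relu(conv2d_valid(image, k))) for k in kernels])

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6))       # hypothetical 6x6 input
kernels = rng.standard_normal((2, 3, 3))  # two hypothetical 3x3 kernels
features = conv_layer(image, kernels)
print(features.shape)  # (2, 2, 2)
```

Each kernel produces its own feature map, so the number of kernels sets the number of output channels.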
A pooling function replaces the output of the network at a given location with a summary statistic of the nearby outputs. For example, max pooling returns the maximum value within a neighboring region. Other commonly used pooling functions include the average over a neighboring region and a weighted average based on the distance from the central pixel. Let's walk through a small example of max pooling.
As shown in the figure above, we have a 4×4 input image and apply max pooling with a 2×2 window and a stride of 2. The result has 4 outputs, one per colored region, each being the largest value in its region. The output size follows the same formula as for convolution: for an n×n input image, an f×f window, and a stride of s, the output image is ((n-f)/s+1) × ((n-f)/s+1). Unlike convolution, where the values in the kernel are parameters learned during training, pooling has no parameters to learn. The window size and stride are fixed in advance, and the operation simply extracts a feature from each position.
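Here is a small sketch of max pooling on a 4×4 input with a 2×2 window and a stride of 2. The pixel values are invented for illustration (the original figure's values are not reproduced here), and the function names are hypothetical.

```python
import numpy as np

def pool_output_size(n, f, s):
    """Output side length for an n x n input, f x f window, stride s."""
    return (n - f) // s + 1

def max_pool2d(x, f, s):
    """Max pooling over a square 2-D array; no learnable parameters involved."""
    out = pool_output_size(x.shape[0], f, s)
    result = np.zeros((out, out), dtype=x.dtype)
    for i in range(out):
        for j in range(out):
            window = x[i * s:i * s + f, j * s:j * s + f]
            result[i, j] = window.max()
    return result

# Made-up 4x4 input; each 2x2 block plays the role of one colored region.
x = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]])

pooled = max_pool2d(x, f=2, s=2)
print(pool_output_size(4, 2, 2))  # 2
print(pooled)
# [[9 2]
#  [6 3]]
```

Note that the function only takes the window size and stride as arguments; nothing in it is updated during training.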
Now that we know both convolution and pooling, a problem becomes apparent: with each convolution and pooling operation, the image gets smaller and smaller. This is a serious problem for deep learning, because deep networks can be hundreds of layers deep, and every layer performs convolution and pooling. After only a few layers the image would shrink to 1×1, which is obviously not what we want. To solve this problem we introduce padding. Here we look at "same" padding, which is commonly used in convolutional networks. As the name suggests, with same padding the image size stays the same after the convolution operation. As shown in the figure below, we surround the original 6×6 image with a border of zero-valued pixels to make it an 8×8 image, then convolve it with a 3×3 kernel. According to the formula we have learned, the output image is 6×6. Because the image size is unchanged by the convolution, the operation can be used in deep networks. Finally, let's generalize the formula: for an n×n input image, a padding of p, an f×f convolution kernel, and a stride of s, the output image size is ((n+2p-f)/s+1) × ((n+2p-f)/s+1).
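The generalized formula is easy to check numerically. The sketch below assumes a stride of 1 and an odd kernel size when computing same padding, in which case p = (f-1)/2; the function names are made up for this example, and np.pad is used only to show the zero border.

```python
import numpy as np

def conv_output_size(n, f, p=0, s=1):
    """Output side length: (n + 2p - f) / s + 1, rounded down."""
    return (n + 2 * p - f) // s + 1

def same_padding(f):
    """Padding that preserves the size for stride 1 and an odd kernel size f."""
    return (f - 1) // 2

image = np.ones((6, 6))
p = same_padding(3)  # 1 pixel of zeros on every side

# Surround the 6x6 image with zeros to get an 8x8 image.
padded = np.pad(image, p, mode="constant", constant_values=0)

print(padded.shape)                      # (8, 8)
print(conv_output_size(6, 3, p=p, s=1))  # 6: size preserved
print(conv_output_size(6, 3, p=0, s=1))  # 4: shrinks without padding
```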
4. Handwriting Recognition
With this background, let's see how a convolutional neural network recognizes a handwritten image, as shown in the figure below:
The original image is a 32×32×3 color picture of a handwritten digit. We first convolve it with 6 kernels of size 5×5×3, which gives a 28×28×6 output, and then apply max pooling with a 2×2 window and a stride of 2. This completes the first layer, whose output is a 14×14×6 image. The output of the first layer is the input of the second layer. In the second layer we convolve with 16 kernels of size 5×5×6, which gives 10×10×16, and again apply max pooling with a 2×2 window and a stride of 2, so the final output image is 5×5×16. We then flatten it into a one-dimensional array of 5×5×16 = 400 values and pass it through fully connected layers (the fully connected operation is explained in a later section). After two fully connected layers we obtain an output, and applying the softmax function to it produces the probabilities of the ten digits 0-9. We choose the digit with the highest probability as the result. At this point, a complete handwriting recognition experiment using a convolutional neural network is finished.
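The sizes in this walkthrough can be traced with the output-size formula, and the last step can be illustrated with a small softmax. This is a sketch only: it tracks shapes rather than running a trained network, the helper names are hypothetical, and the ten scores fed to softmax are invented numbers standing in for the output of the fully connected layers.

```python
import numpy as np

def conv_shape(shape, num_kernels, f, p=0, s=1):
    """Shape after convolving an (h, w, c) input with num_kernels f x f x c kernels."""
    h, w, _ = shape
    return ((h + 2 * p - f) // s + 1, (w + 2 * p - f) // s + 1, num_kernels)

def pool_shape(shape, f=2, s=2):
    """Shape after pooling; the channel count is unchanged."""
    h, w, c = shape
    return ((h - f) // s + 1, (w - f) // s + 1, c)

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

shape = (32, 32, 3)
shape = conv_shape(shape, num_kernels=6, f=5)   # (28, 28, 6)
shape = pool_shape(shape)                       # (14, 14, 6)
shape = conv_shape(shape, num_kernels=16, f=5)  # (10, 10, 16)
shape = pool_shape(shape)                       # (5, 5, 16)
flat_size = int(np.prod(shape))                 # 400
print(shape, flat_size)

# Invented scores for the digits 0-9, standing in for the network output.
scores = np.array([0.1, 0.3, 2.5, 0.2, 0.0, 0.4, 0.1, 1.0, 0.3, 0.2])
probs = softmax(scores)
predicted_digit = int(np.argmax(probs))
print(predicted_digit)  # 2
```

The channel count of each kernel matches the channel count of its input (3, then 6), while the number of kernels (6, then 16) sets the channel count of the output.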