Seven levels of playing with images using deep learning



Xu Tie, Cruiser Technology <https://www.zhihu.com/people/hun-dun-xun-yang-jian>, 2 days ago

First realm: Image recognition

 

If you are just getting started with deep learning for image processing, the first task you come across will almost certainly be image recognition:

For example, feed a picture of your beloved cat into an ordinary CNN and see whether it says cat or dog.



 



Take one of the most common CNNs, for example LeNet, the ancestor of CNNs, with just a few layers like this.
Given a good dataset (such as Kaggle Cats and Dogs), it can produce a passable classification result (a little over 80% accuracy), though that is not especially high.
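To make the building blocks concrete, here is a minimal sketch of the two operations a LeNet-style CNN stacks, convolution and max pooling, written in plain Python. The toy image and the hand-picked `edge_kernel` are assumptions for illustration only; a real network learns its kernels from data and would be built with a deep learning library.

```python
# Minimal sketch of the two operations a LeNet-style CNN stacks.
# Pure Python for illustration; not an implementation of LeNet itself.

def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Element-wise max(0, v)."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling (a trailing odd row/column is dropped)."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]

# Toy 5x5 image: dark on the left, bright on the right (hypothetical data).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked filter that responds to a dark-to-bright vertical edge.
edge_kernel = [[-1, 1],
               [-1, 1]]

features = max_pool2x2(relu(conv2d(image, edge_kernel)))
print(features)
```

A classifier would flatten feature maps like `features` and pass them through fully connected layers to produce a score for "cat" and a score for "dog".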

 

Of course, if you add some knowledge about a particular problem, you can also recognize faces while you are at it, and open a startup called "Face minus minus" or something like that:



Those who know how to have fun can also recognize pig faces (they all look the same to me), so the identity of every pig can be established, which really helps when selling high-quality pork.



Or it can tell what disease a plant has. Spots of various kinds like these, which people cannot be bothered to examine, it can read for you. Plant-protection workers can head into the fields with just a cell phone.



Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional
networks for biomedical image segmentation." International Conference on
Medical Image Computing and Computer-Assisted Intervention. Springer, Cham,
2015.

 

Plant protection is genuinely useful, but doing classification is honestly rather boring.


The direction in which we evolve is to use more advanced network structures to get better accuracy, for example the residual network shown in the figure below (which can reach over 99.5% accuracy on the cats-and-dogs dataset). Do classification well and you will feel like a deep learning master: with a hammer in hand, everything looks like a nail.
Classification is simple for two reasons. First, there are large numbers of labeled images. Second, classification is a problem with very clear boundaries: even if the machine does not know what a cat or a dog is, the difference between them is easy to see.
If you ask the machine to distinguish tens of thousands of categories, its ability drops (even a complex network finds it hard to exceed 80% accuracy on a 1000-class problem like ImageNet).
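The key idea behind a residual network is the skip connection: each block outputs its input plus a learned correction, y = x + F(x). The sketch below is a simplified illustration on plain vectors, assuming F is two small linear layers with a ReLU in between; the blocks in the paper cited below use convolutions and normalization, and all names here are hypothetical.

```python
# Simplified residual block on plain vectors: y = x + F(x).
# F is assumed to be linear -> ReLU -> linear, for illustration only.

def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def residual_block(x, w1, b1, w2, b2):
    """Return x + F(x); the input is carried forward unchanged by the skip."""
    fx = linear(relu(linear(x, w1, b1)), w2, b2)
    return [xi + fi for xi, fi in zip(x, fx)]

x = [1.0, -2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
no_bias = [0.0, 0.0]

# With all weights at zero, F(x) = 0 and the block is exactly the identity.
print(residual_block(x, zeros, no_bias, zeros, no_bias))
```

Because a block with zero weights passes its input through untouched, stacking many such blocks does not make the signal harder to propagate, which is one common explanation for why very deep residual networks remain trainable.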



He, Kaiming, et al. "Identity mappings in deep residual networks." European
Conference on Computer Vision. Springer International Publishing, 2016.

 

Second realm: Object detection

 

You will soon discover that classification skills are of little use in most real-life situations, because real-world tasks often look like this:



Or like this:



With so many things jumbled together, the classification network you trained on close-up shots of cats and dogs falls apart in an instant. Even a single picture containing both a cat and a dog, or a little noise added to the cat, can throw your classification network into confusion.

In reality, few pictures consist of one big close-up of a cat or a beautiful woman. More often a picture contains many things, messy and without any pattern.
You need to draw a box yourself around what you want to look at, and then see what those things are.

So you arrive at the next challenge in machine vision: object detection (framing the target objects in a large image and identifying them). With it comes a new network architecture, known as R-CNN, a network for detecting objects in images.
This network tells you not only the class but also the coordinates of the target object, and even if the picture contains many objects, it finds them for you one by one.
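A detector's raw output is a list of labeled boxes with scores, and the same object is usually boxed several times. A common post-processing step, not described in this article but standard for detectors of this family, is to measure box overlap with intersection over union (IoU) and keep only the best box in each overlapping group (non-maximum suppression). The sketch below assumes boxes are (x1, y1, x2, y2) tuples and uses made-up detections.

```python
# Sketch of IoU and per-class non-maximum suppression (NMS).
# Box format (x1, y1, x2, y2) and the sample detections are assumptions.

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(detections, iou_threshold=0.5):
    """detections: list of (label, score, box).

    Visit boxes from highest to lowest score and keep a box only if it
    does not overlap an already kept box of the same label too much.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        label, _, box = det
        if all(k[0] != label or iou(k[2], box) < iou_threshold for k in kept):
            kept.append(det)
    return kept

detections = [
    ("cat", 0.90, (10, 10, 50, 50)),
    ("cat", 0.80, (12, 12, 52, 52)),     # near-duplicate of the first cat box
    ("dog", 0.70, (100, 100, 160, 150)),
]
final = non_max_suppression(detections)
print([(label, score) for label, score, _ in final])
```

The two cat boxes overlap heavily, so only the higher-scoring one survives, while the dog box is untouched because it belongs to a different class and region.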

 



Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with
region proposal networks." Advances in neural information processing systems.
2015.

 

Picking off your head in the middle of an army of ten thousand is no problem at all, and identifying a suspect among a crowd of passers-by is just as easy. People in the security industry can hardly sit still when they hear this.

In recent years, the YOLO algorithm has achieved fast, real-time object detection: as you walk along, it tells you what is in view. You can imagine how powerful that is for driverless cars.

 



YOLO rapid detection method

Redmon, Joseph, et al. "You only look once: Unified, real-time object
detection." Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition. 2016.

 

Of course, you will eventually get bored here too. However complex the network may be, it is just one CNN (for region proposals) plus another CNN stage for classification and regression. Can we do something else?

 

Third realm: Image segmentation

Ah, this brings us to the third level. You need not only to detect the objects in every corner of the picture, but also to do a job like this: cut the object itself out of the picture.
Bear in mind that newborn babies cannot tell where objects begin and end. Take an apple on a table: what is the table, what is the apple, and why is the apple not simply part of the table?
So whether a network can extract an object from an image bears on whether it truly grasps the essence of vision the way a human does. It is something of a "Turing test".
Simplified, the task is just to create a "mask" over the original picture, a bit like a mask in Photoshop.
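As a small illustration of what a mask is in practice: a segmentation network typically outputs, for every pixel, a probability that the pixel belongs to the object. Thresholding those probabilities gives a binary mask, and applying the mask cuts the object out. The 0.5 threshold, the function names, and the toy values below are assumptions for illustration.

```python
# Sketch: turn per-pixel probabilities into a binary mask and apply it.
# Threshold, names, and toy data are assumptions, not the article's code.

def to_mask(probabilities, threshold=0.5):
    """1 where the pixel is predicted to belong to the object, else 0."""
    return [[1 if p >= threshold else 0 for p in row] for row in probabilities]

def apply_mask(image, mask, background=0):
    """Keep pixels where mask is 1; replace everything else with background."""
    return [
        [pixel if m else background for pixel, m in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]

image = [
    [5, 6, 7],
    [8, 9, 1],
    [2, 3, 4],
]
# Hypothetical network output: probability that each pixel is "apple".
probabilities = [
    [0.10, 0.90, 0.20],
    [0.80, 0.95, 0.70],
    [0.05, 0.60, 0.30],
]

mask = to_mask(probabilities)
cutout = apply_mask(image, mask)
print(mask)
print(cutout)
```

Here the plus-shaped region of high probability becomes the mask, and every pixel outside it is blanked to the background value.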



Matting



Drozdzal, Michal, et al. "The importance of skip connections in biomedical
image segmentation." International Workshop on Large-Scale Annotation of
Biomedical Data and Expert Label Synthesis. Springer International Publishing,
2016.

Note that in this task we obtain another picture from a picture! The generated mask is itself an image. This is where the so-called U-shaped network (U-Net) comes in. Notice that this is our first generative model, in the sense that its output is an image.
Its building block is still convolution, but it adds upsampling, the reverse of max pooling, to bring the dimensions back up.
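The down-then-up shape of a U-shaped network can be sketched with three small pieces: max pooling to shrink a feature map, upsampling to grow it back, and a skip connection that hands the original high-resolution features to the decoder. The version below is a simplification: it uses nearest-neighbour upsampling, whereas the U-Net paper uses learned up-convolutions, and the feature values are made up.

```python
# Sketch of one down/up step of a U-shaped network with a skip connection.
# Nearest-neighbour upsampling is a simplifying assumption.

def max_pool2x2(fmap):
    """Halve height and width by taking the max of each 2x2 block."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]

def upsample2x(fmap):
    """Double height and width: every value becomes a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

encoder_features = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

pooled = max_pool2x2(encoder_features)    # 4x4 -> 2x2, detail is lost
restored = upsample2x(pooled)             # 2x2 -> 4x4, size is back, detail is not

# Skip connection: stack the original features with the upsampled ones
# as two channels, so later convolutions can see the fine detail again.
decoder_input = [encoder_features, restored]
print(pooled)
print(restored)
```

Upsampling alone restores the size but not the detail thrown away by pooling, which is why the skip connection matters for drawing sharp mask boundaries.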

 

This segmentation task should not be underestimated, especially its usefulness. For example, now that private satellites and drones are popular, would you like to survey the terrain around your neighborhood and see whether a treasure vault is hidden there?
The input is clear: nothing more than a set of satellite images. Where are the trees? Where is the water? Where is the military base? No human needed; it digs them all out for you.

 



 

If you want to count cells or the like, that is easy too: cut out the shapes and you are done.



 

Fourth realm:

Let's get fashionable. Suppose you own a Taobao clothing store and want customers to upload a picture of a garment and get back a set of recommended clothes. How about a search-by-image feature?
Note that you can scrape a pile of pictures from the Internet, but the data are unlabeled. What to do? Brother Tie will let you in on it: the trick is clustering.

Brother Tie will teach you the simplest way to cluster: feed all the pictures into a convolutional network, but instead of extracting the class, extract only some features from the network's middle layers.
These features are a bit like a visual QR code for each picture. Then run k-means clustering on these QR codes, and the results can be surprisingly good. Why deep?
Because features extracted by a deep network are simply different.

And the search itself? It is just finding other pictures in the same cluster.
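The clustering-then-search idea can be sketched in a few lines. The feature vectors below are made-up stand-ins for middle-layer CNN activations, and the k-means here uses a deterministic initialization (the first k points) purely to keep the example reproducible; in practice you would use a library implementation with a better initialization.

```python
# Sketch: k-means over image feature vectors, then search within a cluster.
# Feature values and file names are hypothetical.

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Plain k-means; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

def search_similar(query_index, labels):
    """Indices of all other pictures in the same cluster as the query."""
    return [i for i, label in enumerate(labels)
            if label == labels[query_index] and i != query_index]

names = ["dress_a.jpg", "jeans_a.jpg", "dress_b.jpg", "jeans_b.jpg", "dress_c.jpg"]
features = [
    [0.9, 0.1],
    [0.1, 0.8],
    [1.0, 0.2],
    [0.0, 1.0],
    [0.8, 0.0],
]

labels, _ = kmeans(features, k=2)
matches = search_similar(0, labels)
print([names[i] for i in matches])
```

A customer's uploaded picture would be passed through the same network, assigned to the nearest centroid, and answered with the other pictures in that cluster.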



With clustering in place, you can do search!



 

Fifth realm:

Now we rise to gazing at the stars. Those earlier applications that make money through classification and the like are all too boring. How can machine vision be used to do science?
As people who look up at the stars and peer down at cells, we often find that the astronomical or cell images we get are unbearably noisy. What then?
Deep learning gives you a set of methods for denoising and restoring images. A tool called the auto-encoder plays a big role here: one swipe, and the image comes out clear.
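One common way to write down what a denoising auto-encoder is trained to do is the reconstruction objective below. The symbols are assumptions introduced here: x is a clean image, \varepsilon is the noise, f_\theta is the encoder, and g_\phi is the decoder. The network sees only the noisy image but is penalized for its distance from the clean one.

```latex
\tilde{x} = x + \varepsilon, \qquad
\mathcal{L}(\theta, \phi)
  = \mathbb{E}_{x,\,\varepsilon}
    \left[ \left\lVert g_\phi\!\left(f_\theta(\tilde{x})\right) - x \right\rVert_2^2 \right]
```

Because the encoder squeezes the image through a narrow representation, it tends to keep the structure shared across images and discard the noise, which varies from image to image.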

 



And that is not even the coolest part. Adversarial learning, which borrows from game theory, can also help you kill off noise! If you know the so-called GAN, which is also a tool for image generation,
the idea is to make the pictures the network has denoised indistinguishable from natural noise-free pictures, so that not even a convolutional network can tell them apart. Yes, that's it!
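The game being played can be stated compactly. The objective below is the standard one from the Goodfellow et al. paper cited later in this article: the discriminator D tries to tell real images x from generated ones, while the generator G tries to fool it. For denoising, one would assume G takes the noisy image as its input instead of a random vector z; that adaptation is a sketch of the idea, not a statement of the exact setup used in the paper cited below.

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```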





Schawinski, Kevin, et al. "Generative adversarial networks recover features in
astrophysical images of galaxies beyond the deconvolution limit." Monthly
Notices of the Royal Astronomical Society: Letters 467.1 (2017): L110-L114.

 

 

Sixth realm:

 

Having made enough money in industry, and with science being too nerdy, let's play with art and ponder philosophy. The first move is image style transfer; see Brother Tie's previous article
<https://zhuanlan.zhihu.com/p/31404314>:





 



 

But what really shines is the adversarial learning just mentioned, the GAN. With the famous CycleGAN, for example, you can implement almost any custom "image translation"
function, and you do not even need labels: give it two sets of pictures, say winter and summer, and it automatically finds the correspondence between the two sets.
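What lets CycleGAN work without paired labels is the cycle-consistency loss from the Zhu et al. paper cited below. With G translating from domain X (say, summer) to Y (winter) and F translating back, an image should survive a round trip:

```latex
\mathcal{L}_{\text{cyc}}(G, F)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\lVert F(G(x)) - x \rVert_1\right]
  + \mathbb{E}_{y \sim p_{\text{data}}(y)}\left[\lVert G(F(y)) - y \rVert_1\right]
```

This term is added to the usual adversarial losses for G and F, so each translated image must both look like it belongs to the target set and still carry enough information to recover the original.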

 



Zhu, Jun-Yan, et al. "Unpaired image-to-image translation using
cycle-consistent adversarial networks." arXiv preprint arXiv:1703.10593 (2017).

 

Seventh realm:

 

Even image translation gets tiresome. Doesn't your neural network claim to understand images? Let's see it make something out of nothing and generate pictures from noise.

 

Yes, it is GAN again, and the most basic convolutional GAN (DCGAN) can already do it for you.

Have a look at these hotel scenes dreamed up by a GAN. Would you have guessed that a computer drew them? Ha ha ha!



Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural
information processing systems. 2014.



 

Having written this far, I feel that GANs are very promising, promising, promising. I used to think they were just for fun.

The seven-story pagoda on display here is only the tip of the iceberg of what humans have discovered in deep learning. "Do not laugh if I lie drunk on the battlefield; since ancient times, how many have returned from war?"