Seven levels of playing with images using deep learning



Xu Tie - Cruiser Technology <https://www.zhihu.com/people/hun-dun-xun-yang-jian>, 2 days ago

First realm: image recognition

 

If you are just getting into image processing with deep learning, the first task you meet is bound to be image recognition:

For example, feed a photo of your beloved cat into an ordinary CNN and see whether it says cat or dog.



 



Take one of the most common CNNs, a network with just a few layers such as LeNet, the ancestor of CNNs.
Given a good dataset (such as Kaggle's cats-vs-dogs competition), it gives a passable classification result (accuracy a little above 80%), though not a very high one.
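
To get a feel for how small such a network is, here is a rough shape walk-through of a LeNet-style stack. The layer sizes (32x32 input, 5x5 convolutions, 2x2 pooling, 6 and 16 feature maps) follow the classic LeNet-5 layout and are an assumption; the post does not say which variant was used.

```python
# Shape walk-through of a LeNet-5-style network.
# Layer sizes are assumed from the classic LeNet-5 layout, not from this post.

def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2):
    """Spatial size after non-overlapping pooling."""
    return size // window

def lenet_shapes(input_size=32):
    """Return (stage name, channels, height/width) for each stage."""
    shapes = [("input", 1, input_size)]
    size = conv_out(input_size, kernel=5)   # first convolution: 6 feature maps
    shapes.append(("conv1", 6, size))
    size = pool_out(size)
    shapes.append(("pool1", 6, size))
    size = conv_out(size, kernel=5)         # second convolution: 16 feature maps
    shapes.append(("conv2", 16, size))
    size = pool_out(size)
    shapes.append(("pool2", 16, size))
    return shapes

shapes = lenet_shapes()
_, channels, size = shapes[-1]
flat_features = channels * size * size      # input width of the first dense layer

for name, c, s in shapes:
    print(f"{name:6s} {c:2d} x {s} x {s}")
print("flattened features:", flat_features)
```

After the last pooling stage the feature maps are flattened and passed to a few dense layers; for cats versus dogs the final layer would have two outputs.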

 

Of course, if you add some problem-specific knowledge, you can also recognize faces along the way and found a startup called "Face minus something":



If you really know how to play, you can even recognize pigs' faces (they all look the same to me), so that every pig's identity can be tracked, which is genuinely helpful for selling high-quality pork.



Or identify which disease a plant has: for the different kinds of leaf spots that people are too lazy to inspect, the network can do the looking for you. Plant protection workers can head into the fields with just their phones.



Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional
networks for biomedical image segmentation." International Conference on
Medical Image Computing and Computer-Assisted Intervention. Springer, Cham,
2015.

 

Plant protection is great, but doing nothing except classification gets really boring.


The direction we evolve in is to use more advanced network structures to get better accuracy, for example a residual network like the one in the figure below (which can reach over 99.5% accuracy on the cats-vs-dogs dataset). Once you are good at classification you become a deep learning master: with a hammer in hand, everything looks like a nail.
Classification is simple for two reasons: first, there is a large supply of labeled images; second, classification is a problem with very clear boundaries. Even if the machine does not know what a cat or a dog really is, it can easily spot the difference.
But if you ask the machine to distinguish thousands or tens of thousands of categories, its ability drops (even with a complex network, on a 1000-class problem like ImageNet it is hard to exceed 80% accuracy).
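
The core idea of the residual network mentioned above is that each block adds its input back to its output, y = x + F(x). A minimal NumPy sketch follows; the dense layers stand in for the convolutions of a real residual block, and all names and sizes are made up for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = x + F(x), where F is two small dense layers (stand-ins for convolutions)."""
    return x + relu(x @ w1) @ w2

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))            # batch of 4 feature vectors
w1 = rng.normal(size=(8, 8)) * 0.1
w2 = np.zeros((8, 8))                  # a residual branch that outputs zero ...
y = residual_block(x, w1, w2)          # ... turns the whole block into an identity

print("block is identity when F(x) = 0:", np.allclose(y, x))
```

Because a block only has to learn a correction on top of the identity, stacking many of them does not make the signal harder to pass through, which is one common explanation for why very deep residual networks remain trainable.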



He, Kaiming, et al. "Identity mappings in deep residual networks." European
Conference on Computer Vision. Springer International Publishing, 2016.

 

Second realm: object detection

 

You will soon find that classification skills are of little use in most real-life situations, because real tasks often look like this:



Or like this:



With so many things piled together, the classification network you trained on close-ups of cat and dog heads falls apart in an instant. Even a single picture containing both a cat and a dog, or a little noise added to the cat, can throw your classifier into confusion.

In reality, how often is a picture just one big cat or one beautiful woman? More often a picture contains many things, chaotic and without any rules.
You need to draw the boxes yourself, frame what you want to look at, and then see what those things are.

So you arrive at the next level of machine vision challenges: object detection (framing the target objects in a large image and identifying them). With it comes a new network architecture, known as R-CNN, the image detection network.
This network not only tells you the class, it also tells you the coordinates of the target object, and even if there are many objects in the picture, it finds them for you one by one.
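
Once a detector outputs boxes with labels and scores, a typical post-processing step is to drop near-duplicate boxes for the same object. The sketch below shows intersection over union (IoU) and a simple per-label non-maximum suppression in plain Python. The dictionary layout, the 0.5 threshold and the example boxes are assumptions for illustration, not the exact procedure of any particular R-CNN implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(detections, threshold=0.5):
    """Keep the best-scoring box among same-label boxes that overlap heavily."""
    kept = []
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        overlaps_kept = any(
            det["label"] == k["label"] and iou(det["box"], k["box"]) >= threshold
            for k in kept
        )
        if not overlaps_kept:
            kept.append(det)
    return kept

# Made-up detector output: two boxes on the same cat, one box on a dog.
detections = [
    {"label": "cat", "score": 0.9, "box": (10, 10, 50, 50)},
    {"label": "cat", "score": 0.6, "box": (12, 12, 52, 52)},   # near-duplicate
    {"label": "dog", "score": 0.8, "box": (60, 20, 120, 90)},
]
kept = non_max_suppression(detections)
for det in kept:
    print(det["label"], det["score"], det["box"])
```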

 



Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with
region proposal networks." Advances in neural information processing systems.
2015.

 

Taking the enemy general's head from the midst of ten thousand soldiers is no problem at all: picking a suspect out of a large crowd of passers-by is easy, and security people cannot help but pay attention.

Recently the YOLO algorithm has made fast, real-time object detection possible: as you move along, it tells you what is in view. You can imagine how powerful that is for driverless cars.

 



YOLO fast detection method. Redmon, Joseph, et al. "You only look once: Unified, real-time object
detection." Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition. 2016.

 

Of course, you will end up bored here too. However complex the network gets, it is just one CNN (for region proposals) followed by another CNN doing classification and regression. Can't we do something else?

 

Third realm: image segmentation

Ah, this brings us to the third level. Now it is not enough to detect the corner coordinates of an object's box; you have to cut the object itself out of the picture. You know,
newborn babies cannot tell where objects end: with an apple on a table, what is the table, what is the apple, and why isn't the apple part of the table? So whether a network can extract objects from an image
tells us whether it really grasps the essence of vision the way a human does. It is something of a "Turing test" for vision. To simplify, we just create a "mask" over the original picture,
rather like a mask in Photoshop.
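
As a tiny illustration of what such a mask does, the NumPy sketch below applies a hand-made binary mask to a toy 4x4 grayscale image. In a real pipeline the mask would be predicted by the segmentation network; here it is simply assumed.

```python
import numpy as np

image = np.arange(16, dtype=float).reshape(4, 4)   # toy grayscale image
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True        # pretend the network marked this region as "object"

cutout = np.where(mask, image, 0.0)   # keep object pixels, blank everything else
print(cutout)
```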



This is the so-called matting.



Drozdzal, Michal, et al. "The importance of skip connections in biomedical
image segmentation." International Workshop on Large-Scale Annotation of
Biomedical Data and Expert Label Synthesis. Springer International Publishing,
2016.

Note that in this task we want to get another picture out of a picture! The generated mask is itself an image. This is where the so-called U-Net comes in; note that it is our first generative model.
Its building block is still convolution, but it adds upsampling, the inverse process of max pooling, to raise the resolution again.
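
The down-then-up shape of such a network can be sketched on a single feature map. The snippet uses 2x2 max pooling for the way down and nearest-neighbour upsampling for the way up. This is a simplification: the U-Net paper uses learned up-convolutions, so treat the upsampling function here as a stand-in.

```python
import numpy as np

def max_pool2x2(x):
    """Downsample an (H, W) map by taking the max of each 2x2 block."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2x(x):
    """Nearest-neighbour upsampling: repeat every pixel into a 2x2 block."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.arange(16, dtype=float).reshape(4, 4)
down = max_pool2x2(x)          # contracting path: 4x4 -> 2x2
up = upsample2x(down)          # expanding path:   2x2 -> 4x4
merged = np.stack([x, up])     # skip connection: stack along a channel axis

print(down)
print(merged.shape)
```

The skip connection is what lets the expanding path recover fine detail that pooling threw away, since the full-resolution features are handed over directly.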

 

Don't underestimate this segmentation task; it is especially useful. Private satellites and drones are popular now: do you want to survey the landscape around your neighborhood
and see whether a treasure vault is hidden there? The input is clear: nothing more than a series of satellite pictures. Where the trees are, where the water is, where the military base is: no human needed, the network digs them all out for you.

 



 

If you want to count cells or something like that, it is easy too: have the network output their shapes and you are done.



 

Fourth realm:

Let's get fashionable. Suppose you own a clothing store on Taobao and want customers to upload a picture of a garment and get back a set of recommended clothes: a search-by-image feature.
Note that you can scrape a pile of pictures from the Internet, but the data is unlabeled. What to do? Brother Tie will tell you a trick: the method is clustering.

Brother Tie will teach you the easiest way to cluster: feed all the pictures into a convolutional network, but instead of extracting the class, extract only some features from the network's middle layers.
These features are a bit like a visual QR code for each picture. Then run k-means clustering on these codes, and the result can be surprisingly good. Why deep?
Because features extracted by a deep network are simply different.

And the search? Just find the other pictures in the same cluster.
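
The recipe above can be sketched with NumPy. The feature vectors are random stand-ins for the mid-layer CNN features (no real network is run here), and the farthest-point initialisation is just one simple, deterministic choice made for this example.

```python
import numpy as np

def kmeans(features, k, steps=10):
    """Plain k-means; returns (cluster index per row, centroids)."""
    # Farthest-point initialisation: start from the first row, then keep adding
    # the row that is farthest from the centroids chosen so far.
    centroids = [features[0]]
    while len(centroids) < k:
        d = np.min([np.linalg.norm(features - c, axis=1) for c in centroids], axis=0)
        centroids.append(features[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(steps):
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):                  # leave an empty cluster where it is
                centroids[j] = members.mean(axis=0)
    return labels, centroids

def search_same_cluster(query_index, labels):
    """Indices of all other images that share the query image's cluster."""
    same = np.flatnonzero(labels == labels[query_index])
    return [int(i) for i in same if i != query_index]

rng = np.random.default_rng(1)
# Hypothetical features: images 0-4 look alike, images 5-9 look alike.
features = np.vstack([
    rng.normal(0.0, 0.1, size=(5, 3)),
    rng.normal(5.0, 0.1, size=(5, 3)),
])
labels, _ = kmeans(features, k=2)
similar = search_same_cluster(0, labels)
print("images similar to image 0:", similar)
```

For a real catalogue you would probably rank the pictures inside the cluster by distance to the query rather than return them all.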



On top of clustering, you can build search!



 

Fifth realm:

Now we rise to gazing at the stars; those money-making applications of classification were dull. How do we do science with machine vision?
For people who gaze at the stars or observe cells, a frequent problem is that the astronomical or cell images we get are unbearably noisy.
Deep learning gives you a way to reduce noise and restore images. A tool called the auto-encoder has played a big role here: one brush stroke and the image becomes clear.
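
To make the denoising idea concrete, here is a deliberately tiny linear auto-encoder trained with plain gradient descent in NumPy. It is only a sketch: the "images" are synthetic 16-pixel signals, the network has no convolutions or nonlinearities, and the sizes, noise level and learning rate are all arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "images": 200 signals of 16 pixels that really live on a 2-D subspace.
basis = rng.normal(size=(2, 16))
clean = rng.normal(size=(200, 2)) @ basis
noisy = clean + 0.3 * rng.normal(size=clean.shape)

# Linear auto-encoder: 16 pixels -> 4 hidden units -> 16 pixels.
w_enc = 0.1 * rng.normal(size=(16, 4))
w_dec = 0.1 * rng.normal(size=(4, 16))

learning_rate = 0.05
losses = []
for step in range(2000):
    hidden = noisy @ w_enc
    output = hidden @ w_dec
    error = output - clean                     # the target is the clean image
    losses.append(float(np.mean(error ** 2)))
    grad_output = 2.0 * error / error.size     # d(mean squared error) / d(output)
    grad_dec = hidden.T @ grad_output
    grad_enc = noisy.T @ (grad_output @ w_dec.T)
    w_enc -= learning_rate * grad_enc
    w_dec -= learning_rate * grad_dec

denoised = noisy @ w_enc @ w_dec
print("loss at start:", losses[0])
print("loss at end:  ", losses[-1])
```

The narrow hidden layer is the point: it cannot carry all the pixel-level noise, so the network is pushed to keep only the structure the images share.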

 



And that is not even the coolest part: adversarial learning, which borrows from game theory, can also help you kill noise! If you know adversarial learning, the so-called GAN, it is also a tool for image generation:
it makes the pictures the network has denoised indistinguishable from natural noise-free pictures, even to a convolutional network. Yes, that's it!
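
The game between the two networks can be written down as two losses. The sketch below computes them from discriminator outputs, that is, probabilities that a picture is real; the numbers are invented for illustration. The generator loss shown is the commonly used non-saturating variant, which is an assumption here since the post does not say which form is meant.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Cross-entropy: real pictures should score 1, generated pictures 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: get generated pictures scored as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Discriminator scores for clean pictures and for the network's denoised output.
sharp_eyed = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
fooled = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.7, 0.9])

print("discriminator loss when it spots the fakes:", sharp_eyed)
print("discriminator loss when it is fooled:      ", fooled)
print("generator loss when it fools D:", generator_loss([0.7, 0.9]))
```

Training alternates between lowering the first loss by updating the discriminator and lowering the second by updating the generator.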





Schawinski, Kevin, et al. "Generative adversarial networks recover features in
astrophysical images of galaxies beyond the deconvolution limit." Monthly
Notices of the Royal Astronomical Society: Letters 467.1 (2017): L110-L114.

 

 

Sixth realm:

 

We have made enough money in industry, and science is too nerdy, so let's play with art and ponder philosophy. The first move is image style transfer; see Brother Tie's previous article
<https://zhuanlan.zhihu.com/p/31404314>:





 



 

But what is really impressive is the adversarial learning just mentioned, GAN. The famous CycleGAN, for example, can implement almost any custom "image translation"
function, and you don't need labels: give it two groups of pictures, winter and summer, and it automatically finds the correspondence between them.
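
What lets CycleGAN work without paired labels is the cycle-consistency term: translating summer to winter and back should return the original picture. A plain-Python sketch follows, with toy functions on pixel lists standing in for the two generator networks. The names `to_winter`, `to_summer` and `lazy_to_summer` are hypothetical.

```python
def l1(a, b):
    """Mean absolute difference between two flat pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(G, F, x_batch, y_batch):
    """L1 error of the round trips x -> G -> F -> x and y -> F -> G -> y."""
    forward = sum(l1(F(G(x)), x) for x in x_batch) / len(x_batch)
    backward = sum(l1(G(F(y)), y) for y in y_batch) / len(y_batch)
    return forward + backward

# Toy "translators": summer -> winter brightens pixels, winter -> summer darkens them.
def to_winter(img):
    return [p + 0.2 for p in img]

def to_summer(img):
    return [p - 0.2 for p in img]

def lazy_to_summer(img):
    return list(img)          # does nothing, so it is not an inverse of to_winter

summer = [[0.1, 0.4, 0.3]]
winter = [[0.6, 0.9, 0.7]]
good = cycle_consistency_loss(to_winter, to_summer, summer, winter)
bad = cycle_consistency_loss(to_winter, lazy_to_summer, summer, winter)
print("consistent pair:", good)
print("inconsistent pair:", bad)
```

In the full method this term is added to the two adversarial losses, one per image domain.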

 



Zhu, Jun-Yan, et al. "Unpaired image-to-image translation using
cycle-consistent adversarial networks." arXiv preprint arXiv:1703.10593 (2017).

 

Seventh realm:

 

Image translation is still taking the easy way out. Doesn't your neural network claim to understand images? Then let's see it create something out of nothing and generate pictures from noise.

 

Yes, it is GAN again, and the most basic convolutional GAN (DCGAN) can do it for you.

Take a look at the hotel scenes a GAN dreamed up. Would you have guessed a computer drew them? Ha ha ha!



Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural
information processing systems. 2014.



 

Having written this far, I feel that GAN is very, very, very promising; I used to think it was just for fun.

The seven-level pagoda on display here is only the tip of the iceberg of what people have discovered in deep learning. "Do not laugh if I lie drunk on the battlefield; since ancient times, how many have returned from war?"