Understanding data science, machine learning, and AI: what's the difference?
Original text: What's the difference between data science, machine learning, and artificial intelligence? <http://varianceexplained.org/r/ds-ml-ai/>
Author: David Robinson <http://varianceexplained.org/about/>
The author is a data scientist. Whenever he talks to someone about his work, they confuse it with artificial intelligence and machine learning. Do you know the difference between the three fields? Let's follow the author's reasoning and find out. The translation follows.
When I introduce myself as a data scientist, people often ask me "What's the difference between data science and machine learning?" or "So your job must have something to do with artificial intelligence?" I've answered these questions so many times that my answer easily qualifies for my "rule of three":
When you've written the same code 3 times, write a function.
When you've given the same advice 3 times, write a blog post.
-- David Robinson (@drob), November 9, 2017
There is a lot of overlap between these fields, but they are not interchangeable: most professionals in these fields have an intuitive sense of how a particular piece of work would be classified as data science, machine learning, or artificial intelligence, even if it's hard to put into words.
So in this article, I will give a simplified definition of the difference between the three fields:
* Data science produces insights
* Machine learning produces predictions
* Artificial intelligence produces actions
To be clear, these definitions are not sufficient conditions: not everything that fits a definition belongs to that field. (A fortune teller makes predictions, but we would never say they are doing machine learning!) Nor are they a good way to determine someone's role or job title ("Am I a data scientist?"), which is a matter of experience.
But I think these definitions are a useful way to distinguish the three types of work, and to avoid sounding silly when you talk about them. It's worth noting that I'm taking a descriptive rather than a prescriptive approach: I'm not interested in what these terms "should mean", but in how people in the field typically use them.
Data science produces insights
Data science is distinguished from the other two fields because its goal is an especially human one: to gain insight and understanding. Jeff Leek has an excellent definition of the types of insights that data science can achieve, including descriptive, exploratory, and causal.
Again, not everything that produces insights qualifies as data science (the classic definition of data science is that it involves a combination of statistics, software engineering, and domain expertise). But we can use this definition to distinguish data science from machine learning and artificial intelligence. The main distinction is that in data science there is always a human in the loop: someone is understanding the insight, seeing the figure, or benefiting from the conclusion. It would make no sense to say "Our chess algorithm uses data science to choose its next move" or "Google Maps uses data science to recommend driving directions".
This definition of data science therefore emphasizes:
* Statistical inference
* Data visualization
* Experiment design
* Domain knowledge
Data scientists might use simple tools: they could report percentages and draw line charts based on SQL queries. They could also use very complex methods: they might work with distributed data stores to analyze trillions of records, develop cutting-edge statistical techniques, and build interactive visualizations. Whatever they use, the goal is to gain a better understanding of the data.
Machine learning produces predictions
I think of machine learning as the field of prediction: "Given instance X with particular features, predict Y about it". These predictions could be about the future ("predict whether this patient will go into sepsis"), but they could also be about qualities that are not immediately obvious to a computer ("predict whether this image has a bird in it <https://xkcd.com/1425/>"). Almost all Kaggle competitions qualify as machine learning problems: they provide some training data, and then see whether competitors can make accurate predictions about new examples.
There's a lot of overlap between data science and machine learning. For example, logistic regression can be used to draw insights about relationships ("the richer a user is, the more likely they are to buy our product, so we should change our marketing strategy") and to make predictions ("this user has a 53% chance of buying our product, so we should recommend it to them").
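To make this overlap concrete, here is a minimal sketch in Python. The purchase data is invented for illustration, and `fit_logistic` and `predict` are hypothetical helpers written for this example, not anything from the author's own analysis. The same fitted model is read two ways: the coefficient supports an insight, and the predicted probability is a prediction.

```python
import math


def sigmoid(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log-loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Predicted probability of a purchase for a user with income x."""
    return sigmoid(w * x + b)


# Invented toy data: income in units of $10k, and whether the user bought.
incomes = [2, 3, 4, 5, 6, 7, 8, 9]
bought = [0, 0, 1, 0, 1, 0, 1, 1]

w, b = fit_logistic(incomes, bought)

# Data science reading: the sign and size of the coefficient are an insight.
# exp(w) is the factor by which the odds of buying change per extra $10k.
odds_ratio = math.exp(w)

# Machine learning reading: a probability for one specific new user.
p_new_user = predict(w, b, 6.5)
```

A positive `w` (an `odds_ratio` above 1) corresponds to the "richer users are more likely to buy" insight, while `p_new_user` is the kind of number that would drive a recommendation for one user.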
Models like random forests are somewhat less interpretable and fit the "machine learning" description better, and methods such as deep learning are notoriously hard to explain. This could get in your way if your goal is to gain insights rather than to make predictions. We could therefore imagine a "spectrum" between data science and machine learning, with more interpretable models toward the data science end and more "black box" models toward the machine learning end.
Most practitioners switch back and forth between the two tasks comfortably. I use both machine learning and data science in my work: I might fit a model on Stack Overflow data to determine which users are likely to be looking for a job (machine learning), and then build summaries and visualizations that examine why the model works (data science). This is an important way to discover flaws in a model and to combat algorithmic bias <https://en.wikipedia.org/wiki/Algorithmic_bias>. It is one reason why data scientists are often responsible for developing the machine learning components of a product.
Artificial intelligence produces actions
Artificial intelligence is by far the oldest and most widely recognized of the three terms, and as a result it is the most challenging to define. The term is surrounded by a great deal of hype, thanks to researchers, journalists, and startups who are looking for money or attention.
When you're fundraising, it's AI.
When you're hiring, it's ML.
When you're implementing, it's linear regression.
When you're debugging, it's printf()
-- Baron Schwartz (@xaprb), November 15, 2017
One common thread in definitions of "artificial intelligence" is that an autonomous agent executes or recommends actions (e.g. Poole, Mackworth and Goebel 1998 <http://www.cs.ubc.ca/~poole/ci.html>, Russell and Norvig 2003 <http://aima.cs.berkeley.edu/>). Some systems that I think should be described as AI include:
* Game-playing algorithms (Deep Blue)
* Robotics and control theory (motion planning, walking a bipedal robot)
* Optimization (Google Maps choosing a route)
* Natural language processing (bots)
* Reinforcement learning
Again, we can see a lot of overlap with the other fields. Deep learning <https://en.wikipedia.org/wiki/Deep_learning> is particularly interesting because it straddles both machine learning and AI. The typical use case is to train on data and then produce predictions, but it has also shown enormous success in game-playing algorithms like AlphaGo.
But there are distinctions. If I analyze some sales data and discover that clients from particular industries renew more than others, the output is some numbers and graphs, not a particular action. (Executives might use those conclusions to change their sales strategy, but that action isn't autonomous.)
The difference between artificial intelligence and machine learning is more subtle, and historically machine learning has often been considered a subfield of artificial intelligence (computer vision in particular was a classic AI problem). But I think machine learning has largely broken off from artificial intelligence, partly because of the backlash described above: most people who work on problems of prediction don't like to describe themselves as AI researchers.
By today's definition, y=mx+b is an artificial intelligence bot that can tell you where a line is going.
-- Amy Hoy ✨ (@amyhoy), March 29, 2017
Case study: how would the three be used together?
Suppose we are building a self-driving car and are working on the specific problem of stopping at stop signs. We would need skills drawn from all three of these fields.
Machine learning: The car has to recognize a stop sign using its cameras. We build a dataset of millions of photos of streetside objects, and train an algorithm to predict which of them contain stop signs.
Artificial intelligence: Once the car can recognize stop signs, it needs to decide when to apply the brakes. Braking too early or too late is dangerous, and the car has to handle varying road conditions (for example, recognizing on a slippery road that it is not slowing down quickly enough). This is a problem of control theory.
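As a rough illustration of why road conditions change the decision, the sketch below uses the textbook stopping-distance formula d = v^2 / (2 * mu * g). The function names, the safety margin, and the friction values are assumptions made for this example; a real controller would be far more involved than this single rule.

```python
G = 9.81  # gravitational acceleration in m/s^2


def stopping_distance(speed_mps, friction):
    """Idealized braking distance in metres: v^2 / (2 * mu * g)."""
    return speed_mps ** 2 / (2 * friction * G)


def should_brake(distance_to_sign_m, speed_mps, friction, margin_m=2.0):
    """Brake once the sign is within stopping distance plus a safety margin."""
    return distance_to_sign_m <= stopping_distance(speed_mps, friction) + margin_m


# Illustrative friction coefficients (assumed, not measured).
DRY_ROAD = 0.7
SLIPPERY_ROAD = 0.3

# Same speed (15 m/s) and same distance to the sign (30 m), different roads.
brake_on_dry = should_brake(30.0, 15.0, DRY_ROAD)
brake_on_slippery = should_brake(30.0, 15.0, SLIPPERY_ROAD)
```

With these numbers the car can still wait on a dry road but already has to brake on a slippery one, which is the kind of action-taking decision the article assigns to AI.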
Data science: In street tests, we find that the car's performance isn't good enough: there are some false negatives, in which it drives right past a stop sign without detecting it. After analyzing the street test data, we gain the insight that the false negative rate depends on the time of day: the car is more likely to miss a stop sign before sunrise or after sunset. We realize that most of our training data contained only objects in daylight, so we build a better dataset that includes nighttime images and go back to the machine learning step.
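The analysis in this step could be as simple as grouping test results by time of day. The following is a minimal sketch with an invented street-test log; the record format, the daylight hours, and the helper names are all assumptions for illustration.

```python
from collections import defaultdict

# Invented street-test log: (hour of day, whether the stop sign was detected).
street_tests = [
    (5, False), (6, False), (7, True), (9, True), (11, True), (12, True),
    (13, True), (15, True), (17, True), (19, False), (21, False), (22, True),
]


def is_daylight(hour, sunrise=7, sunset=18):
    """Treat hours from sunrise up to (not including) sunset as daylight."""
    return sunrise <= hour < sunset


def miss_rate_by_period(records):
    """Return the fraction of missed stop signs for 'day' and 'night'."""
    totals = defaultdict(int)
    misses = defaultdict(int)
    for hour, detected in records:
        period = "day" if is_daylight(hour) else "night"
        totals[period] += 1
        if not detected:
            misses[period] += 1
    return {period: misses[period] / totals[period] for period in totals}


rates = miss_rate_by_period(street_tests)
```

Comparing `rates["night"]` with `rates["day"]` is what surfaces the insight; the output is a pair of numbers for a person to interpret, not an action or a prediction.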