1. The relationship between big data and machine learning
In the big data field, the work is mostly data storage and simple statistical computation. Machine learning's role in big data is to discover the rules or models hidden in the data: a machine learning algorithm is run over the data to build a model, which can then be used to predict and make decisions (for example, generating derived user fields in a big data user-profiling project).
2. Where machine learning fits in the big data architecture
In current market practice, at which stage of a big data pipeline should machine-learning techniques be applied? Let's look at that next. First, the common big data architecture patterns are listed below.
2.1 Data acquisition (FTP, socket) → data storage (HDFS) → data cleaning (MapReduce) → data analysis (Hive) → Sqoop export → storage (MySQL, Oracle) → web display
2.2 Data acquisition (FTP, socket) → data storage (HDFS) → data cleaning (MapReduce) → columnar storage (HBase) → Thrift (coprocessor) → web display
2.3 Data acquisition (FTP, socket) → data storage (HDFS) → data cleaning (MapReduce) → data analysis (Hive) → Impala (real-time analysis) → JDBC → web display
2.4 Data acquisition (FTP, socket) → data storage (HDFS) → Spark computation → storage (MySQL, Oracle) → web display
After development, a distributed task scheduler (Azkaban, Oozie) is used to run the above pipelines as periodic jobs.
Machine learning plugs into the big data pipeline at these stages: data analysis (Hive) → machine learning → Sqoop export; or columnar storage (HBase) → machine learning → Thrift (coprocessor).
Summary: in a big data architecture, machine learning sits at the upper stage. After the big data computation, results are either stored or displayed on the web directly; machine learning is needed on top of them to produce decision-making and forecasting models.
3. Machine learning
3.1 The theory of machine learning was proposed in the 1950s, but backward storage mechanisms for large data volumes, immature algorithms, and slow data processing kept the technology from being implemented. With today's rapid development of science and technology, those difficulties have been overcome, so machine learning is now widely applied and has changed people's lives.
3.2 Machine learning is an interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in how computers can simulate human learning behavior, acquire new knowledge and skills, and reorganize existing knowledge structures to improve performance.
3.3 The relationship between machine learning (ML) and artificial intelligence (AI): machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied across all fields of AI. It mainly relies on induction and synthesis rather than deduction.
The learning behavior of machine learning: given an experience E, a set of tasks T, and a performance measure P, if accumulating experience E improves performance P on the defined tasks T, the system is said to have the ability to learn.
3.4 Applications of machine learning: speech recognition, autonomous driving, language translation, recommendation systems, drones, and so on.
3.5 The relationship between machine learning and deep learning: deep learning is one technique for realizing machine learning. It has made many machine-learning applications practical and has expanded the whole field of artificial intelligence, ticking off tasks one by one and making many machine aids possible. Driverless cars, movie recommendation, and more are within reach or about to become reality. Artificial intelligence is the present, and the future too; with deep learning, AI may even achieve the effects we imagine in science fiction.
(An easier way to remember it: artificial intelligence is the grandparent generation, machine learning the parent generation, and deep learning the child generation!)
3.6 How machine learning runs: we know that ordinary programs are computed by the CPU (central processing unit), but at present most of the computation in machine learning development and application (especially deep learning) is done on the GPU (graphics processing unit).
3.7 Basic concepts of machine learning:
3.7.1 Key terms: training set, test set, feature value, supervised learning, unsupervised learning, semi-supervised learning, classification, regression
3.7.2 Concept learning: how humans learn concepts (as a baby does): bird vs. dog; vehicle vs. house; a black box vs. a computer (how do we recognize and distinguish them?). Definition: concept learning means inferring a Boolean function (yes or no) from training examples of its input and output.
3.7.3 Data set: the collection of real data
3.7.4 Row: one sample of data
3.7.5 Column: one feature (attribute) of the data
3.7.6 Feature vector: the vector formed by the feature values of one sample
3.7.7 Attribute space: the space spanned by the attributes
3.7.8 Training set: the data set used to train the model
3.7.9 Test set: the data set used to validate the model
3.7.10 Training process (learning process): training data set + machine learning algorithm ⇒ model
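The training process above can be sketched in a few lines, assuming scikit-learn is installed (the iris data set and the decision-tree algorithm here are just illustrative choices):

```python
# Training set + algorithm => model, then validate on a held-out test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)          # rows = samples, columns = features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)   # training set vs. test set

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)                # learning: data + algorithm => model
print(model.score(X_test, y_test))         # accuracy on the test set
```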
3.8 Supervised learning:
Supervised means that every sample in the training data set has a known output (a class label).
A prediction problem whose output variable is continuous is called a regression problem.
Regression algorithms:
• Simple linear regression
• Multiple linear regression
• Lasso regression
• Ridge regression
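Simple linear regression is the easiest of these to see in code; a minimal NumPy sketch (the data points are made up, lying exactly on y = 2x + 1):

```python
# Fit y ≈ a*x + b by least squares and recover the slope and intercept.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0                  # noiseless data on the line y = 2x + 1

a, b = np.polyfit(x, y, deg=1)     # degree-1 least-squares fit
print(a, b)                        # slope ≈ 2, intercept ≈ 1
```

Lasso and Ridge solve the same least-squares problem with an added L1 or L2 penalty on the coefficients.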
A prediction problem whose output variable takes a finite number of discrete values is called a classification problem.
Classification algorithms:
• Logistic regression
• Decision tree
• Naive Bayes
• Support vector machine
• K nearest neighbors
3.9 Unsupervised learning:
The machine is given a large amount of data with no class labels and is asked to find structure on its own, e.g. clustering the data or detecting anomalies.
Dimensionality reduction (PCA, LDA)
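A minimal dimensionality-reduction sketch with scikit-learn's PCA (assumed installed), projecting the 4-D iris features down to 2-D:

```python
# PCA keeps the directions of greatest variance in the data.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)       # 150 samples x 4 features
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)                       # (150, 2)
```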
3.10 Semi-supervised learning:
Semi-supervised learning offers a way to exploit cheap unlabeled samples alongside a small number of labeled ones.
3.11 Reinforcement learning:
An important branch of machine learning, mainly used to solve sequential decision problems.
Go can be framed as a reinforcement learning problem: the agent must learn how to play the best move in a variety of board positions.
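A toy sketch of the sequential-decision idea, using tabular Q-learning on a made-up 5-cell corridor (the environment, rewards, and hyperparameters are all illustrative assumptions, far simpler than Go): the agent gets feedback only when it reaches the goal, yet learns to always move right.

```python
# Tabular Q-learning on a 5-state corridor; reward 1 at the rightmost cell.
import random

N_STATES, ACTIONS = 5, (0, 1)          # actions: 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration
random.seed(0)

for _ in range(500):                   # episodes
    s = 0
    while s != N_STATES - 1:           # rightmost cell is terminal
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda a: Q[s][a])
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(policy)                          # every non-terminal state prefers "right"
```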
3.12 Transfer learning:
Application scenarios:
The small-data problem. Suppose a new online store sells a new kind of pastry. With no data, no model can be built to recommend it to users. But a user's purchase of one product reflects what else they may buy, so if that user also shops in another domain with plenty of data, such as beverages, a model can be built there. Combining the user's habits of buying beverages and buying pastries, the beverage recommendation model can be migrated to the pastry domain, so that even with little data we can successfully recommend pastries the user may like. In general: there are two related domains; one has plenty of data and a successful model, the other has little data, and the model is transferred from the former to the latter.
The personalization problem. For example, everyone wants their phone to remember their habits so they do not have to set it up each time. How can the phone learn this? Transfer learning can adapt a generic user model to an individual's data.
4. What knowledge does learning machine learning require?
4.1 A basic understanding of probability.
4.2 The basics of calculus and linear algebra.
4.3 Proficiency in Python or R programming (companies commonly use Python for machine learning and data mining; academia often uses R).
What enterprises require of us: master machine learning algorithms and application frameworks, and solve practical problems through classification and regression.
Positions corresponding to machine learning: data mining (user profiling), NLP (natural language processing), recommendation systems (recommendation and ranking algorithms), computational advertising (CTR estimation), computer vision (deep learning), speech recognition (HMMs, deep learning). Note: these positions presuppose experience in big data development.
5. Ten commonly used machine learning algorithms
5.1 Machine learning algorithms fall into three categories: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning applies when part of the data set (the training data) has known attributes (labels) while the remaining samples lack them and must be predicted. Unsupervised learning mainly mines hidden relationships in unlabeled data sets. Reinforcement learning sits between the two: each prediction or action receives some feedback, but there is no exact label or error signal.
5.2 Decision tree: a decision tree is a decision-support tool that uses a tree model to represent the decision process and its possible consequences, including probabilistic outcomes. Each internal node asks a question about one attribute, and each leaf gives a conclusion.
From a business decision-making perspective, a decision tree predicts the probability of a correct decision using the fewest possible yes/no questions. This approach helps you work through a problem in a structured, systematic way and reach a reasonable conclusion.
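The tree structure can be printed instead of drawn; a sketch assuming scikit-learn is installed (iris and the depth limit are illustrative choices):

```python
# export_text prints the learned if/else questions in place of a diagram.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Each line tests one feature; the leaves carry the class decision.
print(export_text(tree, feature_names=list(iris.feature_names)))
```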
5.3 Naive Bayes classifier: a naive Bayes classifier is a simple probabilistic classifier based on Bayes' theorem, with the assumption that the features are independent of one another. The formula is
P(A|B) = P(B|A) P(A) / P(B)
where P(A|B) is the posterior probability, P(B|A) the likelihood, P(A) the prior probability of the class, and P(B) the prior probability of the predictor.
Some real-world application examples:
* Detecting spam email
* Classifying news into technology, politics, sports, and other categories
* Judging whether a passage of text expresses positive or negative sentiment
* Face detection software
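A worked instance of Bayes' rule for the spam example (the prior and the two likelihoods below are made-up numbers for illustration, not measured data):

```python
# P(spam | "free") = P("free" | spam) * P(spam) / P("free")
p_spam = 0.4                  # P(A): prior probability an email is spam
p_word_given_spam = 0.5       # P(B|A): likelihood of the word "free" in spam
p_word_given_ham = 0.1        # likelihood of "free" in non-spam

# P(B): total probability of seeing the word "free"
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# P(A|B): posterior probability of spam given the word "free"
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))   # 0.769
```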
5.4 Random forest
Randomly sample the source data to form several subsets.
The matrix S is the source data, with rows 1 to N; A, B, C are the feature columns and the last column is the category.
Generate M random sub-matrices from S.
These M subsets train M decision trees.
Put new data into the M trees to obtain M classification results; count which category is predicted most often and take that category as the final prediction.
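The steps above can be sketched with scikit-learn (assumed installed); M = 10 trees, each fit on a bootstrap subset, vote on the class of new data:

```python
# Bagging + majority vote: the essence of a random forest.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X_train, y_train)            # each tree sees a random subset
print(forest.score(X_test, y_test))     # majority vote over the 10 trees
```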
5.5 Logistic regression
When the prediction target is a probability, its value must be greater than or equal to 0 and less than or equal to 1. A plain linear model cannot guarantee this: when the inputs leave a certain range, the outputs also leave the required range.
So a model whose output curve is squashed into that band works better.
How is such a model obtained?
The model must satisfy two conditions: its output is greater than or equal to 0, and less than or equal to 1.
For the first condition one could take an absolute value or a square; here an exponential function is used, which is always greater than 0.
For the second condition, use a division: the numerator is the exponential itself, and the denominator is the exponential plus 1, so the ratio is necessarily less than 1.
One more transformation yields the logistic regression model
p(x) = e^(a+bx) / (1 + e^(a+bx))
whose coefficients a and b are computed from the source data, giving the final S-shaped logistic curve.
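The construction described above can be checked directly in NumPy: the exponential keeps the numerator positive, and dividing by itself-plus-one keeps the ratio below 1.

```python
# The logistic (sigmoid) function built exactly as derived in the text.
import numpy as np

def logistic(z):
    e = np.exp(z)          # always > 0
    return e / (e + 1.0)   # numerator / (numerator + 1) is always < 1

z = np.linspace(-6, 6, 5)
p = logistic(z)
print(p)                   # values rise smoothly from near 0 to near 1
```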
5.6 Support vector machine
To separate two classes, we look for a hyperplane; the optimal hyperplane is the one that maximizes the margin between the two classes, where the margin is the distance from the hyperplane to its nearest point. If margin Z2 > Z1, the hyperplane with margin Z2 is the better one.
The hyperplane is expressed as a linear equation: points of one class give a value greater than or equal to 1, and points of the other class a value less than or equal to -1.
The distance from a point to the hyperplane is computed with the standard point-to-plane distance formula.
This yields an expression for the total margin; maximizing that margin means minimizing its denominator, so training becomes an optimization problem.
A worked example: given three points, find the optimal hyperplane. Define the weight-vector direction as (2,3) - (1,1),
so the weight vector is (a, 2a). Substitute the two points into the hyperplane equation: (2,3) gives the value 1 and (1,1) gives the value -1. Solving yields a and the intercept w0, and hence the expression of the hyperplane.
Once a is found, substituting it back into (a, 2a) gives the support vectors.
The hyperplane equation with a and w0 filled in is the support vector machine.
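The worked example can be solved numerically: with w = (a, 2a), the constraints w·(2,3) + w0 = 1 and w·(1,1) + w0 = -1 reduce to 8a + w0 = 1 and 3a + w0 = -1, a 2×2 linear system.

```python
# Solve for the coefficient a and intercept w0 of the example hyperplane.
import numpy as np

A = np.array([[8.0, 1.0],               # 8a + w0 =  1  (point (2,3))
              [3.0, 1.0]])              # 3a + w0 = -1  (point (1,1))
b = np.array([1.0, -1.0])
a, w0 = np.linalg.solve(A, b)
print(a, w0)                            # a = 0.4, w0 = -2.2

w = np.array([a, 2 * a])                # weight vector (0.4, 0.8)
print(w @ np.array([2.0, 3.0]) + w0)    # 1.0  -> support vector, class +1
print(w @ np.array([1.0, 1.0]) + w0)    # -1.0 -> support vector, class -1
```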
5.7 K nearest neighbors
When a new data point arrives, look at the k points nearest to it; whichever category is in the majority among those neighbors is the category assigned to the new point.
Example: to distinguish cats from dogs we judge by two features, claws and sound. The circles and triangles are known samples; which class does the new star belong to?
When k = 3, the three nearest points are linked to the star; circles are in the majority, so the star belongs to the cat class.
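The cat/dog example can be written in plain Python (the "claws vs. sound" coordinates below are made-up illustrative numbers):

```python
# k-NN: classify a new point by majority vote among its k nearest neighbours.
from collections import Counter
import math

samples = [((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"), ((0.8, 1.1), "cat"),
           ((3.0, 3.2), "dog"), ((3.1, 2.9), "dog"), ((2.8, 3.0), "dog")]

def knn_predict(x, k=3):
    nearest = sorted(samples, key=lambda s: math.dist(x, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]   # majority class among the k nearest

print(knn_predict((1.1, 1.0)))   # cat
print(knn_predict((3.0, 3.0)))   # dog
```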
5.8 K-means
First the cluster centers are initialized; here the simplest values 3, 2, 1 are chosen as the initial center of each class. Each point is then assigned to its nearest center, the centers are recomputed as the means of their clusters, and the two steps repeat until the centers stop moving.
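A minimal 1-D k-means sketch in NumPy, using 3, 2, 1 as the initial centers as in the text (the data points themselves are made up):

```python
# Alternate assign-to-nearest-center / recompute-center until stable.
import numpy as np

data = np.array([0.9, 1.1, 2.0, 2.1, 2.9, 3.1])
centers = np.array([3.0, 2.0, 1.0])           # initial centers from the text

for _ in range(10):
    labels = np.argmin(np.abs(data[:, None] - centers[None, :]), axis=1)
    centers = np.array([data[labels == k].mean() for k in range(3)])

print(np.sort(centers))                       # ≈ [1.0, 2.05, 3.0]
```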
5.9 AdaBoost
AdaBoost is one of the boosting methods.
An AdaBoost example: in handwriting recognition, many features can be captured from the drawing board, such as the direction of the starting stroke, the distance between the start and end points, and so on.
During training, each feature is given a weight. For example, the opening strokes of 2 and 3 look very alike, so that feature does little to tell them apart and its weight will be small,
while the alpha angle is highly discriminative, so that feature's weight will be larger. The final prediction combines the results of all these weighted features.
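A minimal boosting sketch with scikit-learn's AdaBoostClassifier (assumed installed; the breast-cancer data set is just an illustrative stand-in for the handwriting task):

```python
# AdaBoost: weak learners trained in rounds, combined with learned weights.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

boost = AdaBoostClassifier(n_estimators=50, random_state=0)
boost.fit(X_train, y_train)             # each round re-weights hard samples
print(boost.score(X_test, y_test))      # weighted vote of the weak learners
```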
5.10 Neural networks
Neural networks suit problems where an input may fall into at least two categories.
A neural network consists of several layers of neurons and the connections between them; the first layer is the input layer and the last is the output layer.
Both the hidden layers and the output layer have their own classifiers.
The input enters the network and is activated; the computed scores are passed on to activate the next layer, and finally the scores on the output layer's nodes represent the scores for each class, the largest of which gives the classification result.
The same input is transmitted to different nodes, and the results differ because each node has its own weights and bias.
This process is forward propagation.
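Forward propagation can be sketched in NumPy with one hidden layer (all the weights and biases below are arbitrary illustrative numbers):

```python
# Forward propagation: input -> hidden layer -> output-layer class scores.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0])                 # input layer: 2 features

W1 = np.array([[0.1, 0.4],                # hidden layer: 3 neurons,
               [-0.3, 0.2],               # each with its own weights...
               [0.5, -0.1]])
b1 = np.array([0.1, 0.0, -0.2])           # ...and its own bias
h = sigmoid(W1 @ x + b1)                  # activate the hidden layer

W2 = np.array([[0.3, -0.2, 0.1],          # output layer: one score per class
               [-0.1, 0.4, 0.2]])
b2 = np.array([0.0, 0.1])
scores = sigmoid(W2 @ h + b2)             # scores for the 2 classes
print(np.argmax(scores))                  # index of the winning class
```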
5.11 Markov chains
A Markov chain is composed of states and transitions.
Example: build the Markov chain of the sentence 'the quick brown fox jumps over the lazy dog'. Each word is a state, and each adjacent pair of words is a transition.
These are probabilities computed from a single sentence; when statistics are taken over a large amount of text, a much larger state-transition matrix is obtained, for example all the words that can follow 'the' and their corresponding probabilities.
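The transition table for the example sentence can be built directly: count word bigrams, then normalize per source word.

```python
# Bigram counts -> transition probabilities for a word-level Markov chain.
from collections import Counter, defaultdict

sentence = "the quick brown fox jumps over the lazy dog".split()
counts = defaultdict(Counter)
for a, b in zip(sentence, sentence[1:]):
    counts[a][b] += 1                      # transition a -> b

probs = {a: {b: n / sum(c.values()) for b, n in c.items()}
         for a, c in counts.items()}

# 'the' occurs twice: once followed by 'quick', once by 'lazy'
print(probs["the"])                        # {'quick': 0.5, 'lazy': 0.5}
```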