This is a personal study note. It looks at fitting polynomial curves to data, at overfitting and underfitting, and at the effect of adding a regularization term to the model.
Fitting: the process by which a machine learning model, during training, updates its parameters so that the model agrees with the observable data (the training set).

Fitting with a linear (degree-1) curve
X_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()  # initialize a linear regression model with the default configuration
regressor.fit(X_train, y_train)  # fit the model on the training set

import numpy as np
xx = np.linspace(0, 26, 100)  # build test inputs
xx = xx.reshape(xx.shape[0], 1)
yy = regressor.predict(xx)

%matplotlib inline
import matplotlib.pyplot as plt
plt.scatter(X_train, y_train)
plt1, = plt.plot(xx, yy, label='Degree=1')
plt.axis([0, 25, 0, 25])
plt.xlabel('independent variable')
plt.ylabel('dependent variable')
plt.legend(handles=[plt1])
plt.show()
print('The R-squared value of Linear Regressor performing on the training data is',
      regressor.score(X_train, y_train))

Fitting with a quadratic (degree-2) curve
from sklearn.preprocessing import PolynomialFeatures
poly2 = PolynomialFeatures(degree=2)  # polynomial feature generator
X_train_poly2 = poly2.fit_transform(X_train)  # degree-2 polynomial features built from the training set

regressor_poly2 = LinearRegression()  # model for the quadratic polynomial regression
regressor_poly2.fit(X_train_poly2, y_train)  # fit on the training set

xx_poly2 = poly2.transform(xx)  # build degree-2 polynomial features for the test inputs
yy_poly2 = regressor_poly2.predict(xx_poly2)

plt.scatter(X_train, y_train)
plt1, = plt.plot(xx, yy, label='Degree=1')
plt2, = plt.plot(xx, yy_poly2, label='Degree=2')
plt.axis([0, 25, 0, 25])
plt.xlabel('independent variable')
plt.ylabel('dependent variable')
plt.legend(handles=[plt1, plt2])
plt.show()
print('The R-squared value of Polynomial Regressor (Degree=2) performing on the training data is',
      regressor_poly2.score(X_train_poly2, y_train))

Fitting with a quartic (degree-4) curve
from sklearn.preprocessing import PolynomialFeatures
poly4 = PolynomialFeatures(degree=4)  # polynomial feature generator
X_train_poly4 = poly4.fit_transform(X_train)  # degree-4 polynomial features built from the training set

regressor_poly4 = LinearRegression()  # model for the quartic polynomial regression
regressor_poly4.fit(X_train_poly4, y_train)  # fit on the training set

xx_poly4 = poly4.transform(xx)  # build degree-4 polynomial features for the test inputs
yy_poly4 = regressor_poly4.predict(xx_poly4)

plt.scatter(X_train, y_train)
plt1, = plt.plot(xx, yy, label='Degree=1')
plt2, = plt.plot(xx, yy_poly2, label='Degree=2')
plt4, = plt.plot(xx, yy_poly4, label='Degree=4')
plt.axis([0, 25, 0, 25])
plt.xlabel('independent variable')
plt.ylabel('dependent variable')
plt.legend(handles=[plt1, plt2, plt4])
plt.show()
print('The R-squared value of Polynomial Regressor (Degree=4) performing on the training data is',
      regressor_poly4.score(X_train_poly4, y_train))

Evaluate the performance of the above three models on the test set
X_test = [[6], [8], [11], [16]]
y_test = [[8], [12], [15], [18]]
print('Linear regression:', regressor.score(X_test, y_test))
X_test_poly2 = poly2.transform(X_test)
print('Polynomial 2 regression:', regressor_poly2.score(X_test_poly2, y_test))
X_test_poly4 = poly4.transform(X_test)
print('Polynomial 4 regression:', regressor_poly4.score(X_test_poly4, y_test))

From these results we can see:
When model complexity is low (degree-1 polynomial), the model does not fit the training data well, and its test-set score is mediocre as well: this is underfitting.
When model complexity is very high (degree-4 polynomial), the model fits almost all of the training data, but it becomes so volatile that it has almost no ability to predict unknown data: this is overfitting.
Both overfitting and underfitting are symptoms of a lack of generalization ability.
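The train/test comparison above can be reproduced as one self-contained script (data and degrees copied from the sections above; wrapping the steps in a `Pipeline` is just a convenience I've added, not part of the original note):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X_train = np.array([[6], [8], [10], [14], [18]])
y_train = np.array([7, 9, 13, 17.5, 18])
X_test = np.array([[6], [8], [11], [16]])
y_test = np.array([8, 12, 15, 18])

scores = {}
for degree in (1, 2, 4):
    # chain polynomial feature construction and linear regression into one model
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    scores[degree] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f'Degree={degree}  train R^2={scores[degree][0]:.4f}  test R^2={scores[degree][1]:.4f}')
```

The training-set R-squared rises monotonically with the degree, while the degree-4 model interpolates the five training points almost exactly yet gains nothing on the test set.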

Because unknown data cannot be observed at training time, we should make full use of the training data when modeling, to avoid underfitting; at the same time we should pursue good generalization ability, to avoid overfitting.
This means increasing model complexity to improve performance on the observable data, while still keeping generalization in mind to prevent overfitting. To balance the two, the L1/L2 regularization methods are commonly used.

A regularization method adds the L1 or L2 norm of the model's parameter vector to the objective function being optimized:
L1 regularization pushes many elements of the parameter vector toward 0, making the effective features sparse; the corresponding L1-regularized model is called Lasso.
L2 regularization keeps most elements of the parameter vector small, suppressing the differences between parameters; the corresponding L2-regularized model is called Ridge.
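Concretely, for a parameter vector w the L1 penalty is the sum of absolute values and the L2 penalty is the sum of squares. A quick numeric illustration (the vector below is made up, not taken from the models in this note):

```python
import numpy as np

w = np.array([0.0, 3.0, -0.5, 0.0, 2.0])  # hypothetical parameter vector
l1_penalty = np.sum(np.abs(w))  # L1 norm: |3| + |-0.5| + |2| = 5.5
l2_penalty = np.sum(w ** 2)     # squared L2 norm: 9 + 0.25 + 4 = 13.25
print(l1_penalty, l2_penalty)   # -> 5.5 13.25
```

Because the L1 penalty grows linearly in each coefficient, the optimizer gains as much by zeroing a small coefficient as by shrinking a large one, which is why Lasso tends to produce sparse solutions; the squared L2 penalty punishes large coefficients disproportionately, so Ridge shrinks all of them but rarely to exactly zero.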
from sklearn.linear_model import Lasso
lasso_poly4 = Lasso()  # initialize Lasso with the default configuration
lasso_poly4.fit(X_train_poly4, y_train)  # fit the degree-4 polynomial features with Lasso regression
print(lasso_poly4.score(X_test_poly4, y_test))  # evaluate on the test set
print(lasso_poly4.coef_)  # parameter list of the Lasso model
print(' ')
print(regressor_poly4.score(X_test_poly4, y_test))  # compare: the quartic polynomial regression without regularization
print(regressor_poly4.coef_)

from sklearn.linear_model import Ridge
ridge_poly4 = Ridge()  # initialize Ridge with the default configuration
ridge_poly4.fit(X_train_poly4, y_train)  # fit the degree-4 polynomial features with Ridge regression
print(ridge_poly4.score(X_test_poly4, y_test))  # evaluate on the test set
print(ridge_poly4.coef_)  # parameter list of the Ridge model
print(np.sum(ridge_poly4.coef_ ** 2))
print(' ')
print(regressor_poly4.coef_)  # compare: the quartic polynomial regression without regularization
print(np.sum(regressor_poly4.coef_ ** 2))
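Both models above use the default regularization strength (alpha=1.0). scikit-learn's RidgeCV (and likewise LassoCV) can pick the strength by cross-validation instead; a minimal sketch on this note's data (the alpha grid is an arbitrary choice of mine, not from the original note):

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import PolynomialFeatures

X_train = np.array([[6], [8], [10], [14], [18]])
y_train = np.array([7, 9, 13, 17.5, 18])

poly4 = PolynomialFeatures(degree=4)
X_train_poly4 = poly4.fit_transform(X_train)

# evaluate several candidate strengths by leave-one-out cross-validation
ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0])
ridge.fit(X_train_poly4, y_train)
print(ridge.alpha_)  # the selected regularization strength
```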