This is a personal study note. It looks at fitting polynomial curves to data, at overfitting and underfitting, and at the effect of adding a regularization term to the model.
Fitting: the process by which a machine learning model, during training, updates its parameters so that the model agrees with the observable data (the training set).

Regression fitting with a first-degree (linear) curve
X_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()  # initialize a linear regression model with the default configuration
regressor.fit(X_train, y_train)  # fit the model on the training set

import numpy as np
xx = np.linspace(0, 26, 100)  # build evaluation points
xx = xx.reshape(xx.shape[0], 1)
yy = regressor.predict(xx)

%matplotlib inline
import matplotlib.pyplot as plt
plt.scatter(X_train, y_train)
plt1, = plt.plot(xx, yy, label='Degree=1')
plt.axis([0, 25, 0, 25])
plt.xlabel('independent variable')
plt.ylabel('dependent variable')
plt.legend(handles=[plt1])
plt.show()
print('The R-squared value of Linear Regressor performing on the training data is',
      regressor.score(X_train, y_train))

Regression fitting with a quadratic (second-degree) curve
from sklearn.preprocessing import PolynomialFeatures
poly2 = PolynomialFeatures(degree=2)  # polynomial feature generator
X_train_poly2 = poly2.fit_transform(X_train)  # quadratic polynomial features built from the training set
regressor_poly2 = LinearRegression()
regressor_poly2.fit(X_train_poly2, y_train)  # fit the quadratic polynomial regression model
xx_poly2 = poly2.transform(xx)  # quadratic polynomial features for the evaluation points
yy_poly2 = regressor_poly2.predict(xx_poly2)
plt.scatter(X_train, y_train)
plt1, = plt.plot(xx, yy, label='Degree=1')
plt2, = plt.plot(xx, yy_poly2, label='Degree=2')
plt.axis([0, 25, 0, 25])
plt.xlabel('independent variable')
plt.ylabel('dependent variable')
plt.legend(handles=[plt1, plt2])
plt.show()
print('The R-squared value of Polynomial Regressor(Degree=2) performing on the training data is',
      regressor_poly2.score(X_train_poly2, y_train))

Regression fitting with a quartic (fourth-degree) curve
from sklearn.preprocessing import PolynomialFeatures
poly4 = PolynomialFeatures(degree=4)  # polynomial feature generator
X_train_poly4 = poly4.fit_transform(X_train)  # quartic polynomial features built from the training set
regressor_poly4 = LinearRegression()
regressor_poly4.fit(X_train_poly4, y_train)  # fit the quartic polynomial regression model
xx_poly4 = poly4.transform(xx)  # quartic polynomial features for the evaluation points
yy_poly4 = regressor_poly4.predict(xx_poly4)
plt.scatter(X_train, y_train)
plt1, = plt.plot(xx, yy, label='Degree=1')
plt2, = plt.plot(xx, yy_poly2, label='Degree=2')
plt4, = plt.plot(xx, yy_poly4, label='Degree=4')
plt.axis([0, 25, 0, 25])
plt.xlabel('independent variable')
plt.ylabel('dependent variable')
plt.legend(handles=[plt1, plt2, plt4])
plt.show()
print('The R-squared value of Polynomial Regressor(Degree=4) performing on the training data is',
      regressor_poly4.score(X_train_poly4, y_train))

Evaluate the performance of the above three models on the test set
X_test = [[6], [8], [11], [16]]
y_test = [[8], [12], [15], [18]]
print('Linear regression:', regressor.score(X_test, y_test))
X_test_poly2 = poly2.transform(X_test)
print('Polynomial 2 regression:', regressor_poly2.score(X_test_poly2, y_test))
X_test_poly4 = poly4.transform(X_test)
print('Polynomial 4 regression:', regressor_poly4.score(X_test_poly4, y_test))

The results above show:
When model complexity is low (first-degree polynomial), the model does not fit the training data well, although it performs acceptably on the test set. This is underfitting.
When model complexity is very high (quartic polynomial), the model fits almost all the training data, but the model itself becomes volatile and nearly loses the ability to predict unseen data. This is overfitting.
Both overfitting and underfitting are symptoms of poor model generalization.
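The whole comparison can be re-run compactly in one loop (a sketch using the same toy data as above; the make_pipeline wrapper is a convenience of mine, not used elsewhere in this note):

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X_train = [[6], [8], [10], [14], [18]]
y_train = [7, 9, 13, 17.5, 18]
X_test = [[6], [8], [11], [16]]
y_test = [8, 12, 15, 18]

scores = {}
for degree in (1, 2, 4):
    # chain feature generation and regression into one estimator
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    # record (training R-squared, test R-squared) for each degree
    scores[degree] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print('Degree=%d  train R2=%.4f  test R2=%.4f' % (degree, *scores[degree]))
```

The training score rises monotonically with the degree, while the test score peaks at degree 2 and drops again at degree 4, which is exactly the under/overfitting pattern described above.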

Because unseen data cannot be predicted in advance, we should make full use of the training data when modeling, to avoid underfitting; at the same time, we should pursue better generalization, to avoid overfitting.
That is, we want to increase model complexity to improve performance on the observable data, while still keeping the model's generalization ability in mind to prevent overfitting. To balance the two, L1/L2 regularization is commonly used.

A regularization method adds the L1 or L2 norm of the model parameters to the objective function being optimized:
L1 regularization drives many elements of the parameter vector toward 0, making the effective features sparse; the linear model with L1 regularization is called Lasso.
L2 regularization keeps most elements of the parameter vector small, suppressing large differences between parameters; the linear model with L2 regularization is called Ridge.
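In symbols (notation mine, not from the note; α denotes the regularization strength, matching scikit-learn's alpha parameter, and n the number of samples), the two penalized objectives are:

```latex
\text{Lasso (L1):}\quad \min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 \;+\; \alpha\,\lVert w \rVert_1
\qquad
\text{Ridge (L2):}\quad \min_{w}\; \lVert y - Xw \rVert_2^2 \;+\; \alpha\,\lVert w \rVert_2^2
```

The non-differentiable L1 term is what pushes individual coefficients exactly to 0, while the smooth L2 term only shrinks them toward 0.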
from sklearn.linear_model import Lasso
lasso_poly4 = Lasso()  # initialize Lasso with the default configuration
lasso_poly4.fit(X_train_poly4, y_train)  # fit the quartic polynomial features with Lasso regression
print(lasso_poly4.score(X_test_poly4, y_test))  # evaluate on the test set
print(lasso_poly4.coef_)  # coefficients of the Lasso model
print(' ')
print(regressor_poly4.score(X_test_poly4, y_test))  # compare with the unregularized quartic regression
print(regressor_poly4.coef_)

from sklearn.linear_model import Ridge
ridge_poly4 = Ridge()  # initialize Ridge with the default configuration
ridge_poly4.fit(X_train_poly4, y_train)  # fit the quartic polynomial features with Ridge regression
print(ridge_poly4.score(X_test_poly4, y_test))  # evaluate on the test set
print(ridge_poly4.coef_)  # coefficients of the Ridge model
print(np.sum(ridge_poly4.coef_ ** 2))
print(' ')
print(regressor_poly4.coef_)  # compare with the unregularized quartic regression
print(np.sum(regressor_poly4.coef_ ** 2))
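As a follow-up experiment (my own sketch, not part of the note's original code): sweeping Ridge's alpha parameter on the same quartic features shows that a larger penalty shrinks the coefficient vector further, which is the mechanism behind the smaller sum of squared coefficients printed above.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

X_train = [[6], [8], [10], [14], [18]]
y_train = [7, 9, 13, 17.5, 18]
poly4 = PolynomialFeatures(degree=4)
X_train_poly4 = poly4.fit_transform(X_train)

norms = []
for alpha in (0.01, 1.0, 100.0):
    ridge = Ridge(alpha=alpha)
    ridge.fit(X_train_poly4, y_train)
    # squared L2 norm of the penalized coefficients
    norms.append(np.sum(ridge.coef_ ** 2))
    print('alpha=%g  ||w||^2=%g' % (alpha, norms[-1]))
```

For Ridge regression, the L2 norm of the solution is non-increasing in alpha, so the printed values shrink as the penalty grows.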