SVM (Support Vector Machine) is a common discriminative method. In machine learning it is a supervised learning model, usually used for pattern recognition, classification, and regression analysis.

In Matlab, the libsvm toolkit written by Lin Zhiren (Chih-Jen Lin) works well for SVM training. In Python, the sklearn package can be used to train machine learning algorithms; the Scikit-Learn library implements the basic machine learning algorithms.

The following is based on an earlier blog post, with the Python 2 code from the original updated to Python 3.
Next, we use the Iris dataset as an example:

The original Iris dataset downloaded from the UCI database looks like this: the first four columns are feature columns and the fifth column is the class column. There are three classes: Iris-setosa, Iris-versicolor, and Iris-virginica.

The data needs to be split into features and labels with numpy.

Dataset download address:

Simply download the file.

Python3 code:

from sklearn import svm
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib import colors
from sklearn.model_selection import train_test_split


def iris_type(s):
    # Map the class name in column 4 to an integer label.
    # np.loadtxt passes bytes or str depending on the NumPy version.
    if isinstance(s, bytes):
        s = s.decode()
    it = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
    return it[s]


path = 'C:\\Users\\dell\\desktop\\'  # Data file path
data = np.loadtxt(path, dtype=float, delimiter=',', converters={4: iris_type})
x, y = np.split(data, (4,), axis=1)  # Columns 0-3 are features, column 4 is the label
x = x[:, :2]  # Keep only the first two features so the result can be plotted
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1, train_size=0.6)

# clf = svm.SVC(C=0.1, kernel='linear', decision_function_shape='ovr')
clf = svm.SVC(C=0.8, kernel='rbf', gamma=20, decision_function_shape='ovr')
clf.fit(x_train, y_train.ravel())

print(clf.score(x_train, y_train))  # Accuracy on the training set
y_hat = clf.predict(x_train)
print(clf.score(x_test, y_test))  # Accuracy on the test set
y_hat2 = clf.predict(x_test)

x1_min, x1_max = x[:, 0].min(), x[:, 0].max()  # Range of column 0
x2_min, x2_max = x[:, 1].min(), x[:, 1].max()  # Range of column 1
x1, x2 = np.mgrid[x1_min:x1_max:200j, x2_min:x2_max:200j]  # Generate grid sample points
grid_test = np.stack((x1.flat, x2.flat), axis=1)  # Test points

# SimHei is only needed when the labels contain Chinese characters
mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False

cm_light = mpl.colors.ListedColormap(['#A0FFA0', '#FFA0A0', '#A0A0FF'])
cm_dark = mpl.colors.ListedColormap(['g', 'r', 'b'])

grid_hat = clf.predict(grid_test)  # Predicted class of each grid point
grid_hat = grid_hat.reshape(x1.shape)  # Make it the same shape as the input grid

alpha = 0.5
plt.pcolormesh(x1, x2, grid_hat, cmap=cm_light)  # Show the predicted regions
plt.plot(x[:, 0], x[:, 1], 'o', alpha=alpha, color='blue', markeredgecolor='k')
# Circle the test set samples (an edge color is required, otherwise hollow markers are invisible)
plt.scatter(x_test[:, 0], x_test[:, 1], s=120, facecolors='none', edgecolors='k', zorder=10)
plt.xlabel(u'Sepal length', fontsize=13)
plt.ylabel(u'Sepal width', fontsize=13)
plt.xlim(x1_min, x1_max)
plt.ylim(x2_min, x2_max)
plt.title(u'SVM classification', fontsize=15)
plt.show()

np.split(data, split positions, axis): axis=1 splits horizontally (by columns), axis=0 splits vertically (by rows).

x = x[:, :2]: to make the later plot easier to read, only the first two feature columns are used for training.
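The split and the column selection can be tried on a small array first. This is a minimal sketch that assumes a made-up 3x5 array in place of the loaded Iris file; the values and the name x2d are illustrative only.

```python
import numpy as np

# Hypothetical stand-in for the loaded data: 4 feature columns + 1 label column.
data = np.array([
    [5.1, 3.5, 1.4, 0.2, 0.0],
    [7.0, 3.2, 4.7, 1.4, 1.0],
    [6.3, 3.3, 6.0, 2.5, 2.0],
])

# Split before column index 4 along axis=1 (columns):
# x gets columns 0..3, y gets column 4.
x, y = np.split(data, (4,), axis=1)
print(x.shape, y.shape)   # (3, 4) (3, 1)

# Keep only the first two feature columns for a 2-D plot.
x2d = x[:, :2]
print(x2d.shape)          # (3, 2)

# y is a column vector; ravel() flattens it to the 1-D shape that fit() expects.
print(y.ravel())          # [0. 1. 2.]
```

Note that y comes out of np.split as a column of shape (n, 1), which is why the training code calls y_train.ravel() before fitting.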


sklearn.model_selection.train_test_split randomly divides the data into a training set and a test set: train_test_split(train_data, train_target, test_size=number, random_state=number)

Parameters:

train_data: the sample features to be split

train_target: the sample labels to be split

test_size: the proportion of samples assigned to the test set; if it is an integer, it is the number of test samples

random_state: the seed of the random number generator.

Random number seed: this is effectively an identifier for one particular sequence of random numbers. When an experiment has to be repeated, it guarantees that the same sequence is used again. For example, if you pass 1 every time, you get the same split as long as the other parameters are unchanged. If you leave it unset (None), the split is different on every run. Note that 0 is an ordinary seed like any other integer, so it also gives a reproducible split. The relationship between random numbers and the seed follows two rules: different seeds generate different random numbers; the same seed generates the same random numbers, even across different instances.
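The effect of the seed is easy to check directly. This is a minimal sketch on a hypothetical toy array (10 samples, not the Iris data); it only relies on train_test_split treating any integer random_state, including 0, as a fixed seed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each, labels 0..9.
x = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Same integer seed -> identical split on every call.
a_train, a_test, _, _ = train_test_split(x, y, train_size=0.6, random_state=1)
b_train, b_test, _, _ = train_test_split(x, y, train_size=0.6, random_state=1)
print(np.array_equal(a_test, b_test))   # True

# train_size=0.6 of 10 samples leaves 6 for training and 4 for testing.
print(a_train.shape, a_test.shape)      # (6, 2) (4, 2)

# random_state=0 is also a fixed seed, so it is reproducible as well.
c_test = train_test_split(x, y, train_size=0.6, random_state=0)[1]
d_test = train_test_split(x, y, train_size=0.6, random_state=0)[1]
print(np.array_equal(c_test, d_test))   # True
```

Only random_state=None (the default) draws a fresh split on each call.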

With kernel='linear', a linear kernel is used. The larger C is, the more closely the classifier fits the training data, but overfitting becomes possible (default C=1).
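To see how C behaves in practice, the linear kernel can be trained with several values and the training and test accuracy compared. This is a rough sketch: it assumes scikit-learn's bundled copy of Iris (datasets.load_iris) instead of the downloaded UCI file, and the list of C values is chosen only for illustration.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Assumption: use scikit-learn's bundled Iris data instead of the UCI file.
iris = datasets.load_iris()
x = iris.data[:, :2]          # first two features, as in the article
y = iris.target

x_train, x_test, y_train, y_test = train_test_split(
    x, y, random_state=1, train_size=0.6)

results = {}
for C in (0.01, 0.1, 1, 10, 100):   # illustrative values
    clf = svm.SVC(C=C, kernel='linear', decision_function_shape='ovr')
    clf.fit(x_train, y_train)
    # Store (training accuracy, test accuracy) for this C.
    results[C] = (clf.score(x_train, y_train), clf.score(x_test, y_test))
    print(f"C={C:<6} train={results[C][0]:.3f} test={results[C][1]:.3f}")
```

A training accuracy that keeps rising while the test accuracy stalls or drops is the usual sign that C is too large for the data.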

With kernel='rbf' (the default), a Gaussian kernel is used. The smaller gamma is, the smoother and more continuous the decision boundary. The larger gamma is, the more fragmented the decision boundary becomes and the more closely it fits the training data, but overfitting becomes possible.
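The same kind of check works for gamma. This sketch keeps C=0.8 from the code above and varies gamma. It again assumes scikit-learn's bundled Iris data, and the gamma values are illustrative. The number of support vectors is printed as a rough indicator of how complex the boundary is.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Assumption: bundled Iris data, first two features only.
iris = datasets.load_iris()
x, y = iris.data[:, :2], iris.target
x_train, x_test, y_train, y_test = train_test_split(
    x, y, random_state=1, train_size=0.6)

gamma_scores = {}
for gamma in (0.1, 1, 20, 200):     # illustrative values; 20 is the article's setting
    clf = svm.SVC(C=0.8, kernel='rbf', gamma=gamma, decision_function_shape='ovr')
    clf.fit(x_train, y_train)
    train_acc = clf.score(x_train, y_train)
    test_acc = clf.score(x_test, y_test)
    n_sv = int(clf.n_support_.sum())   # total support vectors over all classes
    gamma_scores[gamma] = (train_acc, test_acc, n_sv)
    print(f"gamma={gamma:<5} train={train_acc:.3f} test={test_acc:.3f} support vectors={n_sv}")
```

Comparing the training and test columns across gamma values shows where the boundary starts to follow individual training points rather than the overall class regions.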

Classification result with the linear kernel:

Classification result with the RBF kernel:
