Introduction
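This lab demonstrates how to tune the parameters of a Radial Basis Function (RBF) kernel SVM. Two parameters largely determine the model's performance: gamma, the RBF kernel coefficient, which controls how far the influence of a single training example reaches, and C, the SVM regularization parameter, which trades off a smooth decision surface against classifying every training point correctly. The goal is to choose the values of these parameters that maximize the cross-validation accuracy of the model.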
Load and Prepare Data Set
- Load the iris dataset from scikit-learn.
- Separate the data into feature matrix `X` and target vector `y`.
- Standardize the feature matrix `X` using `StandardScaler`.
- Create a simplified version of the dataset for decision function visualization by keeping only the first two features in `X` and sub-sampling the dataset to keep only two classes, which makes it a binary classification problem.
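The preparation steps above can be sketched as follows. This is a minimal sketch, not necessarily the lab's exact code: the source does not say which two classes are kept, so dropping class 0 is an assumption, and the names `X_2d` and `y_2d` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Load the iris dataset and separate features from targets.
iris = load_iris()
X = iris.data      # shape (150, 4)
y = iris.target    # class labels 0, 1, 2

# Standardize every feature to zero mean and unit variance.
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Simplified 2D binary problem (assumption: keep classes 1 and 2 only).
keep = y > 0
X_2d = X[keep][:, :2]   # first two features of the kept samples
y_2d = y[keep] - 1      # relabel classes 1, 2 as 0, 1

print(X.shape, X_2d.shape, np.unique(y_2d))
```

Because the scaler is fitted on all 150 samples before sub-sampling, `X_2d` is only approximately zero-mean; refitting a scaler on the subset is an equally reasonable choice.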
Train Classifiers
- Create a logarithmic grid of the `gamma` and `C` parameters using `np.logspace`.
- Split the data into training and testing sets using `StratifiedShuffleSplit`.
- Perform a grid search using `GridSearchCV` to find the best parameters for the SVM model.
- Fit a classifier for all parameters in the 2D version.
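A hedged sketch of these training steps is shown below. The grid bounds, the number of splits, the test fraction, and the random seed are all assumptions chosen to keep the search fast; the source does not specify them. The 2D part reuses the assumed binary subset (classes 1 and 2, first two features).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Logarithmic grids (assumed ranges): 7 values each, one per decade.
C_range = np.logspace(-2, 4, 7)
gamma_range = np.logspace(-4, 2, 7)
param_grid = {"C": C_range, "gamma": gamma_range}

# Repeated stratified train/test splits used as the cross-validation scheme.
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

# Exhaustive search over every (C, gamma) pair.
grid = GridSearchCV(SVC(kernel="rbf"), param_grid=param_grid, cv=cv)
grid.fit(X, y)
print("best parameters:", grid.best_params_)
print("best mean CV accuracy:", grid.best_score_)

# Fit one classifier per parameter pair on the 2D binary version.
keep = y > 0
X_2d, y_2d = X[keep][:, :2], y[keep] - 1
classifiers = []
for C in (1e-2, 1, 1e2):
    for gamma in (1e-1, 1, 1e1):
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_2d, y_2d)
        classifiers.append((C, gamma, clf))
```

Note that `StratifiedShuffleSplit` does not produce a single hold-out set here; `GridSearchCV` draws several stratified train/test splits from it and averages the test accuracy over them.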
Visualization
- Visualize the decision function for a variety of parameter values on a simplified classification problem involving only 2 input features and 2 possible target classes (binary classification).
- Visualize the heatmap of the classifier's cross-validation accuracy as a function of `C` and `gamma`.
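Both plots can be sketched with matplotlib as below. This is an illustrative sketch under assumptions: the parameter values, grid ranges, colormaps, and figure sizes are arbitrary choices, and the binary subset again keeps classes 1 and 2.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)
keep = y > 0
X_2d, y_2d = X[keep][:, :2], y[keep] - 1

# Decision function on the 2D binary problem, one panel per (C, gamma).
C_2d = [1e-2, 1, 1e2]
gamma_2d = [1e-1, 1, 1e1]
xx, yy = np.meshgrid(np.linspace(-3.5, 3.5, 100), np.linspace(-3.5, 3.5, 100))
mesh_points = np.c_[xx.ravel(), yy.ravel()]

fig_df, axes = plt.subplots(len(C_2d), len(gamma_2d), figsize=(9, 9))
for i, C in enumerate(C_2d):
    for j, gamma in enumerate(gamma_2d):
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_2d, y_2d)
        Z = clf.decision_function(mesh_points).reshape(xx.shape)
        ax = axes[i, j]
        ax.contourf(xx, yy, Z, levels=20, cmap="RdBu")
        ax.scatter(X_2d[:, 0], X_2d[:, 1], c=y_2d, cmap="RdBu",
                   edgecolors="k", s=15)
        ax.set_title(f"gamma={gamma:g}, C={C:g}", fontsize=9)
        ax.set_xticks([])
        ax.set_yticks([])

# Heatmap of mean cross-validation accuracy over the full grid.
C_range = np.logspace(-2, 4, 7)
gamma_range = np.logspace(-4, 2, 7)
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": C_range, "gamma": gamma_range}, cv=cv).fit(X, y)

# GridSearchCV iterates parameters in sorted key order with the last key
# varying fastest, so rows correspond to C and columns to gamma.
scores = grid.cv_results_["mean_test_score"].reshape(
    len(C_range), len(gamma_range))

fig_hm, ax_hm = plt.subplots(figsize=(7, 5))
im = ax_hm.imshow(scores, interpolation="nearest", cmap="hot")
ax_hm.set_xticks(np.arange(len(gamma_range)))
ax_hm.set_xticklabels([f"{g:g}" for g in gamma_range], rotation=45)
ax_hm.set_yticks(np.arange(len(C_range)))
ax_hm.set_yticklabels([f"{c:g}" for c in C_range])
ax_hm.set_xlabel("gamma")
ax_hm.set_ylabel("C")
ax_hm.set_title("Mean cross-validation accuracy")
cbar = fig_hm.colorbar(im, ax=ax_hm)
cbar.set_label("accuracy")
```

In a Jupyter notebook the figures render when the cell finishes; in a script, a call to `plt.show()` at the end displays them.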
Interpretation
- Interpret the results of the visualization and choose the optimal values for `C` and `gamma`.
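One way to support the interpretation numerically is to list the top-ranked parameter pairs next to their score spread, since several combinations often perform almost equally well. The sketch below assumes the same illustrative grid and splitter settings as a plausible setup; none of these values come from the source.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

param_grid = {"C": np.logspace(-2, 4, 7), "gamma": np.logspace(-4, 2, 7)}
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=cv).fit(X, y)

results = grid.cv_results_
# Indices of the five best-ranked parameter pairs (rank 1 is best).
top = np.argsort(results["rank_test_score"], kind="stable")[:5]

for idx in top:
    params = results["params"][idx]
    mean = results["mean_test_score"][idx]
    std = results["std_test_score"][idx]
    print(f"C={params['C']:g}, gamma={params['gamma']:g}: "
          f"accuracy {mean:.3f} +/- {std:.3f}")

print("selected:", grid.best_params_)
```

When several pairs score within one standard deviation of each other, the differences may be noise from the random splits, so the single best pair should be read as one good choice rather than a uniquely optimal one.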
Summary
This lab demonstrated how to tune the parameters of a Radial Basis Function (RBF) kernel SVM. The kernel coefficient gamma and the regularization parameter C are crucial for the performance of the SVM model, and good values for them can be found by combining a cross-validated grid search with visualization of the decision function and the accuracy heatmap.