Machine Learning 06 - Support Vector Machine
I am working through Stanford's Machine Learning course by Andrew Ng, and I take notes as I go to review and consolidate the material.
My knowledge is limited; if you spot errors or omissions, or have ideas, please bear with me and point them out.
6.1 Large Margin Classification
6.1.1 Optimization objective
Here we introduce the last supervised learning algorithm of the course : the Support Vector Machine (SVM).
Hypothesis :

$$h_\theta(x) = \begin{cases} 1 & \text{if } \theta^T x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

Cost function :

$$\min_\theta\; C \sum_{i=1}^{m} \left[ y^{(i)} \, \text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \text{cost}_0(\theta^T x^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$$

where $\text{cost}_1$ is the cost when $y = 1$ and $\text{cost}_0$ is the cost when $y = 0$. Intuitively, $\text{cost}_1(z)$ is zero for $z \ge 1$ and grows linearly as $z$ decreases, while $\text{cost}_0(z)$ is zero for $z \le -1$ and grows linearly as $z$ increases.
Decision boundary :

The SVM finds the boundary with the largest margin between the two classes of data, which is why it is sometimes called a large margin classifier. The regularization term supplies the intuition : minimizing $\frac{1}{2} \sum_j \theta_j^2$ shrinks $\lVert \theta \rVert$, and since the margin is proportional to $1 / \lVert \theta \rVert$, keeping $\lVert \theta \rVert$ small while satisfying $\theta^T x^{(i)} \ge 1$ for positive examples and $\theta^T x^{(i)} \le -1$ for negative ones pushes the boundary far from both classes.
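As a concrete sketch : the two costs are hinge functions, $\text{cost}_1(z) = \max(0, 1 - z)$ and $\text{cost}_0(z) = \max(0, 1 + z)$, and the whole objective takes a few lines of NumPy (the function names below are mine, not from the course):

```python
import numpy as np

def cost1(z):
    """Cost for a positive example (y = 1): zero once z >= 1, linear below."""
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    """Cost for a negative example (y = 0): zero once z <= -1, linear above."""
    return np.maximum(0.0, 1.0 + z)

def svm_cost(theta, X, y, C):
    """SVM objective: C * (sum of per-example costs) + (1/2) * sum_j theta_j^2.

    X has a leading column of ones; theta[0] (the intercept) is not regularized.
    """
    z = X @ theta
    data_term = C * np.sum(y * cost1(z) + (1.0 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)
    return data_term + reg_term
```

Note the flat regions of the two costs : a correctly classified point beyond the margin ($z \ge 1$ or $z \le -1$) contributes exactly zero, which is what produces the large-margin behavior.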
6.1.2 Concept of kernels
In this part, in order to fit a non-linear decision boundary, we adapt the hypothesis to use new features $f_1, f_2, \ldots$ in place of the raw inputs, predicting $y = 1$ whenever $\theta_0 + \theta_1 f_1 + \theta_2 f_2 + \cdots \ge 0$. There are two options for building such features :
(1) Polynomial
It can fit the dataset very well, but we don't know in advance which features to add, and high-degree expansions are very computationally expensive.
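To see why the polynomial option gets expensive, here is a sketch of building every polynomial term up to a given degree by hand (the helper name is hypothetical):

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features(x, degree):
    """All monomials of the entries of x up to the given degree, e.g. x1, x2, x1^2, x1*x2, ..."""
    feats = []
    for d in range(1, degree + 1):
        # Each multiset of indices of size d gives one monomial.
        for idx in combinations_with_replacement(range(len(x)), d):
            feats.append(np.prod(x[list(idx)]))
    return np.array(feats)
```

For $x = (2, 3)$ and degree 2 this yields $(x_1, x_2, x_1^2, x_1 x_2, x_2^2) = (2, 3, 4, 6, 9)$; with 10 input features and degree 3 there are already 285 terms, and the count keeps growing combinatorially.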
(2) Gaussian Kernel
First, choose some landmarks $l^{(1)}, l^{(2)}, l^{(3)}, \ldots$

Second, define new features $f_i = \text{similarity}(x, l^{(i)})$, such as the Gaussian Kernel :

$$f_i = \exp\left( -\frac{\lVert x - l^{(i)} \rVert^2}{2\sigma^2} \right)$$

It measures the similarity of two points :
- If $x \approx l^{(i)}$, then $f_i \approx 1$.
- If $x$ is far from $l^{(i)}$, then $f_i \approx 0$.

And $\sigma^2$ acts like a scale on the distance between the two points : the larger $\sigma^2$ is, the more slowly $f_i$ falls off with distance.

Finally, what it predicts (for example) is : predict $y = 1$ when $\theta_0 + \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 \ge 0$.
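Putting the pieces together, a minimal sketch of the landmark features and the resulting prediction rule (the landmarks and $\theta$ values below are made up purely for illustration):

```python
import numpy as np

def gaussian_kernel(x, l, sigma=1.0):
    """Similarity of x to landmark l: exp(-||x - l||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma ** 2))

# Hypothetical landmarks and parameters, just for illustration.
landmarks = [np.array([3.0, 5.0]), np.array([1.0, 1.0]), np.array([6.0, 2.0])]
theta = np.array([-0.5, 1.0, 1.0, 0.0])   # theta_0 .. theta_3

def predict(x, landmarks, theta, sigma=1.0):
    """Predict y = 1 when theta_0 + theta_1 f_1 + theta_2 f_2 + theta_3 f_3 >= 0."""
    f = np.array([1.0] + [gaussian_kernel(x, l, sigma) for l in landmarks])
    return 1 if f @ theta >= 0 else 0
```

A point near the first landmark gets $f_1 \approx 1$, so $\theta_0 + \theta_1 f_1 \approx 0.5 \ge 0$ already forces prediction 1; a point far from every landmark has all $f_i \approx 0$, leaving only $\theta_0 = -0.5$, so it is classified 0.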
6.1.3 SVM with kernels
(1) Choose landmarks
Given $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$, choose $l^{(i)} = x^{(i)}$ for $i = 1, \ldots, m$ (one landmark per training example).
(2) Define kernels
We define $f_i = \text{similarity}(x, l^{(i)})$ as the Gaussian Kernel :

$$f_i = \exp\left( -\frac{\lVert x - l^{(i)} \rVert^2}{2\sigma^2} \right)$$
(3) Training
Minimize the objective over $\theta$, with the kernel features $f^{(i)}$ in place of $x^{(i)}$; use a minimization algorithm (or an off-the-shelf SVM package) to solve it.
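The notes leave the choice of minimization algorithm open; as one illustrative choice (step size and iteration count are arbitrary assumptions of mine), here is a sketch that minimizes the kernelized objective by subgradient descent:

```python
import numpy as np

def gaussian_features(X, landmarks, sigma):
    """Map each row of X to [1, f_1, ..., f_m] with f_i the Gaussian similarity."""
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.ones((X.shape[0], 1)), np.exp(-d2 / (2.0 * sigma ** 2))])

def train_svm(X, y, C=1.0, sigma=1.0, lr=0.1, iters=1000):
    """Subgradient descent on C * sum(costs) + 0.5 * ||theta[1:]||^2."""
    F = gaussian_features(X, X, sigma)     # landmarks l_i = x_i, as in the notes
    theta = np.zeros(F.shape[1])
    for _ in range(iters):
        z = F @ theta
        # cost1 has slope -1 while z < 1 (y = 1); cost0 has slope +1 while z > -1 (y = 0)
        g = C * ((-(y * (z < 1)) + (1.0 - y) * (z > -1)) @ F)
        g[1:] += theta[1:]                 # gradient of the regularization term
        theta -= lr * g / len(y)
    return theta

def predict(X_new, X_train, theta, sigma=1.0):
    """Predict y = 1 wherever theta^T f >= 0."""
    return (gaussian_features(X_new, X_train, sigma) @ theta >= 0).astype(int)
```

On XOR-style data (two opposite corners labeled 1, the other two labeled 0) no straight line separates the classes, but the Gaussian features make them separable, which is the point of the kernel.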
(4) Evaluation
- Large $C$ : lower bias, higher variance.
- Small $C$ : higher bias, lower variance.
- Large $\sigma^2$ : higher bias, lower variance (the features $f_i$ vary more "smoothly").
- Small $\sigma^2$ : lower bias, higher variance.
(5) Note
- Perform feature scaling before using the Gaussian Kernel.
- Not all similarity functions make valid kernels. (They need to satisfy "Mercer's Theorem" to make sure SVM packages run correctly.)
- Other kernels : Polynomial kernel, String kernel, …
- Multi-class classification : use the one-vs-all method.
- If $n$ is large (relative to $m$), use logistic regression or SVM without a kernel; if $n$ is small and $m$ is intermediate, use SVM with a Gaussian kernel; if $n$ is small and $m$ is large, create more features, then turn to case one. A neural network is likely to work well for most of these cases, but may be slower to train.
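To illustrate the feature-scaling note above : without scaling, the squared distance inside the Gaussian Kernel is dominated by whichever feature has the largest range (the numbers below are made up):

```python
import numpy as np

# Hypothetical unscaled features: column 0 is house size in sq ft,
# column 1 is number of bedrooms (made-up values, for illustration only).
X = np.array([[2000.0, 3.0],
              [1000.0, 2.0],
              [1500.0, 4.0]])

# Unscaled: the bedroom difference (1^2) is invisible next to the
# size difference (1000^2) inside ||x - l||^2.
d2_raw = np.sum((X[0] - X[1]) ** 2)

# Standardize each feature to zero mean and unit variance first.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
d2_scaled = np.sum((X_scaled[0] - X_scaled[1]) ** 2)
# Now both features contribute on a comparable scale to the kernel distance.
```

After standardizing, a one-bedroom difference and a moderate size difference move the kernel distance by similar amounts, so no single feature silently decides the similarity.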