我已经有两年 ML 经历，这系列课主要用来查缺补漏，会记录一些细节的、自己不知道的东西。

本次笔记补充视频 BV1JE411g7XF 的缺失部分。在另一个UP主上传的2017课程BV13x411v7US中可找到。本节内容 65 分钟左右。

本节内容综述

Hinge Loss + Kernel Method = Support Vector Machine
首先讲 Hinge Loss 。
介绍 Linear SVM 及其优化方法。
接下来是 Kernel Method 部分。
首先我们讨论 Dual Representation 。
这是为了引出 Kernel Function ，大幅简化投到高维空间后的运算。
最后提了 SVM related methods 以及 DL 与 SVM 的对比。

Hinge Loss

回顾 Binary Classification
Hinge Loss

Linear SVM

Linear SVM - gradient descent
Linear SVM - another formulation

Dual Representation
Kernel Trick

Radial Basis Function Kernel
Sigmoid Kernel
Directly Design K

SVM related methods
Deep Learning v.s. SVM

小细节

Hinge Loss

回顾 Binary Classification

【李宏毅2020 ML/DL】补充：Support Vector Machine (SVM)
如上，因为定义 $\delta(g(x^n)\neq \hat{y}^n)$ 后，不可导，因此更换为 $L(f(x^n),\hat{y}^n)$ 。

如上，如果不使用交叉熵，使用 loss ，其实并不容易。使用 square loss 效果并不好。

如上，我们一般使用交叉熵。

Hinge Loss

【李宏毅2020 ML/DL】补充：Support Vector Machine (SVM)
而可以发现，Hinge Loss 比较像交叉熵。

Linear SVM

【李宏毅2020 ML/DL】补充：Support Vector Machine (SVM)
如上，我们记为 $f(x)=w^T x$ 。我们的损失函数是凸函数，适合梯度下降。

此外，SVM也有 deep 的版本。

Linear SVM - gradient descent

【李宏毅2020 ML/DL】补充：Support Vector Machine (SVM)
如上为其梯度下降推导。

Linear SVM - another formulation

【李宏毅2020 ML/DL】补充：Support Vector Machine (SVM)
如上，我们把 Hinge Loss $l$ 记为 $\epsilon^n$ 。在最小化 $L$ 时，上下两个红框等价，下面相当于为 $\hat{y}^n f(x) \ge 1$ 做了一个放宽。

这可以用二次规划来求解，做梯度下降也没问题。

Dual Representation

【李宏毅2020 ML/DL】补充：Support Vector Machine (SVM)
如上，最终可能只有小部分数据前面的系数不是 0 ，这些就是支持向量。

接着，我们可以利用核函数。
【李宏毅2020 ML/DL】补充：Support Vector Machine (SVM)
如上，首先将 $w$ 变换成矩阵形式。而 $X^T x$ 就是 kernel function 。

【李宏毅2020 ML/DL】补充：Support Vector Machine (SVM)
如上，我们只需要知道 kernel function ，就可以达到优化目标。这叫做 Kernel Trick 。

Kernel Trick

【李宏毅2020 ML/DL】补充：Support Vector Machine (SVM)
如上，我们根据 Kernel Trick 推导出的式子直接进行计算，比先做转换（投影到更高维的平面）、再做内积要快。

如上，对于高维向量同理。

Radial Basis Function Kernel

【李宏毅2020 ML/DL】补充：Support Vector Machine (SVM)
如果想将 $exp(-\frac{1}{2} ||x-z||_2)$ 等价为 $\phi(x) \cdot \phi(z)$ （对 exp 泰勒展开），则 $\phi(*)$ 是无限维的。如果直接运算，难以描述。

但是使用 Kernel Trick ，转换为内积，则方便很多。

但是要小心，在无穷多维平面，可能会导致过拟合。

Sigmoid Kernel

【李宏毅2020 ML/DL】补充：Support Vector Machine (SVM)
如上，我们使用 Sigmoid Kernel 相当于使用单层神经网络。

Directly Design K

甚至，我们可以跳过设计 $\phi$ 与 $x$ ，由 Mercer’s theory 直接设计 Kernel Function 。
【李宏毅2020 ML/DL】补充：Support Vector Machine (SVM)
如上我们可以对声音信号进行处理。

SVM related methods

Support Vector Regression
Ranking SVM
One-class SVM

Deep Learning v.s. SVM

【李宏毅2020 ML/DL】补充：Support Vector Machine (SVM)
如上，其实 SVM 的 Kernel 是 learnable 的，但是可能没法像神经网络那样彻底。引出有人提出 Multiple Kernel Learning 。

【李宏毅2020 ML/DL】补充：Support Vector Machine (SVM)

本节内容综述

文章目录

小细节

Hinge Loss

回顾 Binary Classification

Hinge Loss

Linear SVM

Linear SVM - gradient descent

Linear SVM - another formulation

Dual Representation

Kernel Trick

Radial Basis Function Kernel

Sigmoid Kernel

Directly Design K

SVM related methods

Deep Learning v.s. SVM

相关推荐