【SVM】Some refined knowledge points about Support Vector Machines [part 1]

支持向量机(SVM)定义

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

●在一个二分类数据集中,存在着多个划分“超平面”,这些超平面可以用以下方程表示:

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

【SVM】Some refined knowledge points about Support Vector Machines [part 1] 为法向量,决定了超平面的方向

【SVM】Some refined knowledge points about Support Vector Machines [part 1] 为位移项,决定了超平面与原点之间的距离

●样本空间中任意点 到超平面的距离可写为:

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

●距离超平面最近的几个样本点组成“支持向量(support vector)”,两个异类支持向量到超平面的距离之和(间隔)为:

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

证明附后

从上面的式子看,间隔似乎只与 【SVM】Some refined knowledge points about Support Vector Machines [part 1] 有关,但事实上 【SVM】Some refined knowledge points about Support Vector Machines [part 1] 通过约束隐式地影响着 【SVM】Some refined knowledge points about Support Vector Machines [part 1] 的取值

●为了最大化间隔,需要最大化 【SVM】Some refined knowledge points about Support Vector Machines [part 1] ,这等价于最小化 【SVM】Some refined knowledge points about Support Vector Machines [part 1] ,于是,出来吧,皮卡丘!(SVM基本型)

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

【SVM】Some refined knowledge points about Support Vector Machines [part 1]


对偶问题&拉格朗日乘子

使用拉格朗日乘子法可以得到上述问题的“对偶问题”,具体做法是为每条约束添加拉格朗日乘子 【SVM】Some refined knowledge points about Support Vector Machines [part 1] ,通过令拉格朗日函数关于 【SVM】Some refined knowledge points about Support Vector Machines [part 1]【SVM】Some refined knowledge points about Support Vector Machines [part 1] 的偏导为零,化简得到对偶问题:

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

设上述对偶问题的解为 【SVM】Some refined knowledge points about Support Vector Machines [part 1] (可求解得到)

则原模型 【SVM】Some refined knowledge points about Support Vector Machines [part 1] 中的参数

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

证明后附

●通过求解对偶问题得到原始问题的最优解,这就是线性可分支持向量机的对偶算法。这样做的优点,一是对偶问题往往更容易求解,二是自然引入核函数,进而推广到非线性分类


KKT条件

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

这个条件很精妙,它可以使得对于任意训练样本 【SVM】Some refined knowledge points about Support Vector Machines [part 1] ,总有 【SVM】Some refined knowledge points about Support Vector Machines [part 1]【SVM】Some refined knowledge points about Support Vector Machines [part 1]

【SVM】Some refined knowledge points about Support Vector Machines [part 1] 时,表示样本点不会在求和式 【SVM】Some refined knowledge points about Support Vector Machines [part 1] 中出现

【SVM】Some refined knowledge points about Support Vector Machines [part 1] 时,则必有【SVM】Some refined knowledge points about Support Vector Machines [part 1] ,即所对应的样本点位于最大间隔边界上,是一个支持向量(请务必认真体会这两句话)


证明:已知 【SVM】Some refined knowledge points about Support Vector Machines [part 1],如何求出 【SVM】Some refined knowledge points about Support Vector Machines [part 1] 【SVM】Some refined knowledge points about Support Vector Machines [part 1]

根据⑴拉格朗日函数 【SVM】Some refined knowledge points about Support Vector Machines [part 1] 关于【SVM】Some refined knowledge points about Support Vector Machines [part 1]【SVM】Some refined knowledge points about Support Vector Machines [part 1] 的偏导⑵KKT条件,得:

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

首先证明 【SVM】Some refined knowledge points about Support Vector Machines [part 1] 中至少存在一个 【SVM】Some refined knowledge points about Support Vector Machines [part 1]

假设 【SVM】Some refined knowledge points about Support Vector Machines [part 1] ,由 【SVM】Some refined knowledge points about Support Vector Machines [part 1]【SVM】Some refined knowledge points about Support Vector Machines [part 1]【SVM】Some refined knowledge points about Support Vector Machines [part 1] ,即间隔为0,这显然不成立

【SVM】Some refined knowledge points about Support Vector Machines [part 1] 至少存在一个 【SVM】Some refined knowledge points about Support Vector Machines [part 1]

【SVM】Some refined knowledge points about Support Vector Machines [part 1] ,对此 【SVM】Some refined knowledge points about Support Vector Machines [part 1]【SVM】Some refined knowledge points about Support Vector Machines [part 1]

【SVM】Some refined knowledge points about Support Vector Machines [part 1] 代入得:

【SVM】Some refined knowledge points about Support Vector Machines [part 1] ,又 【SVM】Some refined knowledge points about Support Vector Machines [part 1]【SVM】Some refined knowledge points about Support Vector Machines [part 1] 只能为+1或-1)

【SVM】Some refined knowledge points about Support Vector Machines [part 1]

(这里可以看出,由于 【SVM】Some refined knowledge points about Support Vector Machines [part 1] 不止一种取法, 【SVM】Some refined knowledge points about Support Vector Machines [part 1] 也就有了多种可能。理论上可以选取任意支持向量来求解b,但现实情况往往采用一种更为鲁棒的方法——使用所有支持向量求解的平均值)