无约束优化问题之梯度下降法、牛顿法原理

无约束优化问题是机器学习中最普遍、最简单的优化问题

$x^{*}=\mathop{min}\limits_{x}f_{(x)},x\in R^{n}$

梯度下降法推导

对于只有两个维度的函数 $f_{(x,y)}$ ,如下图所示。

无约束优化问题之梯度下降法、牛顿法原理
如果现在在 $P$ 点，假设 $|PP'|=L,\angle P'Px=\theta$ ，则由 $P$ 到 $P'$ ，函数的单位长度变化量为：
$\frac{f_{(x_{0}+L\cos{\theta}, y_{0}+L\sin{\theta})}-f_{(x_{0}, y_{0})}}{L}$
$=\frac{f_{(x_{0}+L\cos{\theta}, y_{0}+L\sin{\theta})}-f_{(x_{0}+L\cos{\theta}, y_{0})}}{L}+\frac{f_{(x_{0}+L\cos{\theta}, y_{0})}-f_{(x_{0}, y_{0})}}{L}$
$=\sin{\theta}\frac{f_{(x_{0}+L\cos{\theta}, y_{0}+L\sin{\theta})}-f_{(x_{0}+L\cos{\theta}, y_{0})}}{L\sin{\theta}}+\cos{\theta}\frac{f_{(x_{0}+L\cos{\theta}, y_{0})}-f_{(x_{0}, y_{0})}}{L\cos{\theta}}$
$=\sin{\theta}f_{y(x_{0},y_{0})}+\cos{\theta}f_{x(x_{0},y_{0})}\quad(\mathop{L\rightarrow0})$
由柯西不等式：
$<m, n>=||m||\cdot||n||\cdot cos{\theta}$
$<m, n>^{2}=||m||^{2}\cdot||n||^{2}\cdot cos^{2}{\theta}\leq||m||^{2}\cdot||n||^{2}$
且只有当 $\theta =0$ 即两向量共线时等号成立。
令 $m=(\sin{\theta},\cos{\theta}),n=(f_{y},f_{x})$ ，则
$(\sin{\theta}\cdot f_{y}+\cos{\theta}\cdot f_{x})^{2}\leq(\sin^{2}{\theta}+\cos^{2}{\theta})(f_{y}^{2}+f_{x}^{2})=f_{y}^{2}+f_{x}^{2}$
$"="$ 成立的条件是两向量共线，即
$\frac{\sin{\theta}}{\cos{\theta}}=\frac{f_{y}}{f_{x}}$
此时函数的单位长度变化量最大。
而 $\frac{\sin{\theta}}{\cos{\theta}}=\frac{f_{y}}{f_{x}}$ 意味着单位长度变化量最大的方向为梯度方向。
类比于有 $n$ 个自变量的函数 $f_{(x_{1},x_{2}……x_{n})}$ ，其梯度为 $(\frac{\partial f}{\partial x_{1}},\frac{\partial f}{\partial x_{2}}……\frac{\partial f}{\partial x_{n}})$ ，则延梯度方向函数增长最快。
基于此原理，梯度下降法取梯度的反方向作为移动方向，延此方向函数下降最快。

牛顿法推导

初级版本

无约束优化问题之梯度下降法、牛顿法原理
对于 $minf_{(x)}$ 来说，需要 $g_{(x)}=f_{(x)}^{'}=0$ 。
$x_{n}$ 是第 $n$ 个点的横坐标，我们需要求出 $g_{(x)}=0$ 时的横坐标。
首先对第 $n$ 个点求切线为：
$y-g_{(x_{n})}=g_{(x_{n})}^{'}(x-x_{n})$
令 $y=0$ 得到 $x=x_{n}-\frac{g_{(x_{n})}}{g_{(x_{n})}^{'}}$
用 $f_{(x)}^{'}$ 代替 $g_{(x)}$ ，得：
$x_{n+1}=x_{n}-\frac{f_{(x_{n})}^{'}}{f_{(x_{n})}^{''}}$

高级版本

无约束优化问题之梯度下降法、牛顿法原理
要求 $minf_{(x)}$ ，
先将 $f_{(x)}$ 进行泰勒公式展开，得：
$f_{(x)}=f_{(x_{n})}+f_{(x_{n})}^{'}(x-x_{n})+\frac{f_{(x_{n})}^{''}}{2!}(x-x_{n})^{2}+……$
将只保留有关 $x$ 的项，得：
$f_{(x)}=\frac{f_{(x_{n})}^{''}}{2}x^{2}+(f_{(x_{n})}^{'}-x_{n}f_{(x_{n})}^{''})x+……$
此时已将 $f_{(x)}$ 化为一个一元二次函数。
所以当此函数取最小值时的横坐标为：
$x_{n+1}=-\frac{b}{2a}=-\frac{f_{(x_{n})}^{'}-x_{n}f_{(x_{n})}^{''}}{f_{(x_{n})}^{''}}=x_{n}-\frac{f_{(x_{n})}^{'}}{f_{(x_{n})}^{''}}$