Machine Learning Basics (2): Logistic Regression

Suppose we have a dataset whose samples fall into two classes, labeled $y \in \{0, 1\}$.

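Let $P(\hat{y} = 1 \mid x, w)$ denote the probability that $\hat{y} = 1$. Since $y$ can only be 0 or 1, we have $P(\hat{y} = 1 \mid x, w) + P(\hat{y} = 0 \mid x, w) = 1$. Logistic regression builds its model from a linear function passed through the sigmoid function, which keeps the output within the valid probability range ($0 \leq P \leq 1$):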

$$h_{w}(x) = g(z)$$

$$g(z) = \frac{1}{1 + e^{-z}}$$

$$z = w^{T} x$$
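The three equations above can be sketched in NumPy as follows. This is only a minimal illustration: the function names `sigmoid` and `hypothesis` and the toy values of `X` and `w` are assumptions made for this example, and the bias term is assumed to be folded into `w` through a leading column of ones in `X`.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(w, X):
    """h_w(x) = g(w^T x) for every row x of X (shape m x n)."""
    return sigmoid(X @ w)

# Hypothetical toy data: the first column is the constant 1 for the bias term.
X = np.array([[1.0, -2.0],
              [1.0,  0.0],
              [1.0,  3.0]])
w = np.array([0.0, 1.0])

probs = hypothesis(w, X)
print(probs)  # estimated P(y = 1 | x, w) for each sample
```

Each entry of `probs` is the model's estimate of $P(\hat{y} = 1 \mid x, w)$ for one row of `X`; a sample with $z = 0$ lands exactly at 0.5.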

A threshold is then used to decide the final class:

$$y = \begin{cases} 0, & h_{w}(x) < \text{threshold} \\ 1, & h_{w}(x) \geq \text{threshold} \end{cases}$$
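The thresholding rule might look like this in NumPy. The function name `predict`, the default threshold of 0.5, and the sample probabilities are assumptions of this sketch; the text above does not fix a particular threshold value.

```python
import numpy as np

def predict(probs, threshold=0.5):
    """Return 1 where h_w(x) >= threshold and 0 otherwise."""
    return (np.asarray(probs) >= threshold).astype(int)

probs = np.array([0.12, 0.50, 0.95])  # hypothetical outputs of h_w(x)
labels_default = predict(probs)                # assumed default threshold 0.5
labels_strict = predict(probs, threshold=0.9)  # a stricter, also hypothetical, threshold
print(labels_default, labels_strict)
```

Raising the threshold makes the classifier more reluctant to output class 1, which trades recall for precision on the positive class.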

A first attempt at the loss function is the mean squared error:

$$\text{loss} = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left( h_{w}(x^{(i)}) - y^{(i)} \right)^{2}$$

$$\text{Cost}(h_{w}(x), y) = \frac{1}{2} \left( h_{w}(x) - y \right)^{2}$$

However, because of the sigmoid function, this loss is non-convex in $w$, so the cost has to be changed into a more suitable form:

$$\text{Cost}(h_{w}(x), y) = \begin{cases} -\log(h_{w}(x)), & y = 1 \\ -\log(1 - h_{w}(x)), & y = 0 \end{cases}$$
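The non-convexity of the squared-error cost can be checked numerically with a midpoint test: a convex function $f$ always satisfies $f\left(\frac{a+b}{2}\right) \leq \frac{f(a) + f(b)}{2}$. The sketch below assumes a single sample with $x = 1$ and $y = 1$, so that $z = w$; the points $-6$ and $-2$ and all function names are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squared_cost(w):
    """0.5 * (h_w(x) - y)^2 for the single sample x = 1, y = 1."""
    return 0.5 * (sigmoid(w) - 1.0) ** 2

def log_cost(w):
    """-log(h_w(x)) for the same sample (the y = 1 branch)."""
    return -np.log(sigmoid(w))

def violates_midpoint_convexity(f, a, b):
    """True if f at the midpoint lies above the chord, which a convex f never does."""
    return f((a + b) / 2.0) > (f(a) + f(b)) / 2.0

a, b = -6.0, -2.0  # arbitrary points chosen for illustration
squared_violates = violates_midpoint_convexity(squared_cost, a, b)
log_violates = violates_midpoint_convexity(log_cost, a, b)
print(squared_violates, log_violates)
```

On this sample the squared-error cost fails the midpoint test while the logarithmic cost passes it. A single passing pair does not prove convexity, but a single failing pair is enough to rule it out.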

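Analyzing the Cost function:
* When $y = 1$ and $h_{w}(x) = 1$, $\text{Cost} = 0$: the model gives $P(y = 1 \mid w, x) = 1$, so its prediction of $y = 1$ is highly accurate.
* When $y = 1$ and $h_{w}(x) \to 0$, $\text{Cost} \to \infty$: the model gives $P(y = 1 \mid w, x) \to 0$, so its prediction of $y = 1$ is highly inaccurate.
* The case $y = 0$ behaves in the same way, with $1 - h_{w}(x)$ taking the place of $h_{w}(x)$.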

The two cases can therefore be combined into a single expression for Cost:

$$\text{Cost}(h_{w}(x), y) = -y \log(h_{w}(x)) - (1 - y) \log(1 - h_{w}(x))$$
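A small NumPy sketch of this combined cost shows the behavior described in the analysis. The function name `cost`, the clipping constant `eps`, and the sample values are assumptions of this example; clipping is a common numerical safeguard against $\log(0)$ and is not part of the formula itself.

```python
import numpy as np

def cost(h, y, eps=1e-12):
    """Per-sample cost -y*log(h) - (1-y)*log(1-h).

    eps is an assumption of this sketch: clipping h away from 0 and 1
    avoids evaluating log(0).
    """
    h = np.clip(h, eps, 1.0 - eps)
    return -y * np.log(h) - (1.0 - y) * np.log(1.0 - h)

y = np.array([1.0, 1.0, 0.0, 0.0])
h = np.array([0.99, 0.01, 0.01, 0.99])  # hypothetical model outputs
costs = cost(h, y)
print(costs)  # small when h agrees with y, large when it does not
```

Because $y$ is either 0 or 1, exactly one of the two terms is active for each sample, so the single expression reproduces both branches of the piecewise definition.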

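The optimization problem to solve is:

$$\min_{w} \ \text{loss}(w) = \frac{1}{m} \sum_{i=1}^{m} \text{Cost}(h_{w}(x^{(i)}), y^{(i)})$$

Using the sigmoid derivative $g'(z) = g(z)(1 - g(z))$, the derivative of the cost of a single sample with respect to $w_{j}$ is:

$$\frac{\partial}{\partial w_{j}} \text{Cost}(w) = -y \frac{1}{h_{w}(x)} h_{w}(x) (1 - h_{w}(x)) x_{j} - (1 - y) \frac{-h_{w}(x)}{1 - h_{w}(x)} (1 - h_{w}(x)) x_{j} = (h_{w}(x) - y) x_{j}$$

Averaging over all $m$ samples gives the gradient of the loss:

$$\frac{\partial}{\partial w_{j}} \text{loss}(w) = \frac{1}{m} \sum_{i=1}^{m} \left( h_{w}(x^{(i)}) - y^{(i)} \right) x_{j}^{(i)}$$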
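The gradient formula can be checked against finite differences and then used in a plain batch gradient descent loop. The sketch below is not the implementation from the repository linked in this post: the function names, the toy data, the learning rate `lr`, and the iteration count `n_iters` are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y):
    """Mean of -y*log(h) - (1-y)*log(1-h) over the m rows of X."""
    h = sigmoid(X @ w)
    return np.mean(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h))

def gradient(w, X, y):
    """(1/m) * sum_i (h_w(x_i) - y_i) * x_i, computed for all j at once."""
    return X.T @ (sigmoid(X @ w) - y) / X.shape[0]

def fit(X, y, lr=0.1, n_iters=500):
    """Plain batch gradient descent; lr and n_iters are illustrative values."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w -= lr * gradient(w, X, y)
    return w

# Hypothetical, non-separable toy data; the column of ones acts as the bias.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0],
              [1.0, 2.0], [1.0, 0.5], [1.0, -0.5]])
y = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 1.0])

# Compare the analytic gradient with central finite differences at an arbitrary point.
w0 = np.array([0.3, -0.7])
step = 1e-6
numeric = np.array([
    (loss(w0 + step * e, X, y) - loss(w0 - step * e, X, y)) / (2 * step)
    for e in np.eye(2)
])
analytic = gradient(w0, X, y)

w_fit = fit(X, y)
print(np.allclose(numeric, analytic, atol=1e-6))
print(loss(np.zeros(2), X, y), loss(w_fit, X, y))
```

The vectorized expression `X.T @ (h - y) / m` computes the partial derivative for every $w_{j}$ in one step, which is why no explicit loop over $j$ is needed.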

The above is the basic idea behind the logistic regression algorithm; a more detailed description will be added later.

For a concrete NumPy-based implementation, see: https://github.com/Alnlll/ML/tree/master/lgr
