Binary Classification with Logistic Regression


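The target $y$ is discrete and takes only the values 0 or 1.
Logistic regression models the mapping from $\vec x$ to 0 or 1 with the sigmoid function, whose output lies in $(0, 1)$.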
Hypothesis function:
$h(\vec{x}) =\frac{1}{ 1+e^{ -\vec{\theta}^T\vec{x}}}$
where:
$\begin{aligned} \vec{x}=[x_0, x_1, ...,x_n]^T\in\mathbb R^{(n+1)\times1} \\ \vec{\theta}=[\theta_0, \theta_1, ...,\theta_n]^T\in\mathbb R^{(n+1)\times1} \\ \text{(}n\text{ is the number of features; by the usual convention } x_0=1\text{)} \end{aligned}$
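The hypothesis function can be sketched in a few lines of Python. This is only an illustration: the names `sigmoid` and `hypothesis`, the sample values, and the convention that `x[0] == 1` is the intercept feature are assumptions, not part of the original text. The sigmoid is split into two branches so that `math.exp` never overflows for large negative inputs.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)          # z < 0, so exp(z) cannot overflow
    return ez / (1.0 + ez)

def hypothesis(theta, x):
    """h(x) = sigmoid(theta^T x); x is assumed to include x_0 = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

theta = [0.0, 1.0, -1.0]      # hypothetical parameters
x = [1.0, 2.0, 2.0]           # x_0 = 1 (assumed intercept), two features
print(hypothesis(theta, x))   # theta^T x = 0, so this prints 0.5
```

A value of 0.5 corresponds to $\vec{\theta}^T\vec{x}=0$, which is where the model is undecided between the two classes.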
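The goal is to find parameters $\vec{\theta}$ such that, as far as possible, $h(\vec x)\rightarrow 0$ when $y=0$ and $h(\vec x)\rightarrow 1$ when $y=1$.
Interpret $h(\vec x)$ as the probability that $y=1$. The probability that the model assigns to the correct label is then: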
$p=h(\vec x)^y(1-h(\vec x))^{(1-y)}$
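When $y=0$, the probability of a correct prediction (that is, of predicting 0) is $1-h(\vec x)$.
When $y=1$, the probability of a correct prediction (that is, of predicting 1) is $h(\vec x)$.
To make the probability of predicting every training sample correctly as large as possible, treat the $m$ samples as independent and maximise the likelihood:
$\begin{aligned} \max_{\vec{\theta}}l(\vec{\theta}) &= \max_{\vec{\theta}}\left(p^{(1)}\cdot p^{(2)}\cdots p^{(m)}\right)\\ &= \max_{\vec{\theta}}\prod_{i=1}^{m} h(\vec x^{(i)})^{y^{(i)}}(1-h(\vec x^{(i)}))^{(1-y^{(i)})} \end{aligned}$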
Taking the logarithm of both sides gives:
$\begin{aligned} \max_{\vec{\theta}}L(\vec{\theta}) &= \max_{\vec{\theta}}\ln(l(\vec{\theta}))\\ &= \max_{\vec{\theta}}\sum_{i=1}^{m} \left[y^{(i)}\ln(h(\vec x^{(i)}))+(1-y^{(i)})\ln(1-h(\vec x^{(i)}))\right] \end{aligned}$
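As a quick numerical check of the step from the product to the sum, the sketch below evaluates both forms on three made-up predictions. The function names and the sample values in `hs` and `ys` are hypothetical and chosen only for illustration.

```python
import math

def sample_prob(h, y):
    """p = h^y * (1 - h)^(1 - y): probability assigned to the true label."""
    return h ** y * (1.0 - h) ** (1 - y)

def likelihood(hs, ys):
    """Product of the per-sample probabilities, l(theta)."""
    prod = 1.0
    for h, y in zip(hs, ys):
        prod *= sample_prob(h, y)
    return prod

def log_likelihood(hs, ys):
    """Sum form L(theta) obtained by taking the logarithm of the product."""
    return sum(y * math.log(h) + (1 - y) * math.log(1.0 - h)
               for h, y in zip(hs, ys))

hs = [0.9, 0.2, 0.7]   # hypothetical model outputs h(x^(i))
ys = [1, 0, 1]         # corresponding labels y^(i)

print(likelihood(hs, ys))            # 0.9 * 0.8 * 0.7, about 0.504
print(math.log(likelihood(hs, ys)))  # agrees with the sum form below
print(log_likelihood(hs, ys))
```

Because the logarithm is strictly increasing, the same $\vec{\theta}$ maximises both forms; the sum is simply easier to differentiate and avoids underflow when $m$ is large.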
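Maximising $L(\vec{\theta})$ is equivalent to minimising its negative, so define the cost function $J( \vec{\theta})=-\frac{1}{m}L(\vec{\theta})$. The problem then becomes finding the $\vec{\theta}$ that minimises $J( \vec{\theta})$.
Written out in full, the cost function is:
$J( \vec{\theta}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\ln(h(\vec{x}^{(i)}))+(1-y^{(i)})\ln(1-h(\vec{x}^{(i)}))\right]$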
where:
$\begin{aligned} \vec{y}=[y^{(1)}, y^{(2)}, ...,y^{(m)}]^T\in\mathbb R^{m\times1} \\ y^{(i)}\in\{0, 1\} \quad \text{(}m\text{ is the number of training samples)} \end{aligned}$
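A minimal sketch of evaluating $J(\vec{\theta})$ on a small data set follows. The helper names, the toy data, and the clipping constant `EPS` are assumptions added for illustration; the clipping is a numerical safeguard against `log(0)` and is not part of the formula above.

```python
import math

EPS = 1e-12  # assumed safeguard so that log() never receives exactly 0

def predict(theta, x):
    """h(x) = 1 / (1 + e^(-theta^T x)), evaluated without overflow."""
    z = sum(t * v for t, v in zip(theta, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum(y*ln(h) + (1-y)*ln(1-h))."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        h = min(max(predict(theta, xi), EPS), 1.0 - EPS)
        total += yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
    return -total / m

# Hypothetical toy data: each row is [x_0 = 1, x_1].
X = [[1.0, -1.0], [1.0, 1.0]]
y = [0, 1]

print(cost([0.0, 0.0], X, y))  # every h is 0.5, so J = ln 2, about 0.693
print(cost([0.0, 2.0], X, y))  # a better-fitting theta gives a smaller J
```

With all parameters at zero the model outputs 0.5 for every sample, so $J=\ln 2$ regardless of the data; this is a convenient sanity check for an implementation.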
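The cost function can also be read as follows:
When $y=0$, the cost tends to infinity as $h(\vec x)\rightarrow 1$ and tends to zero as $h(\vec x)\rightarrow 0$.
When $y=1$, the cost tends to infinity as $h(\vec x)\rightarrow 0$ and tends to zero as $h(\vec x)\rightarrow 1$.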
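The two cases can be seen numerically by evaluating the per-sample cost for a few outputs. The function name `sample_cost` and the chosen values of `h` are illustrative assumptions.

```python
import math

def sample_cost(h, y):
    """Per-sample cost: -ln(h) when y = 1, -ln(1 - h) when y = 0."""
    return -(y * math.log(h) + (1 - y) * math.log(1.0 - h))

# As h approaches 1, the cost for y = 0 grows without bound
# while the cost for y = 1 shrinks towards zero.
for h in (0.5, 0.9, 0.999999):
    print(h, sample_cost(h, 0), sample_cost(h, 1))
```

At `h = 0.5` both labels cost $\ln 2$; a confident wrong prediction is penalised far more heavily than a confident right one is rewarded.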
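Gradient descent updates every parameter $\theta_j$ simultaneously, with learning rate $\alpha$:
$\begin{aligned} \theta_j&:=\theta_j-\alpha\frac{\partial J(\vec{\theta})}{\partial \theta_j} \end{aligned}$
In the derivation of the partial derivative, $(h(\vec{x}^{(i)}))'$ denotes $\partial h(\vec{x}^{(i)})/\partial \theta_j$: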
$\begin{aligned} \frac{\partial J(\vec{\theta})}{\partial \theta_j} &= -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\frac{(h(\vec{x}^{(i)}))'}{h(\vec{x}^{(i)})}+(1-y^{(i)})\frac{-(h(\vec{x}^{(i)}))'}{1-h(\vec{x}^{(i)})}\right) \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left(\frac{y^{(i)}}{h(\vec{x}^{(i)})}-\frac{1-y^{(i)}}{1-h(\vec{x}^{(i)})}\right)(h(\vec{x}^{(i)}))' \\ &= -\frac{1}{m}\sum_{i=1}^{m}\frac{(1+e^{-\vec{\theta}^T\vec{x}^{(i)}})(y^{(i)}e^{-\vec{\theta}^T\vec{x}^{(i)}}+y^{(i)}-1)}{e^{-\vec{\theta}^T\vec{x}^{(i)}}}\cdot\frac{e^{-\vec{\theta}^T\vec{x}^{(i)}}x_j^{(i)}}{(1+e^{-\vec{\theta}^T\vec{x}^{(i)}})^2} \\ &= \frac{1}{m}\sum_{i=1}^{m}\frac{x_j^{(i)}-x_j^{(i)}y^{(i)}(1+e^{-\vec{\theta}^T\vec{x}^{(i)}})}{1+e^{-\vec{\theta}^T\vec{x}^{(i)}}} \\ &= \frac{1}{m}\sum_{i=1}^{m}(h(\vec{x}^{(i)})-y^{(i)})x_j^{(i)} \end{aligned}$
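The final expression for the gradient leads to a very short training loop. The sketch below is one possible batch gradient descent under stated assumptions: the function names, the toy data set, the learning rate `alpha=0.1`, the fixed iteration count, and the 0.5 decision threshold are all illustrative choices rather than anything prescribed by the text.

```python
import math

def predict(theta, x):
    """h(x) = 1 / (1 + e^(-theta^T x)), evaluated without overflow."""
    z = sum(t * v for t, v in zip(theta, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def gradient(theta, X, y):
    """dJ/dtheta_j = (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i)."""
    m = len(X)
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = predict(theta, xi) - yi
        for j, xij in enumerate(xi):
            grad[j] += err * xij
    return [g / m for g in grad]

def gradient_descent(X, y, alpha=0.1, iters=2000):
    """Batch gradient descent from theta = 0; all theta_j updated together."""
    theta = [0.0] * len(X[0])
    for _ in range(iters):
        grad = gradient(theta, X, y)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# Hypothetical toy data: each row is [x_0 = 1, x_1]; label is 1 when x_1 > 0.
X = [[1.0, -2.0], [1.0, -1.5], [1.0, -1.0],
     [1.0, 1.0], [1.0, 1.5], [1.0, 2.0]]
y = [0, 0, 0, 1, 1, 1]

theta = gradient_descent(X, y)
labels = [1 if predict(theta, xi) >= 0.5 else 0 for xi in X]
print(theta)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The gradient is computed once per iteration from the current `theta` and only then applied, which matches the simultaneous update of every $\theta_j$. On linearly separable data such as this toy set the parameters keep growing slowly, so in practice one would usually stop on a tolerance or add regularisation.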
