Logistic Regression for Binary Classification


The predicted values form a discrete sequence of 0s and 1s: each $\vec x$ is mapped to 0 or 1, with the mapping modeled by the sigmoid function.
Hypothesis function:
$$h(\vec{x}) = \frac{1}{1+e^{-\vec{\theta}^T\vec{x}}}$$
where:
$$\begin{aligned} \vec{x}&=[x_0, x_1, \dots, x_n]^T\in\mathbb{R}^{(n+1)\times 1} \\ \vec{\theta}&=[\theta_0, \theta_1, \dots, \theta_n]^T\in\mathbb{R}^{(n+1)\times 1} \end{aligned}$$
($n$ is the number of features)
That is, the goal is to find parameters $\vec{\theta}$ such that $h(\vec x)\rightarrow 0$ when $y=0$ and $h(\vec x)\rightarrow 1$ when $y=1$.
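As a minimal sketch, the hypothesis above can be written in NumPy (the function name and sample values here are illustrative, not from the original text):

```python
import numpy as np

def hypothesis(theta, x):
    """Sigmoid hypothesis h(x) = 1 / (1 + exp(-theta^T x)).

    theta, x: 1-D arrays of length n+1, where x[0] is the bias feature
    fixed at 1. Returns a value in (0, 1), read as P(y = 1 | x).
    """
    return 1.0 / (1.0 + np.exp(-np.dot(theta, x)))

# With theta = 0 the hypothesis is 0.5 for any input.
theta = np.zeros(3)
x = np.array([1.0, 2.0, -1.0])  # x[0] = 1 is the bias feature
print(hypothesis(theta, x))  # 0.5
```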
h(x)h(\vec x)视为h(x)=1h(\vec x)=1的概率,则h(x)h(\vec x)预测正确的概率为:
$$p = h(\vec x)^{y}\,(1-h(\vec x))^{1-y}$$
y=0y=0时,h(x)h(\vec x)预测正确的概率(即h(x)=0h(\vec x)=0)为1h(x)1-h(\vec x)
y=1y=1时,h(x)h(\vec x)预测正确的概率(即h(x)=1h(\vec x)=1)为h(x)h(\vec x)
To maximize the probability of predicting correctly, we require over all the training samples:
$$\begin{aligned} \max_{\vec{\theta}} l(\vec{\theta}) &= \max_{\vec{\theta}}\left(p^{(1)}\cdot p^{(2)}\cdots p^{(m)}\right)\\ &= \max_{\vec{\theta}}\prod_{i=1}^{m} h(\vec x^{(i)})^{y^{(i)}}\,(1-h(\vec x^{(i)}))^{1-y^{(i)}} \end{aligned}$$
Taking the logarithm of both sides:
$$\begin{aligned} \max_{\vec{\theta}} L(\vec{\theta}) &= \max_{\vec{\theta}}\ln(l(\vec{\theta}))\\ &= \max_{\vec{\theta}}\sum_{i=1}^{m}\left[y^{(i)}\ln(h(\vec x^{(i)}))+(1-y^{(i)})\ln(1-h(\vec x^{(i)}))\right] \end{aligned}$$
So define the cost function $J(\vec{\theta})=-\frac{1}{m}L(\vec{\theta})$, turning the problem into finding the $\vec{\theta}$ that minimizes $J(\vec{\theta})$.
Hence the cost function:
$$J(\vec{\theta}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\ln(h(\vec{x}^{(i)}))+(1-y^{(i)})\ln(1-h(\vec{x}^{(i)}))\right]$$
where:
$$\vec{y}=[y^{(1)}, y^{(2)}, \dots, y^{(m)}]^T\in\mathbb{R}^{m\times 1},\quad y^{(i)}\in\{0, 1\}$$
($m$ is the number of training samples)
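The cost function above translates directly into vectorized NumPy; a sketch, assuming a design matrix `X` whose first column is all ones (names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Cross-entropy cost J(theta) = -(1/m) * sum(y*ln(h) + (1-y)*ln(1-h)).

    X: (m, n+1) design matrix whose first column is all ones.
    y: (m,) vector of 0/1 labels.
    """
    m = len(y)
    h = sigmoid(X @ theta)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# With theta = 0 every prediction is 0.5, so the cost is ln(2) ~ 0.693
# regardless of the labels.
X = np.array([[1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0])
print(cost(np.zeros(2), X, y))
```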
The cost function can also be interpreted as follows:
y=0y=0时,h(x)=1h(\vec x)=1的代价趋于无穷,h(x)=0h(\vec x)=0的代价为零。
y=1y=1时,h(x)=0h(\vec x)=0的代价趋于无穷,h(x)=1h(\vec x)=1的代价为零。
Gradient descent:
$$\theta_j := \theta_j - \alpha\frac{\partial J(\vec{\theta})}{\partial \theta_j}$$
$$\begin{aligned} \frac{\partial J(\vec{\theta})}{\partial \theta_j} &= -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\frac{(h(\vec{x}^{(i)}))'}{h(\vec{x}^{(i)})}+(1-y^{(i)})\frac{-(h(\vec{x}^{(i)}))'}{1-h(\vec{x}^{(i)})}\right) \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left(\frac{y^{(i)}}{h(\vec{x}^{(i)})}-\frac{1-y^{(i)}}{1-h(\vec{x}^{(i)})}\right)(h(\vec{x}^{(i)}))' \\ &= -\frac{1}{m}\sum_{i=1}^{m}\frac{(1+e^{-\vec{\theta}^T\vec{x}^{(i)}})(y^{(i)}e^{-\vec{\theta}^T\vec{x}^{(i)}}+y^{(i)}-1)}{e^{-\vec{\theta}^T\vec{x}^{(i)}}}\cdot\frac{e^{-\vec{\theta}^T\vec{x}^{(i)}}x_j^{(i)}}{(1+e^{-\vec{\theta}^T\vec{x}^{(i)}})^2} \\ &= \frac{1}{m}\sum_{i=1}^{m}\frac{x_j^{(i)}-x_j^{(i)}y^{(i)}(1+e^{-\vec{\theta}^T\vec{x}^{(i)}})}{1+e^{-\vec{\theta}^T\vec{x}^{(i)}}} \\ &= \frac{1}{m}\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right)x_j^{(i)} \end{aligned}$$
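The final form $\frac{1}{m}\sum_i(h(\vec{x}^{(i)})-y^{(i)})x_j^{(i)}$ vectorizes neatly over all $j$ at once. A minimal batch gradient-descent sketch (learning rate, iteration count, and toy data are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent for logistic regression.

    X: (m, n+1) design matrix whose first column is all ones.
    y: (m,) vector of 0/1 labels.
    Every theta_j is updated simultaneously using
    grad_j = (1/m) * sum_i (h(x_i) - y_i) * x_ij.
    """
    m, n1 = X.shape
    theta = np.zeros(n1)
    for _ in range(iterations):
        h = sigmoid(X @ theta)       # predictions for all m samples
        grad = (X.T @ (h - y)) / m   # vectorized gradient, shape (n+1,)
        theta -= alpha * grad
    return theta

# Toy data: label is 1 when the single feature exceeds 2.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print((sigmoid(X @ theta) >= 0.5).astype(int))  # [0 0 1 1]
```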
