Hypothesis Representation
- $h_\theta(x)$: estimated probability that $y = 1$ on input $x$
  $h_\theta(x) = P(y = 1 \mid x;\ \theta) = g(\theta^T x)$
- sigmoid function
  $g(z) = \frac{1}{1 + e^{-z}}$
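The sigmoid can be written directly from this formula (a minimal plain-Python sketch; the function name `sigmoid` is my own):

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# g(0) = 0.5; large positive z gives values near 1,
# large negative z gives values near 0.
```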
Decision Boundary
$y = \begin{cases} 1, & h_\theta(x) \ge 0.5,\ \text{i.e.}\ \theta^T x \ge 0 \\ 0, & h_\theta(x) < 0.5,\ \text{i.e.}\ \theta^T x < 0 \end{cases}$
- Non-Linear Decision Boundary
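Because $g(z) \ge 0.5$ exactly when $z \ge 0$, a prediction can threshold the linear score $\theta^T x$ directly, without evaluating the sigmoid (a sketch; `predict` is a hypothetical helper):

```python
def predict(theta, x):
    """Return 1 if h_theta(x) >= 0.5, i.e. theta^T x >= 0, else 0."""
    z = sum(t_j * x_j for t_j, x_j in zip(theta, x))
    return 1 if z >= 0 else 0
```

A non-linear decision boundary arises the same way once `x` contains polynomial features such as $x_1^2$ and $x_2^2$.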
Logistic regression cost function
- cost
$\mathrm{Cost}(h_\theta(x^{(i)}),\ y^{(i)}) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$
- simplified version
  $\mathrm{Cost}(h_\theta(x),\ y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$
$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}),\ y^{(i)}) = -\frac{1}{m} \left[ \sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$
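$J(\theta)$ can be computed literally from this definition (an unvectorized plain-Python sketch; the names `cost_J`, `X`, `y` are my own):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_J(theta, X, y):
    """Cross-entropy cost: J(theta) = (1/m) * sum of the Cost terms."""
    m = len(y)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * xj for t, xj in zip(theta, x_i)))
        total += -y_i * math.log(h) - (1 - y_i) * math.log(1 - h)
    return total / m
```

With $\theta = 0$, every $h_\theta(x) = 0.5$, so $J(\theta) = \log 2 \approx 0.693$; a better fit drives the cost toward 0.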
- to fit parameters $\theta$:
  $\min_\theta J(\theta)$
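Minimizing $J(\theta)$ with batch gradient descent uses the update $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})\, x_j^{(i)}$ (a minimal sketch under that standard gradient; the function name and the fixed iteration count are my own):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_descent(theta, X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for logistic regression."""
    m = len(y)
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for x_i, y_i in zip(X, y):
            h = sigmoid(sum(t * xj for t, xj in zip(theta, x_i)))
            for j, xj in enumerate(x_i):
                grad[j] += (h - y_i) * xj / m
        # simultaneous update of all theta_j
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta
```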
Advanced Optimization
Optimization algorithms:
- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS
Advantages
- No need to manually pick α
- Often faster than gradient descent
Disadvantages
- More complex
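These optimizers are normally used off the shelf rather than implemented by hand. A hedged sketch with SciPy's `minimize` (BFGS), assuming NumPy/SciPy are available; the tiny dataset and function names are made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    """Return J(theta) and its gradient, as the optimizer expects."""
    m = len(y)
    h = sigmoid(X @ theta)
    J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    grad = X.T @ (h - y) / m
    return J, grad

# toy, non-separable dataset: first column is the intercept feature
X = np.array([[1.0, -2.0], [1.0, 1.0], [1.0, -1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

# jac=True tells minimize the function returns (cost, gradient);
# BFGS chooses its own step sizes, so there is no alpha to tune.
res = minimize(cost_and_grad, np.zeros(2), args=(X, y),
               method="BFGS", jac=True)
theta = res.x
```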
Multiclass classification
Train a Logistic regression classifier hθ(i)(x) for each class i to predict the probability that y=i.
On a new input x, to make a prediction, pick the class i that maximizes
$\max_i\ h_\theta^{(i)}(x)$
$h_\theta^{(i)}(x) = P(y = i \mid x;\ \theta) \quad (i = 1, 2, 3)$
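One-vs-all prediction then reduces to evaluating each trained classifier and taking the argmax (a sketch; storing the per-class parameter vectors in a dict is my own assumption):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_one_vs_all(thetas, x):
    """thetas: {class label i: parameter vector theta^(i)}.
    Return the label i whose classifier h_theta^(i)(x) is largest."""
    def h(theta):
        return sigmoid(sum(t * xj for t, xj in zip(theta, x)))
    return max(thetas, key=lambda i: h(thetas[i]))
```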