[Machine Learning Notes] Andrew Ng's Open Course · Chapter 7: Logistic Regression

Hypothesis Representation

  • $h_\theta(x)$ : estimated probability that y = 1 on input x
    $$\large h_\theta(x) = P(y=1\,|\,x;\theta) = g(\theta^T x)$$
  • sigmoid function (see the sketch after this list)
    $$\large g(z) = \frac{1}{1+e^{-z}}$$
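A minimal NumPy sketch of the two definitions above; the names `sigmoid`, `hypothesis`, `theta`, and the toy data are my own illustrative choices, not code from the lecture.

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta^T x), evaluated row-wise on a design matrix X (m x n)."""
    return sigmoid(X @ theta)

# Toy example: an intercept column of ones plus two features
X = np.array([[1.0, 2.0, 3.0],
              [1.0, -1.0, 0.5]])
theta = np.array([0.1, 0.2, -0.3])
print(hypothesis(theta, X))  # estimated P(y = 1 | x; theta) for each row
```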

Decision Boundary

$$\large y = \begin{cases} 1, & h_\theta(x) \geq 0.5, \text{ i.e. } \theta^T x \geq 0 \\ 0, & h_\theta(x) < 0.5, \text{ i.e. } \theta^T x < 0 \end{cases}$$
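A hedged sketch of applying the 0.5 threshold, which is the same as checking the sign of $\theta^T x$; the `predict` helper is my own name, and the $\theta = (-3, 1, 1)$ parameters follow the lecture's linear-boundary example $x_1 + x_2 = 3$.

```python
import numpy as np

def predict(theta, X, threshold=0.5):
    """Predict y = 1 when h_theta(x) >= 0.5, i.e. when theta^T x >= 0."""
    probs = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return (probs >= threshold).astype(int)

# theta = [-3, 1, 1] gives the linear boundary x1 + x2 = 3
theta = np.array([-3.0, 1.0, 1.0])
X = np.array([[1.0, 1.0, 1.0],   # theta^T x = -1 -> predict 0
              [1.0, 2.0, 2.0]])  # theta^T x = +1 -> predict 1
print(predict(theta, X))         # [0 1]
```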

  • Non-Linear Decision Boundary: with polynomial features such as $x_1^2$ and $x_2^2$, the boundary can be a curve, e.g. a circle (see the sketch below)
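A minimal sketch of a circular decision boundary, assuming the lecture's example parameters $\theta = (-1, 0, 0, 1, 1)$ over the features $(1, x_1, x_2, x_1^2, x_2^2)$, so that y = 1 exactly when $x_1^2 + x_2^2 \geq 1$.

```python
import numpy as np

def predict_circle(x1, x2):
    """Features (1, x1, x2, x1^2, x2^2) with theta = (-1, 0, 0, 1, 1):
    predicts y = 1 exactly when x1^2 + x2^2 >= 1 (a circular decision boundary)."""
    theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])
    features = np.array([1.0, x1, x2, x1**2, x2**2])
    return int(theta @ features >= 0)  # g(z) >= 0.5  <=>  z >= 0

print(predict_circle(0.5, 0.5))  # inside the unit circle  -> 0
print(predict_circle(1.0, 1.0))  # outside the unit circle -> 1
```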

Logistic regression cost function

  • cost
    $$\large \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1-h_\theta(x)) & \text{if } y = 0 \end{cases}$$
  • simplified version

$$\large \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) = -y\log(h_\theta(x)) - (1-y)\log(1-h_\theta(x))$$

  • $J(\theta)$

$$\large J(\theta) = \frac{1}{m}\sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) = -\frac{1}{m}\left[\sum_{i=1}^m y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]$$

  • to fit parameters $\theta$ (a code sketch of the cost and its gradient follows below)

$$\large \min_\theta J(\theta)$$
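A minimal NumPy sketch of $J(\theta)$ and its gradient $\frac{1}{m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})\,x^{(i)}$ (the gradient formula is the standard one for logistic regression; the function names and toy data are my own).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(theta, X, y):
    """J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]."""
    m = y.size
    h = sigmoid(X @ theta)
    return -(1.0 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h))

def compute_gradient(theta, X, y):
    """dJ/dtheta = (1/m) * X^T (h - y), the usual logistic regression gradient."""
    m = y.size
    h = sigmoid(X @ theta)
    return (1.0 / m) * (X.T @ (h - y))

# Tiny example with an intercept column
X = np.array([[1.0, -1.0], [1.0, 0.5], [1.0, 2.0]])
y = np.array([0.0, 1.0, 0.0])
theta = np.zeros(2)
print(compute_cost(theta, X, y))      # log(2) ~ 0.693 at theta = 0
print(compute_gradient(theta, X, y))
```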

Advanced Optimization

Optimization algorithms (see the optimizer sketch after these lists):

  • Gradient descent
  • Conjugate gradient
  • BFGS
  • L-BFGS

Advantages

  • No need to manually pick $\alpha$
  • Often faster than gradient descent

Disadvantage

  • More complex
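In the course these optimizers are invoked through Octave's `fminunc` with a user-supplied cost/gradient function; as a rough Python analogue (my assumption, not the lecture's code), `scipy.optimize.minimize` with the BFGS method accepts the same cost-and-gradient pair and chooses step sizes itself, so no learning rate $\alpha$ is needed.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    """Return J(theta) and its gradient together, as the optimizer expects with jac=True."""
    m = y.size
    h = sigmoid(X @ theta)
    cost = -(1.0 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h))
    grad = (1.0 / m) * (X.T @ (h - y))
    return cost, grad

X = np.array([[1.0, -1.0], [1.0, 0.5], [1.0, 2.0]])
y = np.array([0.0, 1.0, 0.0])
theta0 = np.zeros(2)

# "BFGS" (or "L-BFGS-B") corresponds to the quasi-Newton methods listed above
result = minimize(cost_and_grad, theta0, args=(X, y), jac=True, method="BFGS")
print(result.x)    # fitted parameters theta
print(result.fun)  # final cost J(theta)
```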

Multiclass classification

Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y = i$.
On a new input $x$, to make a prediction, pick the class $i$ that maximizes $h_\theta^{(i)}(x)$:
$$\large \max_i h_\theta^{(i)}(x)$$
$$h_\theta^{(i)}(x) = P(y=i \,|\, x;\theta) \quad (i = 1, 2, 3)$$
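A rough one-vs-all sketch under the assumptions above: fit one binary logistic regression classifier per class (here with plain gradient descent for brevity, rather than the advanced optimizers), then predict the class whose classifier outputs the largest $h_\theta^{(i)}(x)$. The function names and toy data are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, num_classes, alpha=0.1, iters=3000):
    """Fit one logistic regression per class with gradient descent;
    returns a (num_classes, n) matrix with one theta vector per row."""
    m, n = X.shape
    all_theta = np.zeros((num_classes, n))
    for c in range(num_classes):
        y_c = (y == c).astype(float)              # 1 for class c, 0 for all other classes
        theta = np.zeros(n)
        for _ in range(iters):
            h = sigmoid(X @ theta)
            theta -= alpha * (X.T @ (h - y_c)) / m
        all_theta[c] = theta
    return all_theta

def predict_one_vs_all(all_theta, X):
    """For each row of X, pick the class i maximizing h_theta^(i)(x)."""
    probs = sigmoid(X @ all_theta.T)              # shape (m, num_classes)
    return np.argmax(probs, axis=1)

# Toy data: three classes, intercept column plus one feature
X = np.array([[1.0, -2.0], [1.0, -1.5], [1.0, 0.0],
              [1.0, 0.3], [1.0, 1.8], [1.0, 2.2]])
y = np.array([0, 0, 1, 1, 2, 2])
all_theta = train_one_vs_all(X, y, num_classes=3)
print(predict_one_vs_all(all_theta, X))           # ideally recovers [0 0 1 1 2 2]
```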