Hypothesis Representation
- $h_\theta(x)$: estimated probability that $y = 1$ on input $x$
  $h_\theta(x) = P(y = 1 \mid x;\ \theta) = g(\theta^T x)$
- sigmoid function
  $g(z) = \frac{1}{1 + e^{-z}}$
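The sigmoid can be written directly from this formula (a minimal plain-Python sketch; the function name `sigmoid` is my own):

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# g(0) = 0.5; large positive z gives values near 1,
# large negative z gives values near 0.
```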
Decision Boundary
$y = \begin{cases} 1, & h_\theta(x) \ge 0.5,\ \text{i.e.}\ \theta^T x \ge 0 \\ 0, & h_\theta(x) < 0.5,\ \text{i.e.}\ \theta^T x < 0 \end{cases}$
- Non-Linear Decision Boundary
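Because $g(z) \ge 0.5$ exactly when $z \ge 0$, a prediction can threshold the linear score $\theta^T x$ directly, without evaluating the sigmoid (a sketch; `predict` is a hypothetical helper):

```python
def predict(theta, x):
    """Return 1 if h_theta(x) >= 0.5, i.e. theta^T x >= 0, else 0."""
    z = sum(t_j * x_j for t_j, x_j in zip(theta, x))
    return 1 if z >= 0 else 0
```

A non-linear decision boundary arises the same way once `x` contains polynomial features such as $x_1^2$ and $x_2^2$.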
Logistic regression cost function
- cost
$\mathrm{Cost}(h_\theta(x^{(i)}),\ y^{(i)}) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$
- simplified version
  $\mathrm{Cost}(h_\theta(x),\ y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$
$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}),\ y^{(i)}) = -\frac{1}{m} \left[ \sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$
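$J(\theta)$ can be computed literally from this definition (an unvectorized plain-Python sketch; the names `cost_J`, `X`, `y` are my own):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_J(theta, X, y):
    """Cross-entropy cost: J(theta) = (1/m) * sum of the Cost terms."""
    m = len(y)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * xj for t, xj in zip(theta, x_i)))
        total += -y_i * math.log(h) - (1 - y_i) * math.log(1 - h)
    return total / m
```

With $\theta = 0$, every $h_\theta(x) = 0.5$, so $J(\theta) = \log 2 \approx 0.693$; a better fit drives the cost toward 0.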
- to fit parameters $\theta$:
  $\min_\theta J(\theta)$
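Minimizing $J(\theta)$ with batch gradient descent uses the update $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})\, x_j^{(i)}$ (a minimal sketch under that standard gradient; the function name and the fixed iteration count are my own):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_descent(theta, X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for logistic regression."""
    m = len(y)
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for x_i, y_i in zip(X, y):
            h = sigmoid(sum(t * xj for t, xj in zip(theta, x_i)))
            for j, xj in enumerate(x_i):
                grad[j] += (h - y_i) * xj / m
        # simultaneous update of all theta_j
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta
```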
Advanced Optimization
Optimization algorithms:
- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS
Advantages
- No need to manually pick α
- Often faster than gradient descent
Disadvantages
- More complex
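These optimizers are normally used off the shelf rather than implemented by hand. A hedged sketch with SciPy's `minimize` (BFGS), assuming NumPy/SciPy are available; the tiny dataset and function names are made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    """Return J(theta) and its gradient, as the optimizer expects."""
    m = len(y)
    h = sigmoid(X @ theta)
    J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    grad = X.T @ (h - y) / m
    return J, grad

# toy, non-separable dataset: first column is the intercept feature
X = np.array([[1.0, -2.0], [1.0, 1.0], [1.0, -1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

# jac=True tells minimize the function returns (cost, gradient);
# BFGS chooses its own step sizes, so there is no alpha to tune.
res = minimize(cost_and_grad, np.zeros(2), args=(X, y),
               method="BFGS", jac=True)
theta = res.x
```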
Multiclass classification
Train a Logistic regression classifier hθ(i)(x) for each class i to predict the probability that y=i.
On a new input x, to make a prediction, pick the class i that maximizes
$\max_i\ h_\theta^{(i)}(x)$
$h_\theta^{(i)}(x) = P(y = i \mid x;\ \theta) \quad (i = 1, 2, 3)$
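One-vs-all prediction then reduces to evaluating each trained classifier and taking the argmax (a sketch; storing the per-class parameter vectors in a dict is my own assumption):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_one_vs_all(thetas, x):
    """thetas: {class label i: parameter vector theta^(i)}.
    Return the label i whose classifier h_theta^(i)(x) is largest."""
    def h(theta):
        return sigmoid(sum(t * xj for t, xj in zip(theta, x)))
    return max(thetas, key=lambda i: h(thetas[i]))
```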