Andrew Ng Machine Learning Notes, Chapter 2: Univariate Linear Regression

Main Procedure

(Figure: training set → learning algorithm → hypothesis h; a new input x is fed to h, which outputs the predicted value y.)

Hypothesis

A function that maps an input x to a predicted output y.

Univariate linear regression

h_\theta(x) = \theta_0 + \theta_1 x
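
As a quick illustration (my own sketch, not from the original notes), the hypothesis is just a straight line with intercept \theta_0 and slope \theta_1:

```python
def h(x, theta_0, theta_1):
    """Univariate linear regression hypothesis: a straight line."""
    return theta_0 + theta_1 * x

print(h(2.0, theta_0=1.0, theta_1=0.5))  # 1.0 + 0.5 * 2.0 = 2.0
```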

Cost function

Idea

Choose \theta so that h_\theta(x) is close to y for our training examples (x, y).
To fit the function to the training data => to minimize the cost function,
i.e. to minimize the squared difference between the hypothesis's predictions and the actual values.

Cost function for Univariate linear regression

J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

Goal: \min_{\theta_0, \theta_1} J(\theta_0, \theta_1)
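
A minimal Python sketch of this cost function (my own illustration; the toy data values are arbitrary, not from the notes):

```python
import numpy as np

def compute_cost(x, y, theta_0, theta_1):
    """J(theta_0, theta_1) = (1 / (2m)) * sum of squared errors."""
    m = len(y)
    predictions = theta_0 + theta_1 * x   # h_theta(x) for every example
    return np.sum((predictions - y) ** 2) / (2 * m)

# Arbitrary toy data with three training examples.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(compute_cost(x, y, 0.0, 1.0))  # perfect fit -> 0.0
print(compute_cost(x, y, 0.0, 0.5))  # worse fit   -> about 0.58
```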

Gradient Descent

An algorithm for minimizing the cost function.

Intuition: contour plot of J(\theta)

At each point, move in the direction of steepest descent so that J decreases as quickly as possible.

Outline

  1. Start with some \theta_0, \theta_1 (random initialization)
  2. Keep changing \theta_0, \theta_1 to reduce J(\theta_0, \theta_1) until we end up at a minimum (gradient descent is sensitive to local minima in general, but here it usually reaches the global minimum)

Algorithm (repeatedly step in the direction that decreases J most quickly):

Repeat until convergence {
    \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad (\text{for } j = 0 \text{ and } j = 1)
}
NOTE: All the parameters \theta_j should be updated simultaneously.
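
A small sketch of what "simultaneous update" means in code (my own example, assuming the simple cost J(\theta_0, \theta_1) = \theta_0^2 + \theta_1^2, whose partial derivatives are 2\theta_0 and 2\theta_1):

```python
# Hypothetical example: minimize J(theta_0, theta_1) = theta_0**2 + theta_1**2.
theta_0, theta_1 = 3.0, -4.0
alpha = 0.1

for _ in range(100):
    # Compute both updates from the OLD parameter values ...
    temp0 = theta_0 - alpha * (2 * theta_0)
    temp1 = theta_1 - alpha * (2 * theta_1)
    # ... then assign them together (simultaneous update).
    theta_0, theta_1 = temp0, temp1

print(theta_0, theta_1)  # both approach 0, the minimum of J
```

Updating theta_0 first and then reusing the already-updated value when computing theta_1 would be an incorrect, non-simultaneous update.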

\alpha: learning rate

The learning rate needs to be chosen carefully:

  • Too small: gradient descent can be slow
  • Too large: gradient descent can overshoot the minimum; it may fail to converge, or even diverge

Gradient descent can converge to a local minimum even with the learning rate \alpha held fixed.
As we approach a local minimum, gradient descent automatically takes smaller steps (because the partial derivative becomes smaller), so there is no need to decrease \alpha over time.
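
A tiny numerical illustration of this point (my own, using the one-parameter cost J(\theta) = \theta^2, so dJ/d\theta = 2\theta): with a fixed \alpha, the step \alpha \cdot dJ/d\theta shrinks by itself as \theta approaches the minimum.

```python
# Hypothetical 1-D example: J(theta) = theta**2, derivative 2*theta.
theta = 4.0
alpha = 0.1
for i in range(5):
    step = alpha * (2 * theta)   # step size with a FIXED learning rate
    theta -= step
    print(f"iter {i}: step = {step:.4f}, theta = {theta:.4f}")
# Steps shrink automatically: 0.8000, 0.6400, 0.5120, 0.4096, 0.3277
```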

Gradient Descent for Univariate Linear Regression

Substituting the partial derivatives of J(\theta_0, \theta_1) into the update rule gives:

Repeat until convergence {
    \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)
    \theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
}
Potential problem: gradient descent in general is susceptible to local optima.
This is not an issue here, because the cost function for linear regression is convex (bowl-shaped), so it has a single global minimum.

Batch Gradient Descent

Each step uses all the training examples.
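
A self-contained sketch of batch gradient descent for univariate linear regression using the update rules above (my own illustration, not from the original notes; the data, learning rate, and iteration count are arbitrary placeholders):

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.01, num_iters=1000):
    """Fit h_theta(x) = theta_0 + theta_1 * x by batch gradient descent."""
    m = len(y)
    theta_0, theta_1 = 0.0, 0.0
    for _ in range(num_iters):
        # Errors on ALL m training examples in every step (hence "batch").
        errors = (theta_0 + theta_1 * x) - y
        # Simultaneous update of both parameters.
        temp0 = theta_0 - alpha * np.sum(errors) / m
        temp1 = theta_1 - alpha * np.sum(errors * x) / m
        theta_0, theta_1 = temp0, temp1
    return theta_0, theta_1

# Arbitrary toy data roughly following y = 1 + 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])
theta_0, theta_1 = batch_gradient_descent(x, y, alpha=0.05, num_iters=2000)
print(theta_0, theta_1)  # close to the least-squares fit (about 1.15 and 1.95)
```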