Machine Learning Series: Coursera Week 2 - Linear Regression with Multiple Variables
Contents
1. Multiple Features
1.1 Multiple features
1.2 Gradient descent for multiple variables
1.3 Gradient descent in practice I: Feature scaling
1.4 Gradient descent in practice II: Learning rate
1.5 Summary
1.6 Features and polynomial regression
2. Computing Parameters Analytically
2.1 Normal Equation
2.2 Normal Equation Noninvertibility
1. Multiple Features
1.1 Multiple features
| Size (x1) | Number of bedrooms (x2) | Number of floors (x3) | Age of home (x4) | Price (y) |
| --- | --- | --- | --- | --- |
| 2104 | 5 | 1 | 45 | 460 |
| 1416 | 3 | 2 | 40 | 232 |
| 1534 | 3 | 2 | 30 | 315 |
| 852 | 2 | 1 | 36 | 178 |
Notation:
- n = number of features
- x^(i) = the ith training example
- x^(i)_j = value of feature j in the ith training example

E.g. x^(2) = [1416; 3; 2; 40]
Hypothesis: h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + ... + θ_n x_n

Vectorization: let x_0 = 1, so that x = [x_0; x_1; ...; x_n] and θ = [θ_0; θ_1; ...; θ_n], giving h_θ(x) = θ^T x.

This is called multivariate linear regression.
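A minimal NumPy sketch of this vectorized hypothesis (the matrix below reuses the housing table above; names like `hypothesis` are my own, not from the course):

```python
import numpy as np

# Design matrix for the housing example; the first column is x_0 = 1.
X = np.array([
    [1, 2104, 5, 1, 45],
    [1, 1416, 3, 2, 40],
    [1, 1534, 3, 2, 30],
    [1,  852, 2, 1, 36],
], dtype=float)

theta = np.zeros(X.shape[1])   # parameter vector [theta_0, ..., theta_n]

def hypothesis(X, theta):
    """h_theta(x) = theta^T x, computed for all m examples at once."""
    return X @ theta

print(hypothesis(X, theta))    # all zeros until theta is learned
```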
1.2 Gradient descent for multiple variables
Hypothesis: h_θ(x) = θ^T x = θ_0 x_0 + θ_1 x_1 + ... + θ_n x_n (with x_0 = 1)

Parameters: θ_0, θ_1, ..., θ_n, treated as one (n+1)-dimensional vector θ

Cost function: J(θ) = 1/(2m) * Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i))^2

Gradient descent:

repeat {
    θ_j := θ_j - α * ∂J(θ)/∂θ_j
} (simultaneously update for every j = 0, 1, ..., n)

That is:

repeat {
    θ_j := θ_j - α * (1/m) * Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) * x_j^(i)
} (simultaneously update θ_j for every j = 0, 1, ..., n)
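A short sketch of the vectorized update rule, assuming X already contains the x_0 = 1 column; the learning rate and iteration count are placeholder defaults I chose:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for multivariate linear regression.

    X: (m, n+1) design matrix whose first column is all ones.
    y: (m,) target vector.
    Returns the learned theta and the history of J(theta).
    """
    m = len(y)
    theta = np.zeros(X.shape[1])
    J_history = []
    for _ in range(num_iters):
        error = X @ theta - y                # h_theta(x^(i)) - y^(i) for all i
        theta -= alpha * (X.T @ error) / m   # simultaneous update of every theta_j
        J_history.append((error @ error) / (2 * m))
    return theta, J_history
```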
1.3 Gradient descent in practice I: Feature scaling
Feature scaling:
Idea: make sure features are on a similar scale.
This makes gradient descent converge faster.
E.g. x1 = size (0~2000)
x2 = number of bedrooms (1~5)
(Figure from Coursera Machine Learning Week 2: Gradient descent in practice I: Feature scaling)
Feature scaling: e.g. divide each feature by its range, such as x_1 = size/2000 and x_2 = (number of bedrooms)/5, so both features end up roughly in [0, 1].

More generally, feature scaling means getting every feature into approximately a -1 ≤ x_i ≤ 1 range:

- x_0 = 1 is fine as is.
- 0 ≤ x_1 ≤ 3 is close enough, fine.
- -100 ≤ x_2 ≤ 100 needs scaling.
- -0.0001 ≤ x_3 ≤ 0.0001 needs scaling.

Generally, a feature whose range is within about -3 to 3, or as narrow as about -1/3 to 1/3, is acceptable.

Another kind of scaling is mean normalization: replace x_i with (x_i - μ_i)/s_i, where μ_i is the mean of the feature and s_i is its range (or standard deviation).
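A rough sketch of mean normalization in NumPy (the helper name and the convention of returning mu and sigma are my own choices):

```python
import numpy as np

def feature_normalize(X):
    """Mean-normalize every column (feature) of X.

    Returns the scaled matrix plus the per-feature mean and standard deviation,
    which are needed later to scale new inputs the same way.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Scale the housing features (without the x_0 column), then add the
# intercept column back before running gradient descent.
X_raw = np.array([[2104, 5, 1, 45],
                  [1416, 3, 2, 40],
                  [1534, 3, 2, 30],
                  [ 852, 2, 1, 36]], dtype=float)
X_norm, mu, sigma = feature_normalize(X_raw)
X = np.hstack([np.ones((X_norm.shape[0], 1)), X_norm])
```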
1.4 Gradient descent in practice II: Learning rate
Gradient descent:
- "Debuggung": How to make sure gradient descent is working correctly
- How to choose learning rate
making sure gradient descent is working correctly
plot J(θ) - No. of iterations
(Figure from Coursera Machine Learning Week 2: Gradient descent in practice II: Learning rate)
J(θ) should decrease after every iteration
The plot also tells you whether gradient descent has converged.

Example automatic convergence test:
Declare convergence if J(θ) decreases by less than 10^(-3) in one iteration.
In practice it is better to judge from the plot, because such a threshold is hard to pick.

If the convergence plot looks like the one below:
(Figure from Coursera Machine Learning Week 2: Gradient descent in practice II: Learning rate)
- For sufficiently small α, J(θ) should decrease on every iteration (this holds true for linear regression).
- But if α is too small, gradient descent can be slow to converge.
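A small sketch of the automatic convergence test mentioned above (the 10^(-3) threshold comes from the lecture; the helper name is mine):

```python
def has_converged(J_history, tol=1e-3):
    """Declare convergence if J(theta) decreased by less than tol in the last iteration.

    A negative decrease means J went up, which usually means alpha is too large,
    so that case is deliberately not reported as convergence.
    """
    if len(J_history) < 2:
        return False
    decrease = J_history[-2] - J_history[-1]
    return 0 <= decrease < tol
```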
1.5 Summary
- If α is too small: slow convergence
- If α is too large: J(θ) may not decrease on every iteration and may not converge (slow convergence is also possible).

To choose α, try a range of values such as ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ..., increasing by roughly 3x each step.

(Figure from Coursera Machine Learning Week 2: Gradient descent in practice II: Learning rate)
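A quick sketch of sweeping these candidate values and comparing the J(θ) curves, assuming the scaled X and the gradient_descent helper from the sketches above:

```python
import numpy as np
import matplotlib.pyplot as plt

y = np.array([460, 232, 315, 178], dtype=float)   # prices from the housing table

for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]:
    _, J_history = gradient_descent(X, y, alpha=alpha, num_iters=400)
    plt.plot(J_history, label=f"alpha = {alpha}")

plt.xlabel("No. of iterations")
plt.ylabel("J(theta)")
plt.legend()
plt.show()
```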
1.6 Features and polynomial regression
2. Computing Parameters Analytically
2.1 Normal Equation
A method to solve for θ analytically (i.e. in closed form).

E.g. for a scalar θ ∈ R:
J(θ) = aθ^2 + bθ + c
Setting dJ/dθ = 2aθ + b = 0 gives θ = -b/(2a), so the minimum is J(-b/(2a)).
(Figure from Coursera Machine Learning Week 2: Normal Equation)
Normal Equation: θ = (X^T X)^(-1) X^T y

Note: feature scaling is not needed when using the normal equation.
m training examples, n features
| Gradient descent | Normal equation |
| --- | --- |
| Need to choose α | No need to choose α |
| Needs many iterations | No iteration needed |
| Works well even when n is large (O(n^2)) | Needs to compute (X^T X)^(-1), which is O(n^3) |
| | Slow if n is very large |

When n is greater than about 10000, start switching to gradient descent.
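A minimal sketch of the normal equation in NumPy; np.linalg.pinv is used instead of a plain inverse so the code also tolerates a singular X^T X (variable names are mine):

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T X)^(-1) X^T y, computed with the pseudo-inverse for safety."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Housing example: the first column of X is x_0 = 1; no feature scaling is needed here.
X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30],
              [1,  852, 2, 1, 36]], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

theta = normal_equation(X, y)
print(theta)
```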
2.2 Normal Equation Noninvertibility
What if X^T X is non-invertible?

This can happen when:
1. There are redundant features (linearly dependent). E.g.
   x_1 = size in feet^2
   x_2 = size in m^2
   Since 1 m = 3.28 feet, x_1 and x_2 are linearly dependent.
2. There are too many features (e.g. m ≤ n).

In fact, X^T X is invertible if and only if the columns of X are linearly independent. The fix is to delete redundant features, drop some features when there are too many, or use regularization.
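A tiny sketch of the redundant-feature case: when x_2 is just x_1 converted to m^2, X^T X becomes singular, np.linalg.inv is unreliable, but np.linalg.pinv still returns a least-squares θ (the data and names are illustrative, not from the lecture):

```python
import numpy as np

size_ft2 = np.array([2104, 1416, 1534, 852], dtype=float)
X = np.column_stack([np.ones(4), size_ft2, size_ft2 / 3.28**2])  # x_2 duplicates x_1
y = np.array([460, 232, 315, 178], dtype=float)

A = X.T @ X
print(np.linalg.matrix_rank(A))        # 2 < 3: A is singular
# np.linalg.inv(A) would raise LinAlgError or give a numerically meaningless result;
# the pseudo-inverse still produces a usable least-squares solution:
theta = np.linalg.pinv(A) @ X.T @ y
print(theta)
```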
That wraps up the review of Week 2. Section 1.6 on features and polynomial regression will be covered in a dedicated follow-up post. For more background on the normal equation, see this MIT Linear Algebra lecture:
https://www.bilibili.com/video/av6951511/?p=16