week 2

Category: Articles • 2024-01-16 09:09:34

1 multivariate linear regression
2 normal equation
- 2.1 concept
- 2.2 handling a non-invertible matrix
3 Octave / Matlab

1 multivariate linear regression

1.1 notation

m: the number of training examples;
n: the number of features;
$x^{(i)}$: input (features) of the $i^{th}$ training example;
$x_{j}^{(i)}$: value of feature j in the $i^{th}$ training example;
$J (θ)$: cost function;
$R^{n + 1}$: the space of (n+1)-dimensional vectors (the extra dimension is there because the subscript starts at 0);
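
For reference, the notation above is usually combined as follows. This is the standard form for linear regression rather than something spelled out in these notes; $y^{(i)}$ (the target value of the $i^{th}$ training example) and the convention $x_{0} = 1$ are assumptions taken from that standard form.

$$h_{θ}(x) = θ^{T} x = θ_{0} x_{0} + θ_{1} x_{1} + \cdots + θ_{n} x_{n}, \qquad x_{0} = 1$$

$$J (θ) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_{θ}(x^{(i)}) - y^{(i)} \right)^{2}$$

$$θ_{j} := θ_{j} - α \frac{1}{m} \sum_{i=1}^{m} \left( h_{θ}(x^{(i)}) - y^{(i)} \right) x_{j}^{(i)} \qquad (j = 0, 1, \ldots, n, \text{ updated simultaneously})$$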

1.2 gradient descent in practice

1.2.1 feature scaling

Scale each feature so that its values fall roughly within $- 1 \leq x_{i} \leq 1$;
The features should end up on the same order of magnitude;
This helps gradient descent converge faster;

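1.2.2 mean normalization

Replace $x_{i}$ with $x_{i} - μ_{i}$ so that each feature has a mean of 0;
$μ_{i}$ is the mean of that feature;
Dividing $x_{i} - μ_{i}$ by the range of the feature (max - min) completes the mean normalization;
The standard deviation can be used in place of the range; the two choices give different scaled values;
Do not apply this to $x_{0}$, because $x_{0} = 1$;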
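
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the course: the function name `mean_normalize`, the `use_std` switch, and the sample data are all hypothetical, and the input rows are assumed to exclude the $x_{0} = 1$ column.

```python
def mean_normalize(X, use_std=False):
    """Return (X_norm, mu, scale) for a list of rows X (without the x0 = 1 column)."""
    m = len(X)
    n = len(X[0])
    # Mean of each feature (column).
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    if use_std:
        # Population standard deviation of each feature.
        scale = [(sum((row[j] - mu[j]) ** 2 for row in X) / m) ** 0.5 for j in range(n)]
    else:
        # Range (max - min) of each feature.
        scale = [max(row[j] for row in X) - min(row[j] for row in X) for j in range(n)]
    # Guard against constant features, whose range or std is 0.
    scale = [s if s != 0 else 1.0 for s in scale]
    X_norm = [[(row[j] - mu[j]) / scale[j] for j in range(n)] for row in X]
    return X_norm, mu, scale


# Hypothetical data: house size (square feet) and number of bedrooms.
X = [[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0], [1416.0, 2.0]]
X_norm, mu, scale = mean_normalize(X)
print(mu, scale)
print(X_norm)
```

The returned `mu` and `scale` should be kept so that new inputs can be normalized the same way before a prediction is made.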

1.2.3 learning rate

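$α$: learning rate;
If $α$ is too large, each update can overshoot the minimum, and the cost function $J (θ)$ may keep increasing instead of converging;
If $α$ is too small, gradient descent still converges, but it needs many iterations;
To choose $α$, try
… , 0.001 , 0.003 , 0.01 , 0.03 , 0.1 , 0.3 , 1 , … (each value is roughly 3 times the previous one)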

[Figure: two plots, images missing]

Both cases shown in the figures above are caused by $α$ being too large;
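
The effect of $α$ can be checked numerically. The following is a small sketch under stated assumptions: batch gradient descent written in plain Python, a made-up data set generated from $y = 1 + 2 x_{1}$, and hypothetical function names (`cost`, `gradient_descent`). It records $J (θ)$ after every iteration for several learning rates.

```python
def cost(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum of squared prediction errors."""
    m = len(X)
    return sum((sum(t * xj for t, xj in zip(theta, row)) - yi) ** 2
               for row, yi in zip(X, y)) / (2 * m)


def gradient_descent(X, y, alpha, num_iters):
    """Batch gradient descent starting from theta = 0; returns (theta, cost history)."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    history = []
    for _ in range(num_iters):
        errors = [sum(t * xj for t, xj in zip(theta, row)) - yi for row, yi in zip(X, y)]
        grad = [sum(e * row[j] for e, row in zip(errors, X)) / m for j in range(n)]
        # Simultaneous update of every theta_j.
        theta = [t - alpha * g for t, g in zip(theta, grad)]
        history.append(cost(X, y, theta))
    return theta, history


# Made-up data: y = 1 + 2 * x1, with x0 = 1 as the first column.
X = [[1.0, -1.0], [1.0, -0.5], [1.0, 0.0], [1.0, 0.5], [1.0, 1.0]]
y = [-1.0, 0.0, 1.0, 2.0, 3.0]

results = {}
for alpha in (0.01, 0.1, 1.0, 3.0):
    theta, history = gradient_descent(X, y, alpha, 50)
    results[alpha] = (theta, history)
    print(f"alpha={alpha}: J after 1 iteration = {history[0]:.4g}, "
          f"after 50 iterations = {history[-1]:.4g}")
```

On this data set the smallest rate lowers the cost slowly, the middle rates converge quickly, and the largest rate makes the cost grow at every iteration. Plotting `history` against the iteration number is the usual way to spot this.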

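1.3 choosing features

New features can be built by multiplying combinations of $x_{1}, x_{2}, \ldots, x_{n}$ together, e.g. $x_{1} x_{2}^{2}$;
When adding new features, apply feature scaling so that every feature's range stays close to $- 1 \leq x_{i} \leq 1$; products and powers can change the range drastically;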
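
As a rough illustration of why scaling matters here, the sketch below builds the example feature $x_{1} x_{2}^{2}$ and compares the feature ranges before and after scaling. The helper names (`add_product_feature`, `feature_ranges`, `scale_by_range`) and the data are hypothetical.

```python
def add_product_feature(rows, i, j, power_j=1):
    """Append the new feature x_i * x_j**power_j to every row."""
    return [row + [row[i] * row[j] ** power_j] for row in rows]


def feature_ranges(rows):
    """Range (max - min) of each column."""
    return [max(col) - min(col) for col in zip(*rows)]


def scale_by_range(rows):
    """Mean-normalize every column: (value - mean) / (max - min)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    ranges = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid dividing by 0
    return [[(v - mu) / r for v, mu, r in zip(row, means, ranges)] for row in rows]


# Hypothetical rows of (x1, x2), without the x0 = 1 column.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
expanded = add_product_feature(rows, 0, 1, power_j=2)  # adds x1 * x2^2
print(feature_ranges(expanded))   # the new column has a far larger range
scaled = scale_by_range(expanded)
print(feature_ranges(scaled))     # after scaling, the ranges are comparable
```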

2 normal equation

2.1 concept

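Normal equation: an analytical method for solving for θ; instead of iterating many times, it computes the optimal θ directly;
This method needs neither feature scaling nor a choice of learning rate;
Take the partial derivative of $J (θ)$ with respect to each $θ_{j}$ and set it to 0; the values of $θ_{j}$ that solve these equations give the $θ$ that minimizes $J (θ)$;
Result: $θ = (X^{T} X)^{- 1} X^{T} y$ is the $θ$ that minimizes $J (θ)$;
For linear regression problems, the normal equation is a good alternative to gradient descent;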
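
The formula can be tried on a small example. The sketch below is an illustration under assumptions, not the course's implementation: instead of forming the inverse explicitly, it solves the equivalent system $(X^{T} X) θ = X^{T} y$ by Gaussian elimination in plain Python, and the function name `normal_equation` and the data are made up. In Octave the same result is typically obtained with `pinv(X' * X) * X' * y`.

```python
def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y by Gaussian elimination with partial pivoting."""
    m, n = len(X), len(X[0])
    # A = X^T X (n x n) and b = X^T y (length n).
    A = [[sum(X[k][i] * X[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    b = [sum(X[k][i] * y[k] for k in range(m)) for i in range(n)]
    # Augmented matrix [A | b].
    M = [A[i] + [b[i]] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular (non-invertible)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    theta = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (M[i][n] - s) / M[i][i]
    return theta


# Made-up data generated from y = 1 + 2*x1 + 0.5*x2; the first column is x0 = 1.
# The features are deliberately left unscaled.
X = [[1.0, 1.0, 10.0], [1.0, 2.0, 30.0], [1.0, 3.0, 20.0], [1.0, 4.0, 50.0]]
y = [8.0, 20.0, 17.0, 34.0]
theta = normal_equation(X, y)
print(theta)
```

No learning rate or iteration count appears anywhere, which is the main practical difference from gradient descent.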
comparison

gradient descent	normal equation
need to choose $α$	no need to choose $α$
need many iterations	do not need to iterate
works well even when n is large	slow if n is very large
$O (k n^{2})$	$O (n^{3})$ , need to compute inverse of $X^{T} X$

- Rule of thumb: if n > 10000, stop considering the normal equation and use gradient descent instead;

2.2 handling a non-invertible matrix

When $X^{T} X$ is non-invertible, the usual causes are:
point 1: redundant features (linearly dependent features).
point 2: too many features (e.g. m ≤ n).
solution to point 2: delete some features, or use regularization.
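
For context, the regularized form of the normal equation is commonly written as below. It is not derived in these notes; the regularization parameter $λ$ and the matrix $L$ are symbols introduced here for illustration.

$$θ = (X^{T} X + λ L)^{- 1} X^{T} y, \qquad L = \mathrm{diag}(0, 1, 1, \ldots, 1) \in R^{(n + 1) \times (n + 1)}$$

The zero in the top-left corner of $L$ leaves $θ_{0}$ unregularized. For $λ > 0$ the matrix $X^{T} X + λ L$ is invertible, even when m ≤ n.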

3 Octave / Matlab