Andrew Ng's Machine Learning Course: Code Notes 02

Code

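Lecture 4, Linear Regression with Multiple Variables, implements the normal equation in a single line of code. The code itself is not hard; the main difficulty is understanding how the formula is derived.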

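%---- pinv(A) returns the pseudoinverse (generalized inverse) of matrix A
%---- pinv() is used instead of inv() because X'*X, although always square, may be singular (non-invertible)
theta = pinv(X'*X)*X'*y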
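
To make the one-liner concrete, here is a minimal Octave sketch on made-up data. The values of `x` and `y`, and the names `theta_pinv`, `theta_ls`, `X2` and `theta2`, are assumptions for illustration and do not come from the course. It compares the `pinv` form with the backslash operator, then adds a redundant feature so that `X2'*X2` becomes singular.

```octave
% Toy data (hypothetical): y = 1 + 2*x exactly, so theta should be [1; 2].
x = [1; 2; 3];
y = [3; 5; 7];
X = [ones(size(x)), x];               % prepend the intercept column x0 = 1

theta_pinv = pinv(X' * X) * X' * y;   % normal equation, written as in the course
theta_ls   = X \ y;                   % least-squares solve via backslash
disp([theta_pinv, theta_ls]);         % the two columns should agree

% Add a redundant feature (third column = 2 * second column).
% X2' * X2 is still square (3 x 3) but singular, so inv() is unreliable here.
X2 = [X, 2 * x];
disp(rank(X2' * X2));                 % rank is below the number of columns
theta2 = pinv(X2' * X2) * X2' * y;    % pinv still returns a least-squares solution
disp(max(abs(X2 * theta2 - y)));      % largest residual, expected to be near zero
```

In the singular case the solution is not unique; `pinv` picks the one with the smallest norm, which is why the fitted values can still match `y` here.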

Derivation of the normal equation

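Andrew Ng's machine learning course states the normal equation below directly, without any derivation. As a beginner I was completely lost, so I went online to study it. After internalizing (or so I feel...) three derivations, I am sharing them here. If you spot any mistakes, corrections are welcome.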
$\theta = (X^{T}X)^{-1}X^{T}y$

Method 1: Matrix calculus

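This method comes from the PDF lecture notes of Andrew Ng's CS229 Machine Learning. That course differs from the Coursera course of the same name (the open-course version most people watch): it requires a stronger math background and puts more emphasis on deriving the formulas.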

The derivation is as follows:
$\nabla_{\theta}J(\theta)=\nabla_{\theta}\frac{1}{2m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^{2}=\nabla_{\theta}\frac{1}{2m}(X\theta-\vec y)^{T}(X\theta-\vec y)$
Since $(A\pm B)^{T}=A^T\pm B^T$ and $(AB)^T=B^TA^T$, expanding the product gives:
$\nabla_{\theta}J(\theta)=\frac{1}{2m}\nabla_{\theta}(\theta^TX^TX\theta-\theta^TX^T\vec y-\vec y^TX\theta+\vec y^T\vec y)$
If $\alpha$ is a real number, then $\mathrm{tr}(\alpha)=\alpha$. Here $\theta$ is an $(n\times 1)$ matrix, $X$ is an $(m\times n)$ matrix, and $\vec y$ is an $(m\times 1)$ vector, so $\theta^TX^TX\theta$ has dimensions $(1\times n)(n\times m)(m\times n)(n\times 1)$, which is a $(1\times 1)$ matrix, i.e. a real number. In the same way, $\theta^TX^T\vec y$, $\vec y^TX\theta$ and $\vec y^T\vec y$ are all real numbers.
Because $\vec y^T\vec y$ is a constant that does not depend on $\theta$, its derivative with respect to $\theta$ is 0 and it can be dropped:
$\nabla_{\theta}J(\theta)=\frac{1}{2m}\nabla_{\theta}\mathrm{tr}(\theta^TX^TX\theta-\theta^TX^T\vec y-\vec y^TX\theta)$
Since $\mathrm{tr}(A\pm B)=\mathrm{tr}(A)\pm \mathrm{tr}(B)$ and $\mathrm{tr}(A)=\mathrm{tr}(A^T)$:
$\mathrm{tr}(\theta^TX^T\vec y)=\mathrm{tr}((X\theta)^T\vec y)=\mathrm{tr}(\vec y^TX\theta)$
$\therefore \nabla_{\theta}J(\theta)=\frac{1}{2m}\nabla_{\theta}(\mathrm{tr}(\theta^TX^TX\theta)-2\,\mathrm{tr}(\vec y^TX\theta))$
The arguments of $\mathrm{tr}$ are still real numbers, so removing $\mathrm{tr}$ does not change the expression:
$\nabla_{\theta}J(\theta)=\frac{1}{2m}\nabla_{\theta}(\theta^TX^TX\theta-2\vec y^TX\theta)$
From matrix calculus, for a vector $\vec v$ we have $\frac{\partial \vec b^TA\vec v}{\partial \vec v}=A^T\vec b$ and $\frac{\partial \vec v^TA\vec v}{\partial \vec v}=(A+A^T)\vec v$. Applying the first rule with $A=X$, $\vec b=\vec y$ and the second with the symmetric matrix $A=X^TX$:
$\nabla_{\theta}J(\theta)=\frac{1}{2m}(2X^TX\theta-2X^T\vec y)=\frac{1}{m}(X^TX\theta-X^T\vec y)$
Setting $\nabla_{\theta}J(\theta)=0$, and assuming $X^TX$ is invertible, gives:
$\theta=(X^{T}X)^{-1}X^{T}\vec y$
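
A closed-form gradient like this can be sanity-checked numerically. The sketch below is only an illustration: the data, the test point `theta`, and the helper names `J`, `grad` and `num_grad` are assumptions of mine, not part of the course. It defines the cost with the $\frac{1}{2m}$ factor, so the closed-form gradient is $\frac{1}{m}(X^TX\theta-X^T\vec y)$, and compares that against central finite differences.

```octave
% Hypothetical toy data; any X with full column rank would do.
X = [1 1; 1 2; 1 3; 1 4];
y = [2; 3; 5; 4];
m = size(X, 1);

J    = @(theta) (X * theta - y)' * (X * theta - y) / (2 * m);  % cost function
grad = @(theta) (X' * X * theta - X' * y) / m;                 % closed-form gradient

theta = [0.5; -1];                 % arbitrary test point
h = 1e-6;                          % finite-difference step
num_grad = zeros(size(theta));
for j = 1:numel(theta)
  e = zeros(size(theta));
  e(j) = h;                        % perturb only the j-th parameter
  num_grad(j) = (J(theta + e) - J(theta - e)) / (2 * h);       % central difference
end

disp([grad(theta), num_grad]);     % the two columns should match closely
theta_star = pinv(X' * X) * X' * y;
disp(grad(theta_star));            % gradient at the normal-equation solution
```

If the two columns disagree, the usual suspects are a missing transpose or a dropped constant factor in the closed form.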

Method 2: Deduction

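This method is relatively easy to follow. The overall idea is to start from a single scalar equation and expand it into matrix form. First, make the following assumptions:
$h_{\theta}(x)=\theta_0x_0+\theta_1x_1+\dots+\theta_nx_n,\quad x_0=1$
There are $m$ training examples $(x^{(i)},y^{(i)})$. Let $X$ be the matrix whose $i$-th row is $(x_0^{(i)},x_1^{(i)},\dots,x_n^{(i)})$, and let $\theta=(\theta_0,\theta_1,\dots,\theta_n)^T$ and $y=(y^{(1)},y^{(2)},\dots,y^{(m)})^T$.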
The cost function is:
$J(\theta_0,\theta_1,\theta_2...,\theta_n)=\frac{1}{2m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^{2}$
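Our goal is to find the combination of $\theta_0,\theta_1,\dots,\theta_n$ for which the cost function reaches its minimum. To do so, take the partial derivative with respect to each $\theta_j$ and solve for the values at which every partial derivative equals 0; those values form the final combination. The partial derivatives are:
$\frac{\partial J}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})x_j^{(i)}=0,\quad j=0,1,\dots,n$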
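Taking the equation for $\theta_0$ as an example:
$\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})x_0^{(i)}=0$
Converted to a matrix operation, the sum is a row vector times a column vector:
$\frac{1}{m}\begin{pmatrix}x_0^{(1)}&x_0^{(2)}&\cdots&x_0^{(m)}\end{pmatrix}\begin{pmatrix}h_{\theta}(x^{(1)})-y^{(1)}\\h_{\theta}(x^{(2)})-y^{(2)}\\\vdots\\h_{\theta}(x^{(m)})-y^{(m)}\end{pmatrix}=0$
Expanding the second factor, with $h_{\theta}(x^{(i)})=\theta_0x_0^{(i)}+\dots+\theta_nx_n^{(i)}$, $X$ the matrix whose $i$-th row is $(x_0^{(i)},\dots,x_n^{(i)})$, and $y=(y^{(1)},\dots,y^{(m)})^T$:
$\begin{pmatrix}h_{\theta}(x^{(1)})-y^{(1)}\\\vdots\\h_{\theta}(x^{(m)})-y^{(m)}\end{pmatrix}=X\theta-y$
Substituting this back gives:
$\frac{1}{m}\begin{pmatrix}x_0^{(1)}&x_0^{(2)}&\cdots&x_0^{(m)}\end{pmatrix}(X\theta-y)=0$
Similarly, for every other $\theta_j$:
$\frac{1}{m}\begin{pmatrix}x_j^{(1)}&x_j^{(2)}&\cdots&x_j^{(m)}\end{pmatrix}(X\theta-y)=0,\quad j=1,\dots,n$
Combining these equations into a single matrix equation:
$\frac{1}{m}\begin{pmatrix}x_0^{(1)}&x_0^{(2)}&\cdots&x_0^{(m)}\\x_1^{(1)}&x_1^{(2)}&\cdots&x_1^{(m)}\\\vdots&\vdots&&\vdots\\x_n^{(1)}&x_n^{(2)}&\cdots&x_n^{(m)}\end{pmatrix}(X\theta-y)=0$
Drop the factor $\frac{1}{m}$. Because the matrix on the left is exactly $X^T$, the equation simplifies to the following (the last step assumes $X^TX$ is invertible):
$X^T(X\theta-y)=0 \\ X^TX\theta-X^Ty=0 \\ \therefore \theta=(X^{T}X)^{-1}X^{T}y$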
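
As a quick check of the steps above, here is a small worked example. The three data points are made up for illustration: one feature, $x_1=1,2,3$ with targets $y=1,2,2$, and $x_0=1$ throughout.
$X=\begin{pmatrix}1&1\\1&2\\1&3\end{pmatrix},\quad y=\begin{pmatrix}1\\2\\2\end{pmatrix}$
Writing out the two per-parameter equations, with the factor $\frac{1}{m}$ already dropped:
$\sum_{i=1}^3(\theta_0+\theta_1x_1^{(i)}-y^{(i)})\cdot 1=3\theta_0+6\theta_1-5=0$
$\sum_{i=1}^3(\theta_0+\theta_1x_1^{(i)}-y^{(i)})\,x_1^{(i)}=6\theta_0+14\theta_1-11=0$
Stacked together, these are exactly $X^TX\theta=X^Ty$:
$\begin{pmatrix}3&6\\6&14\end{pmatrix}\begin{pmatrix}\theta_0\\\theta_1\end{pmatrix}=\begin{pmatrix}5\\11\end{pmatrix}\quad\Rightarrow\quad\theta=\frac{1}{6}\begin{pmatrix}14&-6\\-6&3\end{pmatrix}\begin{pmatrix}5\\11\end{pmatrix}=\begin{pmatrix}2/3\\1/2\end{pmatrix}$
The residuals $X\theta-y=(\tfrac{1}{6},-\tfrac{1}{3},\tfrac{1}{6})^T$ sum to zero and are orthogonal to the $x_1$ column, which is what the two scalar equations require.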

Method 3: Chain rule

This one is a must-have for the lazy:
$\frac{\partial J(\theta)}{\partial \theta}=\frac{1}{2m}\frac{\partial (X\theta-\vec y)^{T}(X\theta-\vec y)}{\partial \theta}=\frac{1}{2m}\frac{\partial (X\theta-\vec y)^{T}(X\theta-\vec y)}{\partial (X\theta-\vec y)}\frac{\partial (X\theta-\vec y)}{\partial \theta}=\frac{1}{m}(X\theta-\vec y)^TX=0 \\ \therefore \theta^TX^TX=\vec y^TX$
Both sides are $(1\times n)$ row vectors, so transposing each side gives the column form:
$X^TX\theta=X^T\vec y \\ \therefore \theta=(X^{T}X)^{-1}X^{T}\vec y$
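
The shapes in the chain-rule product are easy to get wrong, so here is a small Octave sketch that tracks them. The data and the names `dJ_dr`, `dr_dtheta` and `dJ_dtheta` are hypothetical and chosen only for illustration; the cost is taken with the $\frac{1}{2m}$ factor.

```octave
X = [1 1; 1 2; 1 3; 1 4];          % hypothetical 4 x 2 design matrix
y = [2; 3; 5; 4];
m = size(X, 1);
theta = pinv(X' * X) * X' * y;     % normal-equation solution

r = X * theta - y;                 % residual vector, m x 1
dJ_dr     = r' / m;                % derivative of r'*r/(2m) w.r.t. r: 1 x m row
dr_dtheta = X;                     % derivative of X*theta - y w.r.t. theta: m x n
dJ_dtheta = dJ_dr * dr_dtheta;     % chain rule product: 1 x n row vector

disp(size(dJ_dtheta));             % a row vector, not a scalar
disp(dJ_dtheta);                   % near zero at the least-squares solution
disp(norm(dJ_dtheta' - X' * r / m));   % transposing recovers X'*(X*theta - y)/m
```

Since the product is a row vector, moving from $\theta^TX^TX=\vec y^TX$ to $X^TX\theta=X^T\vec y$ only needs a transpose.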

Reference

https://www.cnblogs.com/AngelaSunny/p/6616712.html
https://www.cnblogs.com/crackpotisback/p/5545708.html
https://blog.****.net/zhangbaodan1/article/details/81013056
https://blog.****.net/daaikuaichuan/article/details/80620518
