CS 229 notes Supervised Learning

Tags (space-separated): supervised-learning linear-algebra


Foreword

This note gives the proof of the normal equation and, before that, some linear algebra identities that will be used in the proof.

The normal equation

Linear algebra preparation

For two matrices $A \in \mathbb{R}^{n \times m}$ and $B \in \mathbb{R}^{m \times n}$ such that $AB$ is square, $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.

Proof:

$$\mathrm{tr}(AB) = \sum_{i=1}^{n}(AB)_{ii} = \sum_{i=1}^{n}\sum_{j=1}^{m}A_{ij}B_{ji}$$

$$= \sum_{j=1}^{m}\sum_{i=1}^{n}B_{ji}A_{ij} = \sum_{j=1}^{m}(BA)_{jj} = \mathrm{tr}(BA)$$
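As a quick numerical sanity check (a minimal NumPy sketch; the shapes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))  # n x m
B = rng.standard_normal((5, 3))  # m x n, so AB is 3x3 and BA is 5x5

# tr(AB) = tr(BA) even though AB and BA have different shapes.
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```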

 

Some properties:
$$\mathrm{tr}(ABC) = \mathrm{tr}(CAB) = \mathrm{tr}(BCA)$$
$$\mathrm{tr}\,A = \mathrm{tr}\,A^T \qquad \mathrm{tr}(A+B) = \mathrm{tr}\,A + \mathrm{tr}\,B \qquad \mathrm{tr}(aA) = a\,\mathrm{tr}\,A$$
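These properties can be verified the same way (a sketch with square matrices for simplicity):

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))
a = 2.5

assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))    # cyclic shift
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))    # cyclic shift
assert np.isclose(np.trace(A), np.trace(A.T))                  # transpose
assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))  # additivity
assert np.isclose(np.trace(a * A), a * np.trace(A))            # scaling
```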

 

Some facts of matrix derivatives:

$$\nabla_A\,\mathrm{tr}(AB) = B^T \tag{1}$$
$$\nabla_{A^T}\,f(A) = \left(\nabla_A f(A)\right)^T \tag{2}$$

Proof:

$$\mathrm{tr}(AB) = \sum_{i=1}^{n}\sum_{j=1}^{m}A_{ij}B_{ji}$$

so $\frac{\partial}{\partial A_{ij}}\mathrm{tr}(AB) = B_{ji}$, that is, $\nabla_A\,\mathrm{tr}(AB) = B^T$. Equation (2) follows directly from the definition of the gradient with respect to $A^T$.

$$\nabla_A\,\mathrm{tr}(ABA^TC) = CAB + C^TAB^T \tag{3}$$

Proof 1:

$$\mathrm{tr}(ABA^TC) = \sum_{i,j,k,l}A_{ij}B_{jk}A_{lk}C_{li}$$

Differentiating with respect to $A_{pq}$, the two occurrences of $A$ contribute

$$\frac{\partial}{\partial A_{pq}}\mathrm{tr}(ABA^TC) = \sum_{k,l}B_{qk}A_{lk}C_{lp} + \sum_{i,j}C_{pi}A_{ij}B_{jq} = (C^TAB^T)_{pq} + (CAB)_{pq}$$

which is equation (3) entrywise.

 

Proof 2:

Treat the two occurrences of $A$ separately and use the product rule. Holding the second $A$ fixed, equation (1) gives $\nabla_A\,\mathrm{tr}(A(BA^TC)) = (BA^TC)^T = C^TAB^T$. Holding the first $A$ fixed, the cyclic property gives $\mathrm{tr}(ABA^TC) = \mathrm{tr}(A^T(CAB))$, and equations (1) and (2) give $\nabla_A\,\mathrm{tr}(A^T(CAB)) = CAB$. Adding the two contributions yields equation (3).
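Equation (3) can be sanity-checked against finite differences (a sketch; the shapes are arbitrary as long as $ABA^TC$ is defined):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3
A = rng.standard_normal((n, m))
B = rng.standard_normal((m, m))
C = rng.standard_normal((n, n))

def f(A):
    return np.trace(A @ B @ A.T @ C)

# Central finite differences with respect to each entry of A.
eps = 1e-6
num_grad = np.zeros_like(A)
for i in range(n):
    for j in range(m):
        E = np.zeros_like(A)
        E[i, j] = eps
        num_grad[i, j] = (f(A + E) - f(A - E)) / (2 * eps)

analytic = C @ A @ B + C.T @ A @ B.T  # equation (3)
assert np.allclose(num_grad, analytic, atol=1e-5)
```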

 

$$\nabla_A\,|A| = |A|\left(A^{-1}\right)^T \tag{4}$$

Proof: expanding along the $i$-th row, $|A| = \sum_{j}A_{ij}C_{ij}$, so $\frac{\partial |A|}{\partial A_{ij}} = C_{ij}$
($C_{ij}$ refers to the cofactor of $A_{ij}$). Since $A^{-1} = \mathrm{adj}(A)/|A|$ and $\mathrm{adj}(A)_{ji} = C_{ij}$, we get $\nabla_A\,|A| = \mathrm{adj}(A)^T = |A|\left(A^{-1}\right)^T$.
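Equation (4) admits the same finite-difference check (a sketch; any comfortably nonsingular $A$ will do):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)  # keep A well away from singular

# det is affine in each single entry, so central differences are essentially exact.
eps = 1e-6
num_grad = np.zeros_like(A)
for i in range(4):
    for j in range(4):
        E = np.zeros_like(A)
        E[i, j] = eps
        num_grad[i, j] = (np.linalg.det(A + E) - np.linalg.det(A - E)) / (2 * eps)

analytic = np.linalg.det(A) * np.linalg.inv(A).T  # equation (4)
assert np.allclose(num_grad, analytic, atol=1e-4)
```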

Least squares revisited

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}$$

(an $m \times n$ matrix if we don't include the intercept term)

$$\vec{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$

Since $h_\theta(x^{(i)}) = (x^{(i)})^T\theta$,

$$X\theta - \vec{y} = \begin{bmatrix} (x^{(1)})^T\theta - y^{(1)} \\ \vdots \\ (x^{(m)})^T\theta - y^{(m)} \end{bmatrix} = \begin{bmatrix} h_\theta(x^{(1)}) - y^{(1)} \\ \vdots \\ h_\theta(x^{(m)}) - y^{(m)} \end{bmatrix}$$

Thus,
$$\frac{1}{2}(X\theta-\vec{y})^T(X\theta-\vec{y}) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 = J(\theta)$$

using the fact that $z^Tz = \sum_i z_i^2$ for a vector $z$.
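In code, the matrix form and the sum form of $J(\theta)$ agree (a minimal sketch with made-up data):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 50, 3
X = rng.standard_normal((m, n))  # design matrix, one example per row
y = rng.standard_normal(m)
theta = rng.standard_normal(n)

J_matrix = 0.5 * (X @ theta - y) @ (X @ theta - y)
J_sum = 0.5 * sum((x @ theta - yi) ** 2 for x, yi in zip(X, y))
assert np.isclose(J_matrix, J_sum)
```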

Now, to find the derivative of $J(\theta)$ with respect to $\theta$, combine Equations (2) and (3):

$$\nabla_{A^T}\,\mathrm{tr}(ABA^TC) = B^TA^TC^T + BA^TC \tag{5}$$

Hence

$$\nabla_\theta J(\theta) = \nabla_\theta\,\frac{1}{2}(X\theta-\vec{y})^T(X\theta-\vec{y}) = \frac{1}{2}\nabla_\theta\left(\theta^TX^TX\theta - \theta^TX^T\vec{y} - \vec{y}^TX\theta + \vec{y}^T\vec{y}\right)$$

Notice that $\theta^TX^T\vec{y}$ is a real number, or you can see it as a $1 \times 1$ matrix, so it equals its own trace, and $\mathrm{tr}(\theta^TX^T\vec{y}) = \mathrm{tr}(\vec{y}^TX\theta)$ since $\mathrm{tr}\,A = \mathrm{tr}\,A^T$. Thus

$$\nabla_\theta J(\theta) = \frac{1}{2}\nabla_\theta\left(\mathrm{tr}(\theta^TX^TX\theta) - 2\,\mathrm{tr}(\vec{y}^TX\theta)\right)$$

 


where the $\vec{y}^T\vec{y}$ term has been dropped, since it involves no $\theta$ elements and so its gradient is zero.
Then use equation (5) with $A^T = \theta$, $B = B^T = X^TX$, $C = I$ for the first term, and equation (1) (after a cyclic permutation) for the second:

$$\nabla_\theta J(\theta) = \frac{1}{2}\left(X^TX\theta + X^TX\theta - 2X^T\vec{y}\right) = X^TX\theta - X^T\vec{y}$$
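This closed-form gradient also matches a finite-difference check of $J$ (the same kind of sketch as above):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 50, 3
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)
theta = rng.standard_normal(n)

def J(t):
    return 0.5 * (X @ t - y) @ (X @ t - y)

analytic = X.T @ X @ theta - X.T @ y  # the gradient derived above

eps = 1e-6
num_grad = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
                     for e in np.eye(n)])
assert np.allclose(num_grad, analytic, atol=1e-5)
```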

 


To minimize $J(\theta)$, we set its derivative to zero, and obtain the normal equation:

$$X^TX\theta = X^T\vec{y}$$
$$\theta = \left(X^TX\right)^{-1}X^T\vec{y}$$
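As a final check, the normal equation recovers the same $\theta$ as NumPy's least-squares solver (a sketch; in practice one solves the linear system instead of forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 100, 4
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Normal equation: solve X^T X theta = X^T y.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares routine.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

assert np.allclose(theta_normal, theta_lstsq)
```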