ML Study Notes (2)

Multivariate linear regression

A more powerful form of linear regression that works with multiple variables, i.e. multiple features.

n = number of features

x^(i) = input (features) of the i-th training example.

x_j^(i) = value of feature j in the i-th training example.

hypothesis : h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n

For convenience of notation, define x_0 = 1.

x = [x_0; x_1; …; x_n] ∈ R^(n+1)         θ = [θ_0; θ_1; …; θ_n] ∈ R^(n+1)

so : h_θ(x) = θ^T x
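For instance, with x_0 = 1 the predictions for all training examples can be computed at once. A minimal Octave sketch (the values of X and theta below are purely illustrative, not from these notes):

    X = [1, 2104, 5;           % each row is one example: x_0 = 1, x_1 = size, x_2 = bedrooms
         1, 1416, 3];
    theta = [80; 0.1; 25];     % illustrative parameter vector
    h = X * theta;             % h(i) = theta' * x^(i), computed for every example at once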

Feature scaling

Idea : Make sure features are on a similar scale

E.g.   x1 = size(0~2000 feet^2)      x2 = number of bedrooms(1~5)

The contours of J(θ) may then be very skewed, elongated ellipses, and gradient descent may spend a long time oscillating back and forth before reaching the minimum. After scaling, the contours become much rounder and gradient descent converges faster.

How to do feature scaling:

x1 = size / 2000       x2 = (number of bedrooms) / 5

so that each feature falls roughly in the 0 <= xi <= 1 range.

More generally: get every feature into approximately a -1 <= xi <= 1 range.

Mean normalization

Replace each x_i with x_i - μ_i so the features have approximately zero mean (do not apply this to x_0 = 1). Combined with scaling: x_i := (x_i - μ_i) / s_i, where μ_i is the average value of feature i over the training set and s_i is either the range (max - min) or the standard deviation.
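A minimal Octave sketch of mean normalization (featureNormalize is an assumed helper name, not something defined in these notes; it relies on Octave's automatic broadcasting to subtract the row vector mu from every row):

    function [X_norm, mu, sigma] = featureNormalize(X)
      % X: m x n matrix of raw feature values, one column per feature (without x_0)
      mu = mean(X);                  % per-feature mean
      sigma = std(X);                % per-feature standard deviation (the range max - min also works)
      X_norm = (X - mu) ./ sigma;    % every feature now has mean ~0 and a comparable scale
    end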

About α

Make sure gradient descent is working correctly.

Plot J(θ) as a function of the number of iterations of gradient descent; if it is working correctly, J(θ) should decrease after every iteration.

the number of iterations that gradient descent takes to converge for a particular application can vary a lot.

Example automatic convergence test:

declare convergence if J(θ) decreases by less than 10^-3 in one iteration.

But choosing this threshold is pretty difficult, so in practice looking at the plot of J(θ) is the better way to tell whether gradient descent is working correctly.

If α is too small, convergence is slow; if α is too large, J(θ) may not decrease on every iteration and may not converge at all.
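A sketch of gradient descent for linear regression that records J(θ) at every iteration so it can be plotted against the iteration number (gradientDescent and J_history are assumed names; X is an m x (n+1) design matrix whose first column is all ones):

    function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
      m = length(y);
      J_history = zeros(num_iters, 1);
      for iter = 1:num_iters
        theta = theta - (alpha / m) * (X' * (X * theta - y));        % simultaneous update of all theta_j
        J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2); % squared-error cost J(theta)
      end
    end

    % plot(1:num_iters, J_history)   % J(theta) should decrease on every iteration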

Polynomial regression

E.g. fit a cubic model to housing prices: h_θ(x) = θ_0 + θ_1 (size) + θ_2 (size)^2 + θ_3 (size)^3

by defining new features x_1 = (size), x_2 = (size)^2, x_3 = (size)^3 and then using ordinary multivariate linear regression.

If we choose features like this, feature scaling becomes increasingly important: the new features span wildly different ranges (e.g. if size goes up to 2000, size^2 goes up to 4×10^6 and size^3 up to 8×10^9).
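A sketch of building and scaling such polynomial features in Octave, reusing the featureNormalize helper sketched above (the size values are made-up):

    sz = [2104; 1416; 1534; 852];                    % illustrative house sizes in feet^2
    X_poly = [sz, sz .^ 2, sz .^ 3];                 % x_1 = size, x_2 = size^2, x_3 = size^3
    [X_poly, mu, sigma] = featureNormalize(X_poly);  % without scaling, the columns differ by orders of magnitude
    X = [ones(length(sz), 1), X_poly];               % prepend x_0 = 1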

Normal equation :

Method to solve for θ analytically.

Intuition: take the partial derivative of J(θ) with respect to every θ_j, set each to zero, and solve for θ_0, θ_1, …, θ_n.

Construct the m x (n+1) design matrix X (each row is the transpose of a training example x^(i), with x_0 = 1) and the m-dimensional vector y of target values.

So :     θ = (X^T X)^(-1) X^T y

With the normal equation, feature scaling isn't actually necessary.
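In Octave this is a one-liner (a sketch; X is the m x (n+1) design matrix and y the m x 1 target vector, as above):

    theta = pinv(X' * X) * X' * y;   % pinv gives a sensible answer even if X' * X is non-invertible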

Gradient descent vs. the normal equation (m training examples, n features):

  1. Gradient descent: need to choose α and run many iterations, but it works well even when n is large.
  2. Normal equation: no need to choose α and no iterations, but computing (X^T X)^(-1) costs roughly O(n^3), so it becomes slow when n is very large.

What if X^T X is non-invertible?

This happens pretty rarely, and in Octave pinv will still do the right thing even when X^T X is non-invertible. Common causes and fixes:

  1. Redundant (linearly dependent) features: remove them.
  2. Too many features (e.g. m <= n): delete some features.
  3. Use regularization.

Logistic regression

Classification: y ∈ {0, 1}, where 0 is the negative class and 1 is the positive class.

Applying linear regression to a classification problem usually works poorly: h_θ(x) can be much larger than 1 or less than 0 even though y is always 0 or 1.

Logistic regression instead keeps 0 <= h_θ(x) <= 1.

Logistic Regression Model

h_θ(x) = g(θ^T x), where g(z) = 1 / (1 + e^(-z))             --> Logistic / Sigmoid function

g(z) approaches 0 as z → -∞ and approaches 1 as z → +∞, so 0 < h_θ(x) < 1.

h_θ(x) = estimated probability that y = 1 on input x

h_θ(x) = P(y = 1 | x ; θ), and P(y = 0 | x ; θ) = 1 - P(y = 1 | x ; θ).
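A minimal Octave sketch of the sigmoid and the resulting hypothesis:

    function g = sigmoid(z)
      g = 1 ./ (1 + exp(-z));        % elementwise, so z can be a scalar, vector, or matrix
    end

    % h = sigmoid(X * theta);        % m x 1 vector of estimated probabilities that y = 1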

Decision boundary

Suppose we predict y = 1 if h_θ(x) >= 0.5, which happens exactly when θ^T x >= 0,

                   and y = 0 if h_θ(x) < 0.5, i.e. when θ^T x < 0.

E.g. h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2) with θ = [-3; 1; 1] : predict y = 1 whenever -3 + x_1 + x_2 >= 0, so the straight line x_1 + x_2 = 3 is the decision boundary.

With higher-order features, e.g. h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_1^2 + θ_4 x_2^2) and θ = [-1; 0; 0; 1; 1] : predict y = 1 whenever x_1^2 + x_2^2 >= 1, so the decision boundary is a circle,

or even more complex examples……

How to fit the parameters θ

For linear regression the cost was J(θ) = (1/m) Σ_{i=1..m} (1/2)(h_θ(x^(i)) - y^(i))^2, i.e. Cost(h_θ(x), y) = (1/2)(h_θ(x) - y)^2.

If we plugged this squared-error cost into logistic regression, J(θ) would be non-convex (many local optima), so gradient descent would not be guaranteed to converge to the global minimum.

Logistic regression cost function:

Cost(h_θ(x), y) = -log(h_θ(x))        if y = 1
Cost(h_θ(x), y) = -log(1 - h_θ(x))    if y = 0

If y = 1: the cost is 0 when h_θ(x) = 1, but it grows to ∞ as h_θ(x) → 0, so a confident wrong prediction is penalized very heavily. The y = 0 case is symmetric.

a simpler way to write the cost function

Cost(h_θ(x), y) = -y log(h_θ(x)) - (1 - y) log(1 - h_θ(x))   (equivalent to the two cases above, since y is always 0 or 1)

J(θ) = -(1/m) Σ_{i=1..m} [ y^(i) log(h_θ(x^(i))) + (1 - y^(i)) log(1 - h_θ(x^(i))) ]

Gradient descent: repeat { θ_j := θ_j - α (1/m) Σ_{i=1..m} (h_θ(x^(i)) - y^(i)) x_j^(i) }, updating all θ_j simultaneously.

 Algorithm looks identical to linear regression , but hypothesis has changed.

So , this is actually not the same thing.
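A sketch of the (unregularized) logistic regression cost and gradient in Octave, matching the formulas above and reusing the sigmoid helper (costFunction is an assumed name):

    function [J, grad] = costFunction(theta, X, y)
      m = length(y);
      h = sigmoid(X * theta);                                  % predictions in (0, 1)
      J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));    % the cost J(theta) above
      grad = (1 / m) * (X' * (h - y));                         % gradient: same form as linear regression, different h
    end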

Advanced optimization algorithms and concepts.

Gradient descent

Conjugate gradient

BFGS

L-BFGS

Advantages of these algorithms: no need to manually pick α, and they are often faster than gradient descent. Main disadvantage: they are more complex.

In Octave, write a function that returns both the cost J(θ) and its gradient, then hand it to fminunc.
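A sketch of the Octave call that the options below refer to, using the costFunction helper sketched earlier (which returns both J(θ) and its gradient):

    options = optimset('GradObj', 'on', 'MaxIter', 100);
    initialTheta = zeros(size(X, 2), 1);        % one parameter per column of X, including theta_0
    [optTheta, functionVal, exitFlag] = ...
        fminunc(@(t)(costFunction(t, X, y)), initialTheta, options);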

'GradObj' , 'on' : sets the gradient objective parameter to on, i.e. the cost function also returns the gradient.

'MaxIter' , '100' : sets the maximum number of iterations to 100.

initialTheta : initial guess for theta

@ : creates a function handle pointing to the cost function.

Running this returns the optimized parameters (optTheta), the final cost (functionVal), and an exit flag.

exitFlag = 1 : convergence

Multi-class classification :

One-vs-all

Train a separate logistic regression classifier h_θ^(i)(x) for each class i to predict the probability that y = i, treating class i as the positive class and all other classes as the negative class.

On a new input x, pick the class i that maximizes h_θ^(i)(x).
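A minimal one-vs-all sketch in Octave under the same assumptions (oneVsAll is an assumed name; X already contains the x_0 = 1 column, y holds class labels 1..num_labels, and costFunction / sigmoid are the helpers sketched earlier):

    function all_theta = oneVsAll(X, y, num_labels)
      n = size(X, 2);
      all_theta = zeros(num_labels, n);            % one row of parameters per class
      options = optimset('GradObj', 'on', 'MaxIter', 100);
      for c = 1:num_labels
        yc = double(y == c);                       % class c vs. all the rest
        all_theta(c, :) = fminunc(@(t)(costFunction(t, X, yc)), zeros(n, 1), options)';
      end
    end

    % Prediction: pick the class whose classifier gives the highest probability
    % [~, p] = max(sigmoid(X * all_theta'), [], 2);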