ML Study Notes (2)

Multivariate linear regression

A more powerful form of linear regression that works with multiple variables, i.e. multiple features.

n = number of features

x^(i) = input (features) of the i-th training example.

x_j^(i) = value of feature j in the i-th training example.

hypothesis : h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n

For convenience of notation, define x_0 = 1.

x = [x_0; x_1; …; x_n] ∈ R^(n+1)         θ = [θ_0; θ_1; …; θ_n] ∈ R^(n+1)

so : h_θ(x) = θ^T x
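For instance, with x_0 = 1 the predictions for all training examples can be computed at once. A minimal Octave sketch (the values of X and theta below are purely illustrative, not from these notes):

    X = [1, 2104, 5;           % each row is one example: x_0 = 1, x_1 = size, x_2 = bedrooms
         1, 1416, 3];
    theta = [80; 0.1; 25];     % illustrative parameter vector
    h = X * theta;             % h(i) = theta' * x^(i), computed for every example at once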

Feature scaling

Idea : Make sure features are on a similar scale

E.g.   x1 = size(0~2000 feet^2)      x2 = number of bedrooms(1~5)

The contours of J(θ) may then be very skewed, elongated ellipses, and gradient descent may spend a long time oscillating back and forth before reaching the minimum. After scaling, the contours become much rounder and gradient descent converges faster.

How to do feature scaling:

x1 = size / 2000       x2 = (number of bedrooms) / 5

so that each feature falls roughly in the 0 <= xi <= 1 range.

More generally: get every feature into approximately a -1 <= xi <= 1 range.

Mean normalization

Replace each x_i with x_i - μ_i so the features have approximately zero mean (do not apply this to x_0 = 1). Combined with scaling: x_i := (x_i - μ_i) / s_i, where μ_i is the average value of feature i over the training set and s_i is either the range (max - min) or the standard deviation.
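A minimal Octave sketch of mean normalization (featureNormalize is an assumed helper name, not something defined in these notes; it relies on Octave's automatic broadcasting to subtract the row vector mu from every row):

    function [X_norm, mu, sigma] = featureNormalize(X)
      % X: m x n matrix of raw feature values, one column per feature (without x_0)
      mu = mean(X);                  % per-feature mean
      sigma = std(X);                % per-feature standard deviation (the range max - min also works)
      X_norm = (X - mu) ./ sigma;    % every feature now has mean ~0 and a comparable scale
    end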

About α

Make sure gradient descent is working correctly.

Plot J(θ) as a function of the number of iterations of gradient descent; if it is working correctly, J(θ) should decrease after every iteration.

the number of iterations that gradient descent takes to converge for a particular application can vary a lot.

Example automatic convergence test:

declare convergence if J(θ) decreases by less than 10^-3 in one iteration.

But choosing this threshold is pretty difficult, so in practice looking at the plot of J(θ) is the better way to tell whether gradient descent is working correctly.

If α is too small, convergence is slow; if α is too large, J(θ) may not decrease on every iteration and may not converge at all.
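A sketch of gradient descent for linear regression that records J(θ) at every iteration so it can be plotted against the iteration number (gradientDescent and J_history are assumed names; X is an m x (n+1) design matrix whose first column is all ones):

    function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
      m = length(y);
      J_history = zeros(num_iters, 1);
      for iter = 1:num_iters
        theta = theta - (alpha / m) * (X' * (X * theta - y));        % simultaneous update of all theta_j
        J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2); % squared-error cost J(theta)
      end
    end

    % plot(1:num_iters, J_history)   % J(theta) should decrease on every iteration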

Polynomial regression

E.g. fit a cubic model to housing prices: h_θ(x) = θ_0 + θ_1 (size) + θ_2 (size)^2 + θ_3 (size)^3

by defining new features x_1 = (size), x_2 = (size)^2, x_3 = (size)^3 and then using ordinary multivariate linear regression.

If we choose features like this, feature scaling becomes increasingly important: the new features span wildly different ranges (e.g. if size goes up to 2000, size^2 goes up to 4×10^6 and size^3 up to 8×10^9).
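A sketch of building and scaling such polynomial features in Octave, reusing the featureNormalize helper sketched above (the size values are made-up):

    sz = [2104; 1416; 1534; 852];                    % illustrative house sizes in feet^2
    X_poly = [sz, sz .^ 2, sz .^ 3];                 % x_1 = size, x_2 = size^2, x_3 = size^3
    [X_poly, mu, sigma] = featureNormalize(X_poly);  % without scaling, the columns differ by orders of magnitude
    X = [ones(length(sz), 1), X_poly];               % prepend x_0 = 1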

Normal equation :

Method to solve for θ analytically.

Intuition: take the partial derivative of J(θ) with respect to every θ_j, set each to zero, and solve for θ_0, θ_1, …, θ_n.

Construct the m x (n+1) design matrix X (each row is the transpose of a training example x^(i), with x_0 = 1) and the m-dimensional vector y of target values.

So :     θ = (X^T X)^(-1) X^T y

With the normal equation, feature scaling isn't actually necessary.
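In Octave this is a one-liner (a sketch; X is the m x (n+1) design matrix and y the m x 1 target vector, as above):

    theta = pinv(X' * X) * X' * y;   % pinv gives a sensible answer even if X' * X is non-invertible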

Gradient descent vs. the normal equation (m training examples, n features):

  1. Gradient descent: need to choose α and run many iterations, but it works well even when n is large.
  2. Normal equation: no need to choose α and no iterations, but computing (X^T X)^(-1) costs roughly O(n^3), so it becomes slow when n is very large.

What if X^T X is non-invertible?

This happens pretty rarely, and in Octave pinv will still do the right thing even when X^T X is non-invertible. Common causes and fixes:

  1. Redundant (linearly dependent) features: remove them.
  2. Too many features (e.g. m <= n): delete some features.
  3. Use regularization.

Logistic regression

Classification: y ∈ {0, 1}, where 0 is the negative class and 1 is the positive class.

Applying linear regression to a classification problem usually works poorly: h_θ(x) can be much larger than 1 or less than 0 even though y is always 0 or 1.

Logistic regression instead keeps 0 <= h_θ(x) <= 1.

Logistic Regression Model

h_θ(x) = g(θ^T x), where g(z) = 1 / (1 + e^(-z))             --> Logistic / Sigmoid function

g(z) approaches 0 as z → -∞ and approaches 1 as z → +∞, so 0 < h_θ(x) < 1.

h_θ(x) = estimated probability that y = 1 on input x

h_θ(x) = P(y = 1 | x ; θ), and P(y = 0 | x ; θ) = 1 - P(y = 1 | x ; θ).
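A minimal Octave sketch of the sigmoid and the resulting hypothesis:

    function g = sigmoid(z)
      g = 1 ./ (1 + exp(-z));        % elementwise, so z can be a scalar, vector, or matrix
    end

    % h = sigmoid(X * theta);        % m x 1 vector of estimated probabilities that y = 1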

Decision boundary

Suppose we predict y = 1 if h_θ(x) >= 0.5, which happens exactly when θ^T x >= 0,

                   and y = 0 if h_θ(x) < 0.5, i.e. when θ^T x < 0.

E.g. h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2) with θ = [-3; 1; 1] : predict y = 1 whenever -3 + x_1 + x_2 >= 0, so the straight line x_1 + x_2 = 3 is the decision boundary.

With higher-order features, e.g. h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_1^2 + θ_4 x_2^2) and θ = [-1; 0; 0; 1; 1] : predict y = 1 whenever x_1^2 + x_2^2 >= 1, so the decision boundary is a circle,

or even more complex examples……

How to fit the parameters θ

For linear regression the cost was J(θ) = (1/m) Σ_{i=1..m} (1/2)(h_θ(x^(i)) - y^(i))^2, i.e. Cost(h_θ(x), y) = (1/2)(h_θ(x) - y)^2.

If we plugged this squared-error cost into logistic regression, J(θ) would be non-convex (many local optima), so gradient descent would not be guaranteed to converge to the global minimum.

Logistic regression cost function:

Cost(h_θ(x), y) = -log(h_θ(x))        if y = 1
Cost(h_θ(x), y) = -log(1 - h_θ(x))    if y = 0

If y = 1: the cost is 0 when h_θ(x) = 1, but it grows to ∞ as h_θ(x) → 0, so a confident wrong prediction is penalized very heavily. The y = 0 case is symmetric.

a simpler way to write the cost function

Cost(h_θ(x), y) = -y log(h_θ(x)) - (1 - y) log(1 - h_θ(x))   (equivalent to the two cases above, since y is always 0 or 1)

J(θ) = -(1/m) Σ_{i=1..m} [ y^(i) log(h_θ(x^(i))) + (1 - y^(i)) log(1 - h_θ(x^(i))) ]

Gradient descent: repeat { θ_j := θ_j - α (1/m) Σ_{i=1..m} (h_θ(x^(i)) - y^(i)) x_j^(i) }, updating all θ_j simultaneously.

 Algorithm looks identical to linear regression , but hypothesis has changed.

So , this is actually not the same thing.
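A sketch of the (unregularized) logistic regression cost and gradient in Octave, matching the formulas above and reusing the sigmoid helper (costFunction is an assumed name):

    function [J, grad] = costFunction(theta, X, y)
      m = length(y);
      h = sigmoid(X * theta);                                  % predictions in (0, 1)
      J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));    % the cost J(theta) above
      grad = (1 / m) * (X' * (h - y));                         % gradient: same form as linear regression, different h
    end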

Advanced optimization algorithms and concepts.

Gradient descent

Conjugate gradient

BFGS

L-BFGS

Advantages of these algorithms: no need to manually pick α, and they are often faster than gradient descent. Main disadvantage: they are more complex.

In Octave, write a function that returns both the cost J(θ) and its gradient, then hand it to fminunc.
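A sketch of the Octave call that the options below refer to, using the costFunction helper sketched earlier (which returns both J(θ) and its gradient):

    options = optimset('GradObj', 'on', 'MaxIter', 100);
    initialTheta = zeros(size(X, 2), 1);        % one parameter per column of X, including theta_0
    [optTheta, functionVal, exitFlag] = ...
        fminunc(@(t)(costFunction(t, X, y)), initialTheta, options);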

'GradObj' , 'on' : sets the gradient objective parameter to on, i.e. the cost function also returns the gradient.

'MaxIter' , '100' : sets the maximum number of iterations to 100.

initialTheta : initial guess for theta

@ : creates a function handle pointing to the cost function.

Running this returns the optimized parameters (optTheta), the final cost (functionVal), and an exit flag.

exitFlag = 1 : convergence

Multi-class classification :

One-vs-all

Train a separate logistic regression classifier h_θ^(i)(x) for each class i to predict the probability that y = i, treating class i as the positive class and all other classes as the negative class.

On a new input x, pick the class i that maximizes h_θ^(i)(x).
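A minimal one-vs-all sketch in Octave under the same assumptions (oneVsAll is an assumed name; X already contains the x_0 = 1 column, y holds class labels 1..num_labels, and costFunction / sigmoid are the helpers sketched earlier):

    function all_theta = oneVsAll(X, y, num_labels)
      n = size(X, 2);
      all_theta = zeros(num_labels, n);            % one row of parameters per class
      options = optimset('GradObj', 'on', 'MaxIter', 100);
      for c = 1:num_labels
        yc = double(y == c);                       % class c vs. all the rest
        all_theta(c, :) = fminunc(@(t)(costFunction(t, X, yc)), zeros(n, 1), options)';
      end
    end

    % Prediction: pick the class whose classifier gives the highest probability
    % [~, p] = max(sigmoid(X * all_theta'), [], 2);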