ML Study Notes (2)
Multivariate linear regression
A more powerful form of linear regression that works with multiple variables (multiple features).
n = number of features
x^(i) = input (features) of the i-th training example.
x_j^(i) = value of feature j in the i-th training example.
Hypothesis: h_θ(x) = θ_0 x_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n
For convenience of notation, define x_0 = 1
x = [x_0, x_1, …, x_n]^T , θ = [θ_0, θ_1, …, θ_n]^T
So: h_θ(x) = θ^T x
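The vectorized form h_θ(x) = θ^T x can be sketched in NumPy (a minimal illustration; the function name and numbers are made up):

```python
import numpy as np

def hypothesis(theta, x):
    """h_theta(x) = theta^T x, with x_0 = 1 prepended for the intercept term."""
    x = np.concatenate(([1.0], x))   # define x_0 = 1
    return theta @ x                 # inner product theta^T x

theta = np.array([1.0, 2.0, 3.0])    # [theta_0, theta_1, theta_2]
x = np.array([4.0, 5.0])             # two features x_1, x_2
print(hypothesis(theta, x))          # 1 + 2*4 + 3*5 = 24.0
```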
Feature scaling
Idea : Make sure features are on a similar scale
E.g. x_1 = size (0–2000 feet²), x_2 = number of bedrooms (1–5)
With such different ranges, the contours of the cost function J(θ) form very skewed ellipses, and gradient descent may end up taking a long time to reach the minimum.
How to do it: divide each feature by its range, e.g. x_1 = size/2000, x_2 = (number of bedrooms)/5.
More generally: get every feature into approximately a -1 ≤ x_i ≤ 1 range.
Mean normalization: replace x_i with (x_i - μ_i) / s_i, where μ_i is the average value of feature i and s_i is its range (max - min) or its standard deviation, so features have approximately zero mean. (Do not apply this to x_0 = 1.)
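Mean normalization as a minimal NumPy sketch (the housing numbers are made up, echoing the size/bedrooms example; here s_i is the range):

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature column: x := (x - mean) / range."""
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)   # range; std() is another common choice
    return (X - mu) / s, mu, s

X = np.array([[2104.0, 3.0],
              [1416.0, 2.0],
              [1534.0, 3.0],
              [ 852.0, 2.0]])
X_norm, mu, s = mean_normalize(X)
# Each column now has zero mean and values within roughly [-1, 1].
print(X_norm.min(), X_norm.max())
```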
About α
Make sure gradient descent is working correctly.
the number of iterations that gradient descent takes to converge for a particular application can vary a lot.
Example automatic convergence test:
declare convergence if J(θ) decreases by less than 10^-3 in one iteration.
But choosing this threshold is pretty difficult; plotting J(θ) against the number of iterations is usually a more reliable way to tell whether gradient descent is working correctly.
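The convergence test above can be sketched like this (a minimal NumPy illustration; the dataset, learning rate, and threshold are made up):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, tol=1e-3, max_iters=1000):
    """Batch gradient descent for linear regression, with an automatic
    convergence test: stop when J(theta) decreases by less than tol."""
    m, n = X.shape
    theta = np.zeros(n)
    cost = lambda t: ((X @ t - y) ** 2).sum() / (2 * m)
    J_prev = cost(theta)
    history = [J_prev]               # plot this against iteration number
    for _ in range(max_iters):
        theta -= alpha / m * (X.T @ (X @ theta - y))
        J = cost(theta)
        history.append(J)
        if J_prev - J < tol:         # declare convergence
            break
        J_prev = J
    return theta, history

# Made-up tiny dataset: y = 1 + 2*x, with the x_0 = 1 column included in X.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta, history = gradient_descent(X, y)
```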
Polynomial regression
E.g. fit h_θ(x) = θ_0 + θ_1(size) + θ_2(size)² + θ_3(size)³. If you choose features like this, feature scaling becomes increasingly important, since size² and size³ have vastly larger ranges than size.
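A quick NumPy illustration of why scaling matters here (the house sizes are made up):

```python
import numpy as np

# Hypothetical house sizes; build polynomial features x, x^2, x^3.
size = np.array([100.0, 500.0, 1000.0])
X_poly = np.column_stack([size, size**2, size**3])

# The columns span wildly different ranges, so scale each one.
ranges = X_poly.max(axis=0) - X_poly.min(axis=0)
X_scaled = (X_poly - X_poly.mean(axis=0)) / ranges
print(ranges)   # each column's range is orders of magnitude larger than the last
```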
Normal equation :
Method to solve for θ analytically (in one step, rather than by iterating).
So: θ = (XᵀX)⁻¹ Xᵀ y
With the normal equation, feature scaling isn't actually necessary.
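A minimal sketch of the normal equation in NumPy, using the pseudo-inverse as the notes suggest (the tiny dataset is made up):

```python
import numpy as np

def normal_equation(X, y):
    """theta = pinv(X' X) X' y  -- closed-form least-squares solution."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Made-up data: y = 1 + 2*x, with the x_0 = 1 column included in X.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = normal_equation(X, y)
print(theta)   # ~ [1., 2.]
```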
What if XᵀX is non-invertible?
This happens pretty rarely; in Octave, pinv computes the pseudo-inverse and will do the right thing even if XᵀX is non-invertible. Common causes and fixes:
- see if you have redundant features (linearly dependent, e.g. the same size in feet² and in m²)
- check whether you have too many features (e.g. m ≤ n); if so, use fewer features
- use regularization
Logistic regression
Logistic Regression Model
h_θ(x) = g(θᵀx), where g(z) = 1 / (1 + e^(-z)) --> the logistic / sigmoid function
h_θ(x) = estimated probability that y = 1 on input x, i.e. h_θ(x) = P(y = 1 | x; θ)
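The sigmoid and the resulting hypothesis, as a minimal NumPy sketch (function names are my own):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}); maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))                      # 0.5
print(sigmoid(np.array([-10.0, 10.0])))  # close to 0 and close to 1

def h(theta, x):
    """Logistic regression hypothesis: estimated P(y = 1 | x; theta)."""
    return sigmoid(theta @ x)
```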
Decision boundary
Suppose we predict "y = 1" if h_θ(x) ≥ 0.5, which happens exactly when θᵀx ≥ 0,
and "y = 0" if h_θ(x) < 0.5, i.e. when θᵀx < 0.
Or even more complex (non-linear) decision boundaries, e.g. using higher-order polynomial features…
Cost function: how to fit the parameters θ.
If we reuse the squared-error cost function from linear regression, J(θ) is non-convex (the plot on the left, with many local optima), so gradient descent is not guaranteed to converge to the global minimum. Instead, use Cost(h_θ(x), y) = -log(h_θ(x)) if y = 1, and -log(1 - h_θ(x)) if y = 0.
A simpler way to write the cost function: Cost(h_θ(x), y) = -y log(h_θ(x)) - (1 - y) log(1 - h_θ(x)), giving J(θ) = -(1/m) Σ_{i=1}^{m} [ y^(i) log h_θ(x^(i)) + (1 - y^(i)) log(1 - h_θ(x^(i))) ].
The gradient descent update looks identical to linear regression's, θ_j := θ_j - α (1/m) Σ (h_θ(x^(i)) - y^(i)) x_j^(i), but the hypothesis h_θ(x) has changed (it is now g(θᵀx)), so it is actually not the same algorithm.
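A minimal NumPy sketch of this cost function and its gradient (the data is made up; at θ = 0 the cost is log 2, since h = 0.5 everywhere):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ] -- convex."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

def logistic_gradient(theta, X, y):
    """Same form as linear regression's gradient, but h is the sigmoid."""
    m = len(y)
    return X.T @ (sigmoid(X @ theta) - y) / m

X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(logistic_cost(np.zeros(2), X, y))   # log(2) ≈ 0.693 at theta = 0
```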
Advanced optimization algorithms and related concepts.
Gradient descent
Conjugate gradient
BFGS
L-BFGS
(the last three: no need to manually pick α, often faster than gradient descent, but more complex)
'GradObj', 'on' : sets the gradient objective parameter to on (tells fminunc that our cost function also returns the gradient).
'MaxIter', 100 : sets the maximum number of iterations to 100.
initialTheta : initial guess for theta
@ : creates a function handle pointing to the cost function.
exitFlag = 1 : convergence
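The fminunc setup described above has a rough Python analogue in scipy.optimize.minimize; in this sketch (with made-up data) jac=True plays the role of 'GradObj' = 'on', and res.success is the analogue of exitFlag = 1:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_function(theta, X, y):
    """Returns (J, gradient) -- the analogue of 'GradObj' = 'on'."""
    m = len(y)
    h = sigmoid(X @ theta)
    J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    grad = X.T @ (h - y) / m
    return J, grad

# Made-up, non-separable 1-D data (plus the x_0 = 1 column).
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, -0.5],
              [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
initial_theta = np.zeros(2)
res = minimize(cost_function, initial_theta, args=(X, y),
               jac=True, method='BFGS', options={'maxiter': 100})
print(res.x, res.fun)   # fitted theta and the final cost
```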
Multi-class classification :
One-vs-all (one-vs-rest): train a logistic regression classifier h_θ^(i)(x) for each class i to estimate P(y = i | x; θ); on a new input x, predict the class i that maximizes h_θ^(i)(x).
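One-vs-all can be sketched as follows (a minimal illustration with made-up 1-D data, training each binary classifier with plain gradient descent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, num_classes, alpha=0.5, iters=2000):
    """Train one logistic regression classifier per class (y == k vs rest)."""
    m, n = X.shape
    all_theta = np.zeros((num_classes, n))
    for k in range(num_classes):
        yk = (y == k).astype(float)   # relabel: class k -> 1, all others -> 0
        theta = np.zeros(n)
        for _ in range(iters):
            theta -= alpha / m * (X.T @ (sigmoid(X @ theta) - yk))
        all_theta[k] = theta
    return all_theta

def predict_one_vs_all(all_theta, X):
    """Pick the class whose classifier outputs the highest probability."""
    return np.argmax(sigmoid(X @ all_theta.T), axis=1)

# Made-up 1-D data (plus the x_0 = 1 column), three classes along x.
X = np.array([[1.0, -3.0], [1.0, -2.5], [1.0, 0.0],
              [1.0, 0.5], [1.0, 3.0], [1.0, 2.5]])
y = np.array([0, 0, 1, 1, 2, 2])
all_theta = train_one_vs_all(X, y, 3)
print(predict_one_vs_all(all_theta, X))   # ideally recovers [0 0 1 1 2 2]
```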