Week 5 Lasso Regression

1. Feature selection

1) All subsets

$\hat{w} = (w_1, w_2, \dots, w_D)$

Each feature can be either included or excluded, giving $2^D$ possibilities in total, i.e., we exhaustively enumerate every possible model (e.g., $D = 3$ already gives $2^3 = 8$ candidate models).

2) Greedy algorithms

Forward stepwise algorithm

Start with 0 features and add one feature per step, keeping the features already selected. The first step considers $D$ candidate models, the second $D-1$, and so on, so the total work is $O(D^2)$ model fits. At each step the error is computed on a validation_set (cross validation is needed when the data set is small); the algorithm stops once the validation error no longer decreases.
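A minimal sketch of this procedure, assuming plain least-squares fits and a held-out validation set (the function names here are mine, not from the course):

```python
import numpy as np

def rss(X, y, w):
    # Residual sum of squares for weights w.
    r = y - X @ w
    return r @ r

def forward_stepwise(X_train, y_train, X_valid, y_valid):
    # Greedily add the single feature that most reduces validation RSS;
    # stop when no remaining feature improves it.
    D = X_train.shape[1]
    selected, best_err = [], np.inf
    while len(selected) < D:
        candidate, candidate_err = None, best_err
        for j in range(D):
            if j in selected:
                continue
            cols = selected + [j]
            # Least-squares fit restricted to the chosen columns.
            w, *_ = np.linalg.lstsq(X_train[:, cols], y_train, rcond=None)
            err = rss(X_valid[:, cols], y_valid, w)
            if err < candidate_err:
                candidate, candidate_err = j, err
        if candidate is None:  # validation error stopped decreasing
            break
        selected.append(candidate)
        best_err = candidate_err
    return selected
```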

3) Regularization

Ridge regression (L2-regularized regression)

$\text{Total cost} = \mathrm{RSS}(\hat{w}) + \lambda \|w\|_2^2$

  The L2 penalty encourages $\hat{w}$ to be small (close to 0, but not exactly 0).
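A direct NumPy translation of this cost, as a sketch (whether $w_0$ is included in the penalty is a modeling choice; here it is, for brevity):

```python
import numpy as np

def ridge_cost(X, y, w, l2_penalty):
    # Total cost = RSS(w) + lambda * ||w||_2^2
    residuals = y - X @ w
    return residuals @ residuals + l2_penalty * np.sum(w ** 2)
```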

Lasso regression (L1-regularized regression)


$\text{Total cost} = \mathrm{RSS}(\hat{w}) + \lambda \|w\|_1$

  The L1 penalty makes $\hat{w}$ sparse (some entries of $\hat{w}$ are exactly 0).
  
  [Note] The $\|w\|_1$ term in lasso does not include $w_0$: lasso induces sparsity (it shrinks the number of effective weights by setting some of them to 0), and we do not want the intercept to be driven to 0 as well, so $w_0$ is excluded.
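The lasso cost in NumPy, as a sketch following the note above (assuming column 0 of X is the constant-1 intercept feature, so w[0] is excluded from the penalty):

```python
import numpy as np

def lasso_cost(X, y, w, l1_penalty):
    # Total cost = RSS(w) + lambda * ||w||_1, skipping the intercept w[0]
    residuals = y - X @ w
    return residuals @ residuals + l1_penalty * np.sum(np.abs(w[1:]))
```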
  -----------------------------------
  Because the L1 norm is not differentiable, lasso cannot use the gradient descent algorithm the way ridge does; a subgradient descent method must be used instead.

Coordinate descent

1) The feature matrix can be normalized first

[Note] Normalization is applied per weight $w_j$, i.e., column by column. Normalizing amounts to dividing the feature matrix by the column norms; for the resulting weights $\hat{w}$, this is equivalent to multiplying them by the norms, which is what keeps the predictions consistent:

predictions = np.dot(feature_matrix, weights)
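A sketch of the column-wise normalization (this mirrors the helper used in the course assignment, assuming one feature per column):

```python
import numpy as np

def normalize_features(feature_matrix):
    # Divide each column by its 2-norm; return the norms as well,
    # so learned weights can be rescaled back to the original features.
    norms = np.linalg.norm(feature_matrix, axis=0)
    return feature_matrix / norms, norms
```

After fitting weights on the normalized matrix, `weights / norms` gives the weights for the original matrix, so `np.dot(feature_matrix, weights / norms)` reproduces the predictions above.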

2) Lasso results

[Figure: lasso results]

3) Lasso code (cyclical coordinate descent)

Coordinate descent (one coordinate, i.e., one weight, at a time) fixes all the other weights and updates only $w_i$.

The algorithm for updating $w_i$:

[Figure: coordinate-descent update rule for $w_i$]
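The figure is not preserved here, but for normalized features the standard per-coordinate update (the formulation used in this course, with $\rho_j$ the correlation between feature $j$ and the residual computed without feature $j$'s contribution) is:

$$
\rho_j = \sum_{i} h_j(x_i)\left(y_i - \hat{y}_i(\hat{w}_{-j})\right),
\qquad
\hat{w}_j =
\begin{cases}
\rho_j + \lambda/2 & \text{if } \rho_j < -\lambda/2 \\
0 & \text{if } -\lambda/2 \le \rho_j \le \lambda/2 \\
\rho_j - \lambda/2 & \text{if } \rho_j > \lambda/2
\end{cases}
$$

The intercept $w_0$ is simply set to $\rho_0$ with no thresholding, consistent with the earlier note that $w_0$ is not penalized.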

For each iteration:

1) As you loop over the features in order and perform coordinate descent, measure how much each coordinate changes.

2) After the loop, if the maximum change across all coordinates falls below the tolerance, stop. Otherwise, go back to step 1.
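A runnable sketch of this whole loop, under the assumptions above (normalized feature columns, column 0 the intercept; the function names are mine):

```python
import numpy as np

def lasso_coordinate_descent_step(j, X, y, weights, l1_penalty):
    # Optimal w_j with every other weight held fixed.
    # Assumes the columns of X are normalized to unit 2-norm.
    prediction = X @ weights
    # rho_j: feature j's correlation with the residual, excluding w_j's own contribution
    rho = X[:, j] @ (y - prediction + weights[j] * X[:, j])
    if j == 0:                      # intercept is not penalized
        return rho
    if rho < -l1_penalty / 2:
        return rho + l1_penalty / 2
    if rho > l1_penalty / 2:
        return rho - l1_penalty / 2
    return 0.0                      # soft-thresholded to exactly zero

def lasso_cyclical_coordinate_descent(X, y, initial_weights, l1_penalty, tolerance):
    # Cycle over coordinates until the largest single-coordinate change
    # in one full pass drops below the tolerance.
    weights = np.array(initial_weights, dtype=float)
    while True:
        max_change = 0.0
        for j in range(len(weights)):
            old = weights[j]
            weights[j] = lasso_coordinate_descent_step(j, X, y, weights, l1_penalty)
            max_change = max(max_change, abs(weights[j] - old))
        if max_change < tolerance:
            return weights
```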


4) Differences between lasso and ridge (to be expanded)

Lasso is sparse and discards some weights entirely (sets them to 0, which is why it performs feature selection), whereas ridge pushes weights toward 0 but never makes them exactly 0.

Why can lasso set weights exactly to 0 while ridge cannot?

This can be understood geometrically:
[Figure: lasso geometry, RSS contours meeting the L1 diamond constraint]
The figure above is a two-feature example with $\hat{w} = (w_0, w_1)$; at the solution $w_0 = 0$, because the RSS contours typically hit the L1 constraint region at one of its corners.
[Figure: ridge geometry, RSS contours tangent to the L2 circle]
Ridge's solution is the point where the RSS ellipse is tangent to the circle, and a tangency point clearly cannot make $w_0$ or $w_1$ exactly 0.
Taking ridge as an example, the process of obtaining the solution for a particular $\lambda$ is as follows:
[Figures: derivation of the ridge solution for a fixed $\lambda$]

Among all the points where some RSS ellipse is tangent to the circle, the one that minimizes the total cost is the solution. Different values of $\lambda$ yield different solutions $\hat{w}$, which are afterwards evaluated on test_data.
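For completeness, the per-$\lambda$ ridge solution sketched above also has a standard closed form, obtained by setting the gradient of the total cost to zero (with $H$ the feature matrix):

$$
\nabla\left[\mathrm{RSS}(w) + \lambda \|w\|_2^2\right]
= -2H^\top(y - Hw) + 2\lambda w = 0
\;\Longrightarrow\;
\hat{w} = (H^\top H + \lambda I)^{-1} H^\top y
$$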