An Explanation of Normalization and Standardization

Normalization (归一化)

Scaling the values of a numeric feature column in the training set (say, the i-th column) to the range between 0 and 1.

Standardization (标准化)

Scaling the values of a numeric feature column in the training set (say, the i-th column) so that it has mean 0 and variance 1.


Both are feature scaling methods. The scaling can be done in several ways:
Scale to mean 0 and variance 1 (Standardization: StandardScaler())
Scale to the range between 0 and 1 (Rescaling / min-max normalization: MinMaxScaler())
Scale to the range between -1 and 1 by dividing by the maximum absolute value (MaxAbsScaler())
Scale each sample (row) to unit norm (Normalization: Normalizer())
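These scalers correspond to classes in scikit-learn's preprocessing module. A minimal sketch, assuming scikit-learn and NumPy are installed:

```python
import numpy as np
from sklearn.preprocessing import (
    StandardScaler, MinMaxScaler, MaxAbsScaler, Normalizer)

# A toy feature matrix: 4 samples, 2 features.
X = np.array([[1.0, -10.0],
              [2.0,   0.0],
              [3.0,  10.0],
              [4.0,  20.0]])

# Column-wise: mean 0, variance 1.
X_std = StandardScaler().fit_transform(X)
# Column-wise: rescaled to [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)
# Column-wise: divided by the max absolute value, so results lie in [-1, 1].
X_maxabs = MaxAbsScaler().fit_transform(X)
# Row-wise: each sample scaled to unit L2 norm.
X_norm = Normalizer().fit_transform(X)

print(X_std.mean(axis=0))                 # ~[0, 0]
print(X_minmax.min(axis=0), X_minmax.max(axis=0))
print(np.linalg.norm(X_norm, axis=1))     # all 1
```

Note that the first three scalers operate per column (per feature), while Normalizer operates per row (per sample).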


This article explains standardization and regularization well.
"Standardization" and "normalization" mainly refer to the four feature scaling methods above:

Rescaling (min-max normalization) is what is usually called 归一化 (normalization)
Z-score normalization is what is usually called 标准化 (standardization)
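The two formulas can be written out directly. A small NumPy sketch of min-max normalization, x' = (x - min) / (max - min), and z-score normalization, x' = (x - mean) / std:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Rescaling (min-max normalization): result lies in [0, 1].
x_minmax = (x - x.min()) / (x.max() - x.min())

# Z-score normalization: result has mean 0 and standard deviation 1.
x_zscore = (x - x.mean()) / x.std()

print(x_minmax)   # [0.   0.25 0.5  0.75 1.  ]
print(x_zscore.mean(), x_zscore.std())
```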


Normalization

Normalization usually rescales features to [0, 1].
It is useful when we are confident that there are no anomalies (i.e. outliers) with extremely large or small values. For example, in a recommender system, the ratings made by users are limited to a small finite set like {1, 2, 3, 4, 5}.
In some situations, we may prefer to map data to a range like [−1, 1] with zero mean; then we should choose mean normalization.
In this way, it will be more convenient for us to use other techniques like matrix factorization.
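Mean normalization is conventionally defined as x' = (x - mean) / (max - min), which yields a zero-mean result bounded within [-1, 1] (this formula is the standard definition, not given explicitly in the text above). A small sketch on rating data:

```python
import numpy as np

ratings = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Mean normalization: zero-mean and bounded within [-1, 1].
centered = (ratings - ratings.mean()) / (ratings.max() - ratings.min())
print(centered)  # [-0.5  -0.25  0.    0.25  0.5 ]
```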

Regularization

Different from the feature scaling techniques mentioned above, regularization is intended to solve the overfitting problem. By adding a penalty term to the loss function, it pushes the learned parameters toward smaller values, which can significantly reduce overfitting.

There are mainly two basic types of regularization: L1-norm (lasso) and L2-norm (ridge regression).

L1-norm

The original loss function is denoted by f(x), and the new one is F(x):

F(x) = f(x) + λ‖x‖₁

where

‖x‖₁ = Σᵢ |xᵢ|
L1 regularization is better when we want to train a sparse model: the absolute-value penalty is not differentiable at 0, and its corner there tends to drive many coefficients exactly to zero.
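This sparsity effect is easy to observe with scikit-learn's Lasso (L1) versus Ridge (L2) on data where most features are irrelevant; the data and hyperparameters below are illustrative choices:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
# 100 samples, 10 features, but only the first 2 features matter.
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

# Lasso drives irrelevant coefficients exactly to zero...
print(np.sum(lasso.coef_ == 0))  # most or all of the 8 irrelevant ones
# ...while ridge only shrinks them toward zero.
print(np.sum(ridge.coef_ == 0))  # typically 0
```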

L2-norm

Similarly, the L2-regularized loss is F(x) = f(x) + λ‖x‖₂², where ‖x‖₂ = (Σᵢ xᵢ²)^(1/2).
L2 regularization is preferred in ill-posed problems (e.g. highly correlated features) for smoothing: it shrinks all coefficients and stabilizes the solution.
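A sketch of the stabilizing effect on an ill-posed design, using two nearly collinear features (the synthetic data below is an assumed example, not from the text):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
# Two nearly identical (collinear) features: an ill-posed design matrix.
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 1e-6 * rng.normal(size=50)])
y = x1 + 0.01 * rng.normal(size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Unregularized OLS coefficients explode in opposite directions to fit
# noise; ridge keeps them small and balanced (each roughly 0.5).
print(ols.coef_)
print(ridge.coef_)
```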

Here is a comparison between L1 and L2 regularizations: L1 produces sparse solutions and can act as built-in feature selection, but it has no closed-form solution; L2 has a simple closed-form solution for linear models and shrinks all coefficients smoothly, but it does not drive any of them exactly to zero.