[CV Fundamentals] Softmax and CrossEntropy

Softmax

The Softmax function takes an N-dimensional vector (or an M x N array, where M is the number of samples and N is the number of classes) as input and converts each component into a real number in (0, 1). The formula is:
p_{i}=\frac{e^{a_{i}}}{\sum_{k=1}^{N} e^{a_{k}}}
To keep the computation numerically stable and avoid NaNs caused by overflow, the maximum of the input vector is usually subtracted before exponentiation; since the common factor e^{-\max(a)} cancels in the ratio, the result is unchanged. The numerically stable Softmax is:
p_{i}=\frac{e^{a_{i}-\max(a)}}{\sum_{k=1}^{N} e^{a_{k}-\max(a)}}
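A minimal sketch of the difference (the logits here are purely illustrative): exponentiating large values directly overflows to inf and the ratio becomes NaN, while the shifted version stays finite.

import numpy as np

a = np.array([1000.0, 1001.0, 1002.0])   # illustrative large logits

naive = np.exp(a) / np.exp(a).sum()                        # exp(1000) overflows -> inf/inf = nan
stable = np.exp(a - a.max()) / np.exp(a - a.max()).sum()   # shift by max(a) first

print(naive)    # [nan nan nan], with overflow warnings
print(stable)   # [0.09003057 0.24472847 0.66524096]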
The derivative of the Softmax function is:
\frac{\partial p_{i}}{\partial a_{j}}=\left\{\begin{array}{ll}{p_{i}\left(1-p_{j}\right)} & {\text { if } i=j} \\ {-p_{j} \cdot p_{i}} & {\text { if } i \neq j}\end{array}\right.
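Written as a matrix, this derivative is the Softmax Jacobian J_{ij} = p_{i}(\delta_{ij} - p_{j}) = \mathrm{diag}(p) - p p^{T}. A minimal sketch that checks the formula against central finite differences (the vector and the softmax_vec helper are illustrative only, not part of the implementation further down):

import numpy as np

def softmax_vec(a):
    # Stable softmax for a single 1-D vector.
    e = np.exp(a - np.max(a))
    return e / e.sum()

a = np.array([0.5, -1.2, 2.0, 0.3])   # illustrative logits
p = softmax_vec(a)

# Analytic Jacobian: J[i, j] = p_i * (delta_ij - p_j) = diag(p) - p p^T
J_analytic = np.diag(p) - np.outer(p, p)

# Numerical Jacobian, one input dimension at a time.
eps = 1e-6
J_numeric = np.zeros((a.size, a.size))
for j in range(a.size):
    d = np.zeros_like(a)
    d[j] = eps
    J_numeric[:, j] = (softmax_vec(a + d) - softmax_vec(a - d)) / (2 * eps)

print(np.allclose(J_analytic, J_numeric, atol=1e-6))   # True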

CrossEntropy

CrossEntropy is commonly used as the loss function for Softmax classification, i.e., the familiar cross-entropy loss. It measures how closely the probability distribution output by the model matches the true distribution of the samples. It is defined as follows, where y_{i} is the one-hot encoded label (so only the term of the true class survives in the sum):
L=H(y, p)=-\sum_{i} y_{i} \log \left(p_{i}\right)
The derivative of the cross-entropy loss with respect to the Softmax input a_{i} (the logit) is:
\frac{\partial L}{\partial a_{i}}=p_{i}-y_{i}
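This compact form follows from the chain rule: combining \partial L/\partial p_{k} = -y_{k}/p_{k} with the Softmax Jacobian above makes every cross term collapse, leaving p_{i}-y_{i}. A minimal single-sample sketch that verifies it numerically (the logits and class index are illustrative only):

import numpy as np

def ce_loss(a, c):
    # Cross-entropy of softmax(a) against true class index c (single sample).
    p = np.exp(a - a.max()) / np.exp(a - a.max()).sum()
    return -np.log(p[c])

a = np.array([0.2, -0.7, 1.5, 0.1])   # illustrative logits
c = 2                                  # illustrative ground-truth class
p = np.exp(a - a.max()) / np.exp(a - a.max()).sum()
y = np.eye(a.size)[c]                  # one-hot label

analytic = p - y                       # the formula above

# Numerical gradient by central differences.
eps = 1e-6
numeric = np.zeros_like(a)
for i in range(a.size):
    d = np.zeros_like(a)
    d[i] = eps
    numeric[i] = (ce_loss(a + d, c) - ce_loss(a - d, c)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))   # True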


A Python (NumPy) implementation is shown below:

# -*- coding: UTF-8 -*-
import numpy as np 

def softmax(X):
    """Compute the softmax of the output of a classification layer.

    Parameters
    ----------
    X : array_like
        An M x N array, where M is the number of samples and N is the
        number of categories.

    Returns
    -------
    rst : ndarray
        The softmax of each row of X; every row sums to 1.
    """
    X = np.asarray(X, dtype=np.float64)
    # Subtract the per-row maximum before exponentiating for numerical stability.
    exps = np.exp(X - np.max(X, axis=1, keepdims=True))
    rst = exps / np.sum(exps, axis=1, keepdims=True)
    return rst

def cross_entropy(X, y):
    """Compute the cross-entropy loss and its gradient w.r.t. the logits.

    Parameters
    ----------
    X : array_like
        An M x N array of logits, where M is the number of samples and N is
        the number of categories.
    y : array_like
        An array of M integer ground-truth class labels.

    Returns
    -------
    loss : float
        The mean cross-entropy loss over the batch.
    grad : ndarray
        The gradient of the loss w.r.t. each logit, i.e. (p - y_onehot) / M.
    """
    y = np.asarray(y)
    m = len(y)
    p = softmax(X)
    # Pick out the predicted probability of the true class for every sample.
    log_likelihood = -np.log(p[np.arange(m), y])
    loss = np.sum(log_likelihood) / m

    # Gradient w.r.t. the logits: p - y_onehot, averaged over the batch.
    grad = p.copy()              # copy so the softmax output is not mutated in place
    grad[np.arange(m), y] -= 1
    grad = grad / m
    return loss, grad

def main():
    X = [[0.1, 1.5, -0.3, 2.2, 0.7],
         [1.0, -2.3, 5.2, -0.1, 2.9],
         [-3.5, -1.1, 3.7, 0.2, 2.6]]
    y = [3, 4, 2]
    rst = softmax(X)
    print('softmax rst:\n', rst)
    print('softmax check:\n', rst.sum(axis=1).reshape(-1, 1))
    loss, grad = cross_entropy(X, y)
    print('loss:', loss)
    print('grad:', grad)

if __name__ == "__main__":
    main()
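As a cross-check (assuming PyTorch is available; this is not part of the NumPy implementation above), torch.nn.functional.cross_entropy takes raw logits and integer class labels, applies log-softmax internally, and averages over the batch, so its loss and the gradient it produces on the logits should match the values printed by main():

import torch
import torch.nn.functional as F

X = torch.tensor([[0.1, 1.5, -0.3, 2.2, 0.7],
                  [1.0, -2.3, 5.2, -0.1, 2.9],
                  [-3.5, -1.1, 3.7, 0.2, 2.6]], requires_grad=True)
y = torch.tensor([3, 4, 2])

loss = F.cross_entropy(X, y)        # log-softmax + NLL, averaged over the batch
loss.backward()
print('torch loss:', loss.item())   # should match the NumPy loss
print('torch grad:\n', X.grad)      # should match the NumPy grad, (p - onehot(y)) / M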
