Paper Reading: Bag of Tricks for Image Classification with Convolutional Neural Networks

Bag of Tricks for Image Classification with Convolutional Neural Networks



The paper summarizes four techniques that help when training with a larger batch size:
(1) Linear scaling learning rate
Choose 0.1 as the initial learning rate for batch size 256; when changing to a larger batch size b, increase the initial learning rate to 0.1 × b/256.
(2) Learning rate warmup

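(3) Zero γ
(4) No bias decay: weight decay is often applied to all trainable parameters (both weights and biases), which is equivalent to applying L2 regularization to every parameter and driving their values toward 0. The paper instead recommends applying weight decay only to the weights and leaving the biases unregularized.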
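
As a rough illustration of items (1) and (2), the sketch below computes the scaled initial learning rate and a linear ramp toward it. The function names, the linear shape of the ramp, and counting warmup in batches are assumptions made for this example; these notes do not spell them out.

```python
def scaled_lr(batch_size, base_lr=0.1, base_batch_size=256):
    """Linear scaling rule: 0.1 * b / 256 for batch size b."""
    return base_lr * batch_size / base_batch_size


def warmup_lr(step, warmup_steps, target_lr):
    """Ramp the learning rate linearly from near 0 up to target_lr.

    `step` counts batches starting at 1. Once `step` reaches
    `warmup_steps`, the target learning rate is returned unchanged.
    (The linear ramp and per-batch counting are assumptions of this sketch.)
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return target_lr
    return target_lr * step / warmup_steps


if __name__ == "__main__":
    lr = scaled_lr(1024)  # 0.1 * 1024 / 256
    for step in (1, 50, 100, 200):
        print(step, warmup_lr(step, warmup_steps=100, target_lr=lr))
```

Starting from a small learning rate avoids large, unstable updates while the randomly initialized parameters are still far from a good solution.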

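Neural networks today are mostly trained in 32-bit floating point (FP32) precision: all numbers are stored in FP32 format, and the inputs and outputs of arithmetic operations are FP32 as well. The authors' experiments show that training in FP16 is considerably faster. BS denotes batch size.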

(1) Cosine Learning Rate Decay
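
Only the name of the schedule is given here. A common form of cosine decay, ignoring any warmup phase, sets the learning rate at step t out of T total steps to η_t = ½ (1 + cos(tπ/T)) η, where η is the initial learning rate. The sketch below implements that form; the function name and the clamping of out-of-range steps are assumptions of this example.

```python
import math


def cosine_lr(step, total_steps, initial_lr):
    """Cosine decay: initial_lr at step 0, falling smoothly to 0 at total_steps."""
    # Clamp so that steps outside [0, total_steps] do not push the rate back up.
    step = min(max(step, 0), total_steps)
    return 0.5 * (1.0 + math.cos(math.pi * step / total_steps)) * initial_lr


if __name__ == "__main__":
    for step in (0, 25, 50, 75, 100):
        print(step, cosine_lr(step, total_steps=100, initial_lr=0.4))
```

The rate falls slowly at the start, roughly linearly in the middle, and slowly again near the end.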
Mixup Training:
In mixup, two examples $(x_i, y_i)$ and $(x_j, y_j)$ are randomly sampled each time, and a new example is formed by a weighted linear interpolation of the two.
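
The interpolation can be sketched in plain Python as follows. Drawing the weight λ from a Beta(α, α) distribution, the default α = 0.2, and the list-based representation of inputs and one-hot labels are assumptions of this example, not details stated in these notes.

```python
import random


def mixup(x_i, y_i, x_j, y_j, alpha=0.2, rng=random):
    """Blend two examples with a weight lam drawn from Beta(alpha, alpha).

    x_* are flat lists of floats (for example pixel values);
    y_* are one-hot label lists of equal length.
    Returns the mixed input, the mixed label, and the weight used.
    """
    lam = rng.betavariate(alpha, alpha)
    x_hat = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_hat = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_hat, y_hat, lam


if __name__ == "__main__":
    x_hat, y_hat, lam = mixup([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
    print(lam, x_hat, y_hat)
```

Because the labels are mixed with the same weight as the inputs, the resulting target is a soft label whose entries still sum to 1.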