VGG-Net Study Notes

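ConvNet architecture
1. The input image is 224×224; the mean value computed on the training set is subtracted from each pixel.
2. Small receptive fields: 3×3 filters with stride 1. A stack of small filters covers the same effective receptive field as one large filter, but the stack introduces more non-linearities and has fewer parameters than the single large filter, which acts as a form of regularisation.
3. Five pooling layers, each with a 2×2 window and stride 2.
4. Three fully connected layers, configured the same way in every network variant. The final layer is a softmax layer.
5. Every hidden layer is followed by a non-linearity (ReLU).
6. The number of filters doubles after each pooling layer, starting at 64 and ending at 512.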
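The arithmetic behind points 2, 3 and 6 can be checked with a short sketch. The function names and the channel width C = 256 are illustrative assumptions, and biases are ignored; the sketch only counts weights and traces sizes, it does not build a network.

```python
def stacked_receptive_field(kernel_size, num_layers, stride=1):
    """Effective receptive field of num_layers stacked convs (same kernel, same stride)."""
    rf, jump = 1, 1
    for _ in range(num_layers):
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


def conv_weights(kernel_size, channels, num_layers=1):
    """Weight count (biases ignored) for num_layers convs with `channels` inputs and outputs."""
    return num_layers * kernel_size * kernel_size * channels * channels


C = 256  # illustrative channel width (assumption)

# Two stacked 3x3 convs see a 5x5 region, three see a 7x7 region.
rf_two = stacked_receptive_field(3, 2)
rf_three = stacked_receptive_field(3, 3)

# Three 3x3 layers use 27*C^2 weights, one 7x7 layer uses 49*C^2.
stacked = conv_weights(3, C, num_layers=3)
single = conv_weights(7, C)
print(rf_two, rf_three, stacked, single)

# Spatial size and filter count at each of the five conv stages:
# padding keeps the size inside a stage, each 2x2/stride-2 pooling halves it,
# and the width doubles until it reaches 512.
size, width = 224, 64
stages = []
for _ in range(5):
    stages.append((size, width))
    size //= 2
    width = min(width * 2, 512)
print(stages, "final spatial size:", size)
```

The stack of three 3×3 layers therefore needs 27/49 of the weights of a single 7×7 layer while inserting two extra ReLUs.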

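Classification
1. Training
SGD with momentum. The learning rate is adjusted dynamically: when classification accuracy on the validation set stops improving, the learning rate is reduced by a factor of 10. Batch size = 256.
Convolution padding is chosen so that the spatial size is preserved.
Multinomial logistic regression objective + L2 weight decay.
The first two fully connected layers use dropout with ratio 0.5.
Initialisation affects learning, so parameters from a trained shallow network are used to initialise part of the deeper network; the remaining layers are initialised from a normal distribution. This speeds up convergence.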
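The optimisation rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' training code: the class name, the `patience` parameter, the initial learning rate 0.01, momentum 0.9 and weight decay 5e-4 are assumptions that these notes do not state.

```python
def sgd_momentum_step(w, v, grad, lr, momentum=0.9, weight_decay=5e-4):
    """One SGD step with momentum and L2 weight decay on a scalar weight.

    The default momentum and weight_decay values are assumptions.
    """
    v_new = momentum * v - lr * (grad + weight_decay * w)
    return w + v_new, v_new


class PlateauStepDown:
    """Divide the learning rate by `factor` when validation accuracy stops improving."""

    def __init__(self, lr, factor=10.0, patience=1):
        self.lr = lr
        self.factor = factor
        self.patience = patience  # hypothetical knob: stale evaluations tolerated
        self.best = float("-inf")
        self.stale = 0

    def step(self, val_acc):
        if val_acc > self.best:
            self.best = val_acc
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr /= self.factor
                self.stale = 0
        return self.lr


sched = PlateauStepDown(lr=0.01)  # initial rate is an assumption
history = [sched.step(acc) for acc in [0.50, 0.60, 0.60, 0.65, 0.64]]
print(history)
```

In this made-up accuracy sequence the rate drops once at the third evaluation (no improvement over 0.60) and again at the fifth.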
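To reduce overfitting, the training set is augmented with horizontal flips and random colour shifts.
Step 1: rescale the training image.
Rescaled image size:
single-scale: 256 or 384
multi-scale: a size is picked at random from [256, 512]. This is regarded as scale jittering and lets the network recognise objects at different scales.
Step 2: crop from the rescaled training image.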
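The two steps can be sketched as a function that only computes geometry (rescaled size, crop offset, flip flag) and leaves the actual image resizing to whatever library is in use. It assumes the sampled size is the shortest side of the rescaled image and that the crop is 224×224; the function name and return layout are hypothetical.

```python
import random

CROP = 224  # network input size


def sample_training_crop(height, width, s_min=256, s_max=512, rng=random):
    """Return (new_h, new_w, top, left, flip) for one scale-jittered crop.

    Set s_min == s_max for single-scale training (e.g. 256 or 384).
    """
    s = rng.randint(s_min, s_max)           # step 1: sample the shortest side
    scale = s / min(height, width)          # isotropic rescale factor
    new_h = max(CROP, round(height * scale))
    new_w = max(CROP, round(width * scale))
    top = rng.randint(0, new_h - CROP)      # step 2: random crop position
    left = rng.randint(0, new_w - CROP)
    flip = rng.random() < 0.5               # random horizontal flip
    return new_h, new_w, top, left, flip


rng = random.Random(0)
print(sample_training_crop(375, 500, rng=rng))            # multi-scale
print(sample_training_crop(375, 500, 256, 256, rng=rng))  # single-scale, S = 256
```

Because the crop size is fixed while the rescaled size varies, the same object occupies a different fraction of the crop from one sample to the next, which is what makes this scale jittering.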

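2. Testing
The test set is augmented with horizontal flips.
Set the test image scale.
Unlike training, to avoid the computational cost of multi-crop evaluation, dense evaluation is used here:
The network is applied to the uncropped image, with the fully connected layers converted to convolutional layers. (Open question: how do the trained parameters become filter parameters?)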
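One way to see how fully connected weights become filter weights: a fully connected layer over a flattened C×H×W feature map is the same dot product as a single H×W convolution filter placed once on that map, so each weight row can be reshaped into one filter. The sketch below uses tiny made-up sizes and pure Python loops; the function names are illustrative, and in the real network the first fully connected layer would become a 7×7 convolution and the other two 1×1 convolutions.

```python
def fc_forward(x, weights):
    """x: feature map [C][H][W]; weights: [out][C*H*W] over the flattened map."""
    flat = [v for channel in x for row in channel for v in row]
    return [sum(w * v for w, v in zip(w_row, flat)) for w_row in weights]


def fc_to_conv(weights, channels, kh, kw):
    """Reshape each FC weight row into a [C][kh][kw] filter (same flattening order)."""
    return [[[[w_row[(c * kh + i) * kw + j] for j in range(kw)]
              for i in range(kh)] for c in range(channels)]
            for w_row in weights]


def conv_valid(x, filters):
    """Stride-1 convolution without padding; returns [out][H-kh+1][W-kw+1]."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(filters[0][0]), len(filters[0][0][0])
    out = []
    for filt in filters:
        fmap = []
        for top in range(height - kh + 1):
            fmap.append([sum(filt[c][i][j] * x[c][top + i][left + j]
                             for c in range(channels)
                             for i in range(kh) for j in range(kw))
                         for left in range(width - kw + 1)])
        out.append(fmap)
    return out


def spatial_average(score_map):
    """Average each class score map to get one score per class."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in score_map]


# Toy example: 2 channels, 2x2 map, 3 output units (all values made up).
x_small = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
fc_w = [[1, 0, 0, 0, 0, 0, 0, 1], [1] * 8, [0, 1, 0, -1, 2, 0, 0, 0]]
filters = fc_to_conv(fc_w, channels=2, kh=2, kw=2)

fc_out = fc_forward(x_small, fc_w)
conv_out = conv_valid(x_small, filters)   # 1x1 map per output unit

# On a larger (uncropped) input the same filters give a spatial score map.
x_large = [[[r * 4 + c + ch for c in range(4)] for r in range(3)] for ch in range(2)]
dense_map = conv_valid(x_large, filters)  # 2x3 map per output unit
dense_scores = spatial_average(dense_map)
print(fc_out, conv_out, dense_scores)
```

On an input of the training size the converted layer reproduces the fully connected output exactly; on a larger input it yields one score per spatial position, and averaging the map gives a fixed-size score vector.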

I do not understand the difference in padding between dense evaluation and multi-crop evaluation:
multi-crop evaluation is complementary to dense evaluation due to different convolution boundary conditions: when applying a ConvNet to a crop, the convolved feature maps are padded with zeros, while in the case of dense evaluation the padding for the same crop naturally comes from the neighbouring parts of an image (due to both the convolutions and spatial pooling), which substantially increases the overall network receptive field, so more context is captured.
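The boundary-condition difference in the quote can be seen in a one-dimensional toy case. This is only an illustration with made-up numbers and a single 3-tap filter (odd kernel length assumed), not the evaluation procedure itself: filtering a crop pads its edges with zeros, while filtering the whole signal and then reading off the same region uses the real neighbouring values.

```python
def conv1d_same_zero_pad(signal, kernel):
    """Stride-1 correlation, zero-padded so the output length equals the input length."""
    pad = len(kernel) // 2
    padded = [0] * pad + list(signal) + [0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(signal))]


image = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 1, 1]
start, stop = 2, 6  # the crop covers image[2:6]

# Multi-crop style: cut the crop first, then filter it (zeros at the crop edges).
crop_out = conv1d_same_zero_pad(image[start:stop], kernel)

# Dense style: filter the whole image, then look at the same region
# (the crop edges see the neighbouring pixels 2 and 7 instead of zeros).
dense_out = conv1d_same_zero_pad(image, kernel)[start:stop]

print(crop_out)
print(dense_out)
```

The interior values agree and only the border values differ. With many stacked layers the affected border region grows, which is why the two evaluation schemes give different, complementary predictions.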

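The experiments show that scale jittering on both the training set and the test set improves accuracy. Even for a single network, scale jittering on the training set lets the network recognise objects at multiple scales; this is called a multi-scale model. (This can also be seen as training set augmentation by scale jittering, where a single model is trained to recognise objects over a wide range of scales)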
