ScratchDet: Exploring to train single-shot object detectors from scratch

Abstract: Why rely on models pretrained on large-scale datasets? The paper lists three problems with doing so:

the domain gap between source and target datasets
the learning objective bias between classification and detection
the architecture limitations of the classification network for detection

The paper proposes ScratchDet.

We study the impact of BatchNorm on training detectors from scratch, and find that using BatchNorm on the backbone and detection head subnetworks makes the detector converge well from scratch.

1.Introduction

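Using pretrained models imposes serious limitations on object detection. First, fine-tuning can be regarded as a transfer learning problem, and it is difficult to fully close the domain gap between the source dataset and the target dataset. Second, classification and detection differ in their sensitivity to translation. Classification favors translation invariance and therefore benefits from downsampling operations (e.g., max pooling and stride-2 convolution). Detection, in contrast, depends more on local texture information, so translation-invariant operations such as downsampling must be used with care. Third, a pretrained backbone makes it inconvenient to change the model architecture.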

ResNet/VGGNet + SSD

BN reparameterizes the optimization problem to make its landscape significantly smoother, rather than reducing the internal covariate shift. BN helps the detector converge well without adapting a pretrained network. The paper analyzes SSD with ResNet and VGGNet backbones and finds that the sampling stride of the first convolution layer has a considerable impact on performance. It therefore introduces the root block.

2.Related work

Object detectors with pretrained network.

Train-from-scratch object detectors

BN (with BN, a larger learning rate can be used, which speeds up training)

3.ScratchDet

BatchNorm for SSD Trained from Scratch

DSOD uses DenseNet and did not identify the important role of BN.

BN on the backbone subnetwork

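We add BN to each conv layer in the backbone subnetwork and then train it from scratch. A relatively large learning rate (0.01 or 0.05) can then be used, which improves performance further, from 72.5% to 77.8% and 78%, compared with 77.2% for the pretrained model. This indicates that adding BN to the backbone subnetwork is an important step for improving SSD trained from scratch.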
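To make the operation concrete, here is a minimal sketch of what BN computes for one channel during training. It is a simplification rather than the paper's implementation: the function name `batch_norm`, the sample activations, and the default `gamma`, `beta`, and `eps` values are assumptions chosen for illustration, and a real conv layer would pool statistics per channel over the batch and spatial positions.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch of scalar activations, then scale and shift.

    gamma and beta stand in for the learnable scale and shift;
    eps guards against division by zero.
    """
    n = len(values)
    mean = sum(values) / n
    # Biased (divide-by-n) variance, as used for training-time statistics.
    var = sum((v - mean) ** 2 for v in values) / n
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

# Hypothetical activations with a large offset and spread.
activations = [120.0, 80.0, 100.0, 60.0, 140.0]
normalized = batch_norm(activations)
print(normalized)
```

After normalization the activations have roughly zero mean and unit variance regardless of their original scale, which is one intuition for why larger learning rates become usable.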

BN on the detection head subnetwork

These results are very useful to explain the phenomenon that using large learning rate to train SSD with the original architecture from scratch or pretrained networks usually leads to gradient explosion, poor stability and weak prediction of gradients.

BN in the whole network

With BN in both parts and a relatively large learning rate, the from-scratch model achieves higher accuracy than the pretrained SSD.

A large learning rate paired with BN.

Backbone Network redesign

Performance analysis of ResNet and VGGNet

In SSD, ResNet-101 performs better than VGGNet-16; in DSSD, VGGNet-16 performs better than ResNet-101.

We argue that this phenomenon is attributed to the downsampling operation in the first convolution layer (i.e., conv1_x with stride 2) of ResNet-101, which cuts off half of the raw image information. This operation significantly affects the detection accuracy, especially for small objects.

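This is especially harmful for small-object detection: halving the resolution at the very start loses too much information. Enlarging the input image (e.g., to 512×512) offsets this drawback, which is why ResNet performs somewhat better on SSD.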

In summary, the downsampling operation in the first convolution layer has a bad impact on the detection accuracy, especially for small objects.

Backbone network redesign for object detection

Root-ResNet

We remove the downsampling operation (i.e., change the stride from 2 to 1) in the first conv layer and replace the 7 × 7 convolution kernel by a stack of several 3 × 3 convolution filters (denoted as the root block). With these improvements, Root-ResNet is able to exploit more local information from the image, so as to extract powerful features for small object detection.
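The effect on spatial resolution can be checked with the standard convolution output-size formula. The sketch below is illustrative only: the helper names are hypothetical, the padding values (3 for the 7 × 7 kernel, 1 for each 3 × 3 kernel) are assumptions, and the stack depth of three is assumed because the notes only say "several".

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def original_stem(size):
    # Original first layer: a single 7x7 convolution with stride 2.
    return conv_out_size(size, kernel=7, stride=2, padding=3)

def root_block(size, num_convs=3):
    # Root block: stacked 3x3 convolutions with stride 1 (depth assumed).
    for _ in range(num_convs):
        size = conv_out_size(size, kernel=3, stride=1, padding=1)
    return size

def stacked_receptive_field(num_convs, kernel=3):
    # Receptive field of stride-1 convolutions stacked back to back.
    return 1 + num_convs * (kernel - 1)

stem_300 = original_stem(300)
root_300 = root_block(300)
root_rf = stacked_receptive_field(3)
print(stem_300, root_300, root_rf)
```

Under these assumptions, a 300-pixel input keeps its full resolution through the root block, while the original stem halves it. Three stacked 3 × 3 kernels still cover the same 7 × 7 receptive field.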

Furthermore, we replace the four convolution blocks (added by SSD to extract feature maps at different scales) with four residual blocks appended to the end of Root-ResNet. Each residual block is formed by two branches. One branch is a 1 × 1 convolution layer with stride 2, and the other consists of a 3 × 3 convolution layer with stride 2 followed by a 3 × 3 convolution layer with stride 1. The number of output channels in each convolution layer is set to 128.
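For the two branches to be summed, they must produce feature maps of the same spatial size. The following sketch checks this with the convolution output-size formula. The padding values (0 for the 1 × 1 branch, 1 for each 3 × 3 layer), the helper names, and the starting map size of 38 are assumptions for illustration, not values taken from the paper.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def residual_block_out(size):
    # Branch 1: 1x1 convolution with stride 2 (padding 0 assumed).
    shortcut = conv_out_size(size, kernel=1, stride=2, padding=0)
    # Branch 2: 3x3 stride-2 convolution, then 3x3 stride-1 convolution
    # (padding 1 assumed for both).
    main = conv_out_size(size, kernel=3, stride=2, padding=1)
    main = conv_out_size(main, kernel=3, stride=1, padding=1)
    if shortcut != main:
        raise ValueError("branch outputs differ in size and cannot be added")
    return main

def extra_layer_sizes(size, num_blocks=4):
    # Feature-map size after each of the appended residual blocks.
    sizes = []
    for _ in range(num_blocks):
        size = residual_block_out(size)
        sizes.append(size)
    return sizes

sizes = extra_layer_sizes(38)  # 38 is a hypothetical starting map size
print(sizes)
```

With these padding choices both branches always agree, and each block roughly halves the map, giving the progressively coarser scales that the detection head consumes.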

The larger the input size, the higher the mAP.

With BN, a larger learning rate yields a higher mAP. Without BN, training with a large learning rate fails to converge.
