Object Detection and Segmentation -- BlitzNet: A Real-Time Deep Network for Scene Understanding

BlitzNet: A Real-Time Deep Network for Scene Understanding
ICCV2017
Project: http://thoth.inrialpes.fr/research/blitznet/
Code: https://github.com/dvornikita/blitznet

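This paper builds on SSD to propose BlitzNet, which performs detection and segmentation jointly and runs in real time. It uses ResNet-50.
Mask R-CNN, by comparison, builds on Faster R-CNN and also performs detection and segmentation jointly; its results are slightly better, but it is slightly slower and uses ResNet-101.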

BlitzNet produces output of the following form:
[Figure: example of BlitzNet output]

[Figure: BlitzNet architecture]

3.1. Global View of the Pipeline
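The input image is passed through ResNet-50 to produce feature maps. These feature maps go through a series of downsampling steps followed by a series of upsampling steps, and the upsampled feature maps at different scales are used for detection and segmentation.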
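To make the down-then-up structure concrete, here is a small sketch that only tracks spatial sizes. It assumes each downscale step halves the resolution (rounding up) and that the upscale stream restores the sizes of the downscale stream in reverse order. The function name `stream_sizes` and the starting size of 38 are illustrative assumptions, not values taken from the paper.

```python
import math

def stream_sizes(base_size, num_down, num_up):
    """Spatial sizes along a downscale stream followed by an upscale stream.

    Assumption: every downscale step halves the resolution (rounding up),
    and every upscale step returns to the size of the matching downscale level.
    """
    down = [base_size]
    for _ in range(num_down):
        down.append(math.ceil(down[-1] / 2))
    # The upscale stream walks back through the downscale sizes in reverse,
    # starting one level above the smallest map.
    up = down[-2::-1][:num_up]
    return down, up

down, up = stream_sizes(base_size=38, num_down=3, num_up=3)
print(down)  # [38, 19, 10, 5]
print(up)    # [10, 19, 38]
```

In this picture, detection and segmentation would read from the maps listed in `up`.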

3.2. SSD and Downscale Stream
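SSD divides the feature map of the input image into a grid of cells and places anchor boxes on these cells, in a manner similar to template matching; a CNN then performs classification and box-coordinate regression. The original SSD uses VGG-16 to extract the feature map, then applies a series of pooling and convolution operations to it to obtain multi-scale feature maps. Object detection is performed on each of these feature maps separately, which yields multi-scale detection. Finally, non-maximum suppression produces the final detections.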
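The anchor-box idea can be sketched in a few lines of plain Python. This is a minimal illustration, not SSD's actual configuration: the grid size, the scale, the aspect ratios, and the 0.5 IoU matching threshold are all assumed values, and `make_anchors` and `iou` are hypothetical helper names.

```python
def make_anchors(grid_size, scale, aspect_ratios):
    """Return anchor boxes (xmin, ymin, xmax, ymax) in [0, 1] image coordinates.

    One anchor per aspect ratio is centered on every cell of a
    grid_size x grid_size feature map.
    """
    anchors = []
    for row in range(grid_size):
        for col in range(grid_size):
            cx = (col + 0.5) / grid_size
            cy = (row + 0.5) / grid_size
            for ratio in aspect_ratios:
                w = scale * ratio ** 0.5
                h = scale / ratio ** 0.5
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

# Assumed toy setup: a 4x4 grid with three anchor shapes per cell.
anchors = make_anchors(grid_size=4, scale=0.3, aspect_ratios=(1.0, 2.0, 0.5))
print(len(anchors))  # 48

# "Template matching": anchors that overlap a ground-truth box well are positives.
gt = (0.2, 0.2, 0.5, 0.5)
matched = [a for a in anchors if iou(a, gt) > 0.5]
best = max(anchors, key=lambda a: iou(a, gt))
```

During training, the network would learn a class score and a coordinate offset for each anchor; running the same procedure on several grid sizes gives the multi-scale behavior described above.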

3.3. Deconvolution Layers and ResSkip Blocks
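For parsing complex scenes, modeling visual context is important. In convolutional networks this role is played by pooling layers, which enlarge the receptive field of each neuron. For semantic segmentation, precise localization is also important; [20] uses deconvolution operations to address this, and [19] improves on it further by adding skip connections. Skip connections not only fuse low-level and high-level features but also make the network easier to train [9].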

Here we design a ResSkip block to implement the skip connections.
[Figure: ResSkip block]
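Since the figure is not available here, the following is only a rough sketch of what a skip block of this kind might compute. It assumes the block upsamples the coarser map to the size of the skip-connection map, concatenates the two along the channel axis, transforms the result, and adds it back to the upsampled input as a residual. Nearest-neighbor upsampling and a single 1x1 convolution stand in for whatever layers the paper actually uses; all function names are hypothetical.

```python
def upsample_nearest(fmap, out_h, out_w):
    """Resize a [C][H][W] feature map with nearest-neighbor sampling."""
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [[[ch[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
             for y in range(out_h)] for ch in fmap]

def conv1x1(fmap, weights):
    """1x1 convolution: weights[o][i] mixes input channel i into output channel o."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[[sum(wt * fmap[i][y][x] for i, wt in enumerate(row)) for x in range(w)]
             for y in range(h)] for row in weights]

def res_skip(top, skip, weights):
    """Assumed ResSkip-style fusion of a coarse map `top` with a finer map `skip`."""
    up = upsample_nearest(top, len(skip[0]), len(skip[0][0]))
    merged = up + skip                      # channel-wise concatenation
    transformed = conv1x1(merged, weights)  # stand-in for the block's conv layers
    # Residual connection: add the transformed features to the upsampled input.
    return [[[u + t for u, t in zip(u_row, t_row)]
             for u_row, t_row in zip(u_ch, t_ch)]
            for u_ch, t_ch in zip(up, transformed)]

top = [[[1.0]]]                      # 1 channel, 1x1 (coarse, high-level)
skip = [[[1.0, 2.0], [3.0, 4.0]]]    # 1 channel, 2x2 (fine, low-level)
out = res_skip(top, skip, weights=[[0.5, 0.5]])
print(out)  # [[[2.0, 2.5], [3.0, 3.5]]]
```

The output has the spatial size of the skip connection, so it carries the high-level content of `top` at the finer resolution of `skip`.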

3.4. Multiscale Detection and Segmentation
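Most of the weights in our network are shared. For multi-scale object detection, a single convolutional layer is applied to the multi-scale feature maps. For segmentation, we resize the multi-scale feature maps to a common size and then apply a single convolutional layer.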
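The two heads can be illustrated on a toy feature pyramid. This sketch assumes 1x1 convolutions and nearest-neighbor resizing for simplicity, and it assumes the resized maps are concatenated along the channel axis before the segmentation layer; the weights and the two-level pyramid are made-up values, and the helper names are hypothetical.

```python
def resize_nearest(fmap, out_h, out_w):
    """Resize a [C][H][W] feature map with nearest-neighbor sampling."""
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [[[ch[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
             for y in range(out_h)] for ch in fmap]

def conv1x1(fmap, weights):
    """1x1 convolution: weights[o][i] mixes input channel i into output channel o."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[[sum(wt * fmap[i][y][x] for i, wt in enumerate(row)) for x in range(w)]
             for y in range(h)] for row in weights]

# Hypothetical two-level pyramid with one channel per level.
scales = [
    [[[1.0, 2.0], [3.0, 4.0]]],  # 2x2 map
    [[[10.0]]],                  # 1x1 map
]

# Detection: the SAME layer weights are applied at every scale.
det_weights = [[1.0], [-1.0]]    # two outputs per location
detections = [conv1x1(f, det_weights) for f in scales]

# Segmentation: bring every scale to a common size, stack channels,
# then apply one layer to the stacked map.
target_h, target_w = 2, 2
stacked = []
for f in scales:
    stacked += resize_nearest(f, target_h, target_w)
seg_weights = [[1.0, 0.5]]       # one output channel from two stacked inputs
seg_logits = conv1x1(stacked, seg_weights)
print(seg_logits)  # [[[6.0, 7.0], [8.0, 9.0]]]
```

Sharing `det_weights` across scales is what keeps the parameter count low: adding another scale adds predictions but no new detection weights.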

3.5. Speeding up Non-Maximum Suppression
To improve speed, we also accelerate the non-maximum suppression step.
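These notes do not say how the speedup is achieved, so the sketch below shows only standard greedy NMS plus one common way to cut its cost: keeping just the `top_k` highest-scoring boxes before the pairwise comparisons. Treat the `top_k` pre-selection as an assumption for illustration rather than the paper's exact procedure; the boxes, scores, and thresholds are made up.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5, top_k=None):
    """Greedy NMS; returns indices of kept boxes in descending score order.

    If top_k is given, only the top_k highest-scoring boxes are considered,
    which bounds the number of pairwise IoU computations.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30),
         (21, 21, 31, 31), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7, 0.6, 0.1]

print(nms(boxes, scores))           # [0, 2, 4]
print(nms(boxes, scores, top_k=3))  # [0, 2]
```

The pre-selection trades a small risk of dropping low-scoring true positives (box 4 above) for fewer comparisons, which matters most when there are many classes and thousands of candidate boxes.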

4 Experiments

Pascal VOC2007 test set

Pascal VOC 2012 test set

Speed