Crowd Density Estimation -- CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

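For understanding highly congested scenes in complex settings, we propose CSRNet. The network has two main parts: a convolutional network front end for 2D feature extraction and a dilated CNN back end. It achieves strong results on several commonly used public crowd density estimation datasets.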

1 Introduction
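Congested-scene analysis has progressed from simple crowd counting to crowd density map estimation. A density map provides additional information, because the same number of people can be distributed across very different locations, as the figure in the original paper illustrates.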
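
The point that equal counts can hide different spatial layouts can be sketched in a few lines of Python. This is only an illustration: the helper names (`density_map`, `count`, `left_half_count`), the 16x16 grid, the head positions, and the fixed Gaussian width `sigma` are all assumptions made for the example, not details taken from the paper.

```python
import math

def density_map(points, height, width, sigma=1.5):
    """Sum one unit-mass 2D Gaussian per annotated head position (row, col)."""
    dmap = [[0.0] * width for _ in range(height)]
    for py, px in points:
        # Evaluate the Gaussian on the grid, then renormalize so that each
        # person contributes exactly 1.0 even when clipped by the border.
        weights = [
            [math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, weights))
        for y in range(height):
            for x in range(width):
                dmap[y][x] += weights[y][x] / total
    return dmap

def count(dmap):
    """The crowd count is the integral (sum) of the density map."""
    return sum(map(sum, dmap))

def left_half_count(dmap):
    """Estimated number of people in the left half of the image."""
    half = len(dmap[0]) // 2
    return sum(sum(row[:half]) for row in dmap)

# Hypothetical annotations: three people clustered vs. three people spread out.
clustered = density_map([(4, 3), (5, 4), (6, 3)], 16, 16)
spread = density_map([(3, 3), (8, 12), (13, 8)], 16, 16)

print(count(clustered), count(spread))
print(left_half_count(clustered), left_half_count(spread))
```

Both maps sum to three people, yet the share of that total falling in the left half differs, which is exactly the information a single count discards.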

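Earlier CNN-based crowd density estimators mainly used multi-scale architectures. They perform well, but two problems appear as the network gets deeper: the large amount of training time and the non-effective branch structure. We design an experiment showing that a multi-column CNN (MCNN) performs no better than a network without the multi-column structure.
For this comparison we design a deeper, regular network with a similar number of parameters.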

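The multi-column CNN (MCNN) was designed so that each column would learn features at a different receptive-field size. The paper's figure shows, however, that the three columns learn similar features, so the design does not achieve its original goal.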

Methodologically, crowd density estimation falls into three categories: detection-based methods, regression-based methods, and density estimation-based methods.

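Most CNN-based density map estimators adopt a multi-column based architecture (MCNN). We observe several problems with this structure:
1) Multi-column CNNs are hard to train.
2) Multi-column CNNs introduce redundant network structure, as shown in Table 1.
3) They require a density level classifier, which adds considerable computation.
4) These networks spend a large portion of their parameters on density level classification, leaving only a small portion for density map estimation.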

3 Proposed Solution
The fundamental idea of the proposed design is to deploy a deeper CNN for capturing high-level features with larger receptive fields and generating high-quality density maps without brutally expanding network complexity.

3.1. CSRNet architecture
For the front end we use the convolutional layers of VGG-16, and for the back end we use dilated convolutional layers.

3.1.1 Dilated convolution
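A 2D dilated convolution with dilation rate r is commonly defined as y(m, n) = Σ_i Σ_j x(m + r·i, n + r·j) · w(i, j), and a k×k kernel then covers an effective size of k + (k − 1)(r − 1). The sketch below is a minimal pure-Python version of that definition, assuming zero-based indices and no padding; the function names `effective_kernel_size` and `dilated_conv2d` are made up for this example and are not from the paper or any library.

```python
def effective_kernel_size(k, r):
    """Span covered by a k-tap kernel whose taps are spaced r apart."""
    return k + (k - 1) * (r - 1)

def dilated_conv2d(x, w, r=1):
    """Valid (unpadded) 2D dilated convolution in cross-correlation form.

    x: input as a list of rows, w: kernel as a list of rows, r: dilation rate.
    With r = 1 this reduces to an ordinary convolution layer's sliding window.
    """
    kh, kw = len(w), len(w[0])
    out_h = len(x) - effective_kernel_size(kh, r) + 1
    out_w = len(x[0]) - effective_kernel_size(kw, r) + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("input is smaller than the dilated kernel")
    return [
        [
            sum(x[m + r * i][n + r * j] * w[i][j]
                for i in range(kh) for j in range(kw))
            for n in range(out_w)
        ]
        for m in range(out_h)
    ]

# Toy 7x7 input whose entries are their own flat index, and a 3x3 box kernel.
x = [[row * 7 + col for col in range(7)] for row in range(7)]
w = [[1, 1, 1] for _ in range(3)]

plain = dilated_conv2d(x, w, r=1)    # samples a 3x3 window
dilated = dilated_conv2d(x, w, r=2)  # samples a 5x5 window with gaps

print(effective_kernel_size(3, 2), len(plain), len(dilated))
```

The dilated kernel still has nine weights, but it reaches across a 5x5 window. The receptive field grows without extra parameters and without a pooling step that would reduce resolution.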

3.1.2 Network Configuration

3.2. Training method
3.2.1 Ground truth generation

3.2.2 Data augmentation

3.2.3 Training details

4 Experiments
4.1. Evaluation metrics
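Crowd-counting work usually reports two per-image count errors: the mean absolute error, MAE = (1/N) Σ |C_i − C_i^GT|, and a quantity called MSE that is in fact the root of the mean squared error, MSE = sqrt((1/N) Σ (C_i − C_i^GT)²). Assuming these are the metrics meant here, they can be sketched as follows; the function names and the sample counts are hypothetical.

```python
import math

def mae(predicted, ground_truth):
    """Mean absolute error between predicted and true per-image counts."""
    n = len(ground_truth)
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / n

def mse(predicted, ground_truth):
    """Root of the mean squared count error (named MSE by convention)."""
    n = len(ground_truth)
    return math.sqrt(
        sum((p - g) ** 2 for p, g in zip(predicted, ground_truth)) / n
    )

# Made-up counts for three test images. A predicted count would be obtained
# by summing the estimated density map of each image.
predicted = [10.0, 20.0, 30.0]
ground_truth = [12.0, 20.0, 27.0]

print(mae(predicted, ground_truth))
print(mse(predicted, ground_truth))
```

MAE reflects the accuracy of the estimates, while the squared term makes MSE more sensitive to occasional large misses.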