Crowd density estimation -- Generating High-Quality Crowd Density Maps using Contextual Pyramid CNNs

Generating High-Quality Crowd Density Maps using Contextual Pyramid CNNs
ICCV2017

This paper reduces crowd density estimation error by incorporating global and local contextual information.
The proposed method uses CNN networks to estimate context at various levels for achieving lower count error and better quality density maps.

Figure: comparison with reference [50].

Existing methods show large estimation errors at both low and high crowd densities.
A potential solution is to use contextual information during the learning process.

2 Related work
Regression-based approaches. To overcome interference from occlusion and background clutter, these methods learn a mapping between features extracted from local image patches and their counts. They consist of two components: low-level feature extraction and regression modeling.

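Density estimation-based approaches. Regression methods handle occlusion and clutter, but they ignore important spatial information and output only a total head count. Density estimation methods instead learn a mapping between local patch features and density maps.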
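
To make the difference between a count and a density map concrete, here is a minimal sketch of a density map whose sum equals the head count. It assumes the common practice of placing one normalized Gaussian per annotated head; the kernel size, sigma, and the names `gaussian_kernel` and `density_map` are illustrative choices, not taken from the paper.

```python
import math

def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    half = size // 2
    kernel = [[math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
               for x in range(size)] for y in range(size)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]

def density_map(height, width, heads, size=5, sigma=1.0):
    """Build a density map from head annotations given as (row, col) inside the image."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in heads:
        # Keep only kernel cells that fall inside the image, then renormalize
        # so that every head still contributes exactly 1 to the total.
        cells = [(r + dy - half, c + dx - half, kernel[dy][dx])
                 for dy in range(size) for dx in range(size)
                 if 0 <= r + dy - half < height and 0 <= c + dx - half < width]
        mass = sum(v for _, _, v in cells)
        for y, x, v in cells:
            dmap[y][x] += v / mass
    return dmap

# Hypothetical annotations: three heads in a 16 x 16 image.
heads = [(2, 3), (10, 10), (0, 0)]
dmap = density_map(16, 16, heads)
count = sum(map(sum, dmap))  # summing the map recovers the head count
```

Unlike a single regressed number, the map also shows where the people are: summing any sub-region gives the approximate count in that region.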

CNN-based methods. Various CNN architectures have been applied to crowd counting and density map generation.

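Analyzing previous methods, the authors identify the following problems:
1) None of these methods explicitly incorporates contextual information, although context helps performance considerably.
2) Current regression-based density map estimation methods focus on reducing the count error rather than on the quality of the density map.
3) Current CNNs are mostly trained with a pixel-wise Euclidean loss, which produces blurry density maps.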

3 Proposed method (CP-CNN)
GCE and LCE extract the global and local context of the image, respectively.
DME is a multi-column CNN that performs the initial task of transforming the input image to high-dimensional feature maps
F-CNN fuses the outputs of GCE, LCE and DME to produce high-resolution and high-quality density maps.

3.1. Global Context Estimator (GCE)
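How is global context represented? Global context is tied to the density level of the image. Images are divided into five crowd density levels: extremely low-density (ex-lo), low-density (lo), medium-density (med), high-density (hi) and extremely high-density (ex-hi).
The appropriate number of classes depends on the range of density variation in the dataset, but using only five classes already gives a clear improvement in density map estimation.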

A CNN classifies the input image into one of these five density classes: a VGG-16 [31] based network is fine-tuned with the crowd training data.
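All convolutional layers of VGG-16 are retained, and the last three fully connected layers are replaced with a different fully connected configuration that performs the five-way classification. The weights of the last two convolutional layers are fine-tuned, while the other convolutional layers stay fixed.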
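
Training such a classifier needs a density-level label per image. A minimal sketch of how a label could be derived from the ground-truth head count follows; the equal-width binning over `[0, max_count]` and the name `density_level` are assumptions made for illustration, and the paper's exact quantization may differ.

```python
LEVELS = ["ex-lo", "lo", "med", "hi", "ex-hi"]

def density_level(count, max_count):
    """Map a head count to one of five density levels (equal-width bins, assumed)."""
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    if count < 0:
        raise ValueError("count must be non-negative")
    # Bin index 0..4; counts at or above max_count fall into the top bin.
    idx = min(int(5 * count / max_count), len(LEVELS) - 1)
    return LEVELS[idx]

# Hypothetical dataset whose densest image contains 1000 people.
labels = [density_level(c, 1000) for c in (0, 199, 200, 450, 1000)]
```

The resulting labels are the classification targets for the fine-tuned network; a dataset with a wider density range could use more bins.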

3.2. Local Context Estimator (LCE)
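Current crowd density estimation methods focus on reducing the count error, so the quality of their density maps is comparatively poor. The authors believe that some kind of local contextual information can help improve density map quality. Following the same idea as GCE, a CNN classifies local image patches into five classes by crowd density: {ex-lo, lo, med, hi, ex-hi}.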
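
The local counterpart can be sketched the same way, with one label per patch instead of one per image. The non-overlapping patch grid, the equal-width binning, and the name `patch_labels` are simplifying assumptions for illustration only, not the paper's procedure.

```python
LEVELS = ["ex-lo", "lo", "med", "hi", "ex-hi"]

def patch_labels(height, width, heads, patch, max_per_patch):
    """Label every non-overlapping patch x patch cell with a density level."""
    rows, cols = height // patch, width // patch
    counts = [[0] * cols for _ in range(rows)]
    for r, c in heads:
        pr, pc = r // patch, c // patch
        # Ignore annotations outside the area covered by full patches.
        if 0 <= pr < rows and 0 <= pc < cols:
            counts[pr][pc] += 1

    def level(n):
        # Equal-width bins over [0, max_per_patch] (assumed quantization).
        return LEVELS[min(int(5 * n / max_per_patch), len(LEVELS) - 1)]

    return [[level(n) for n in row] for row in counts]

# Hypothetical 4 x 4 image split into 2 x 2 patches, at most 4 heads per patch.
local = patch_labels(4, 4, [(0, 0), (0, 1), (1, 1), (3, 3)], patch=2, max_per_patch=4)
```

The grid of labels is a coarse local context map: it tells the later stages which regions are dense and which are sparse.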

3.3. Density Map Estimator (DME)
DME maps the input image to a set of high-dimensional feature maps. Inspired by reference [50], it adopts a multi-column architecture.
Increasing the filter sizes and the number of columns could address the large variation in crowd density, but such a design is hard to generalize across datasets and is computationally expensive.

3.4. Fusion-CNN (F-CNN)
Here the three kinds of learned features are combined.
F-CNN is constructed using a set of convolutional and fractionally-strided convolutional layers. The fractionally-strided convolutional layers help restore details in the output density maps. The following structure is used for F-CNN: CR(64,9)-CR(32,7)-TR(32)-CR(16,5)-TR(16)-C(1,1)
Here C is a convolutional layer, R is a ReLU layer, and T is a fractionally-strided convolution layer; the first number in parentheses is the number of filters and the second, where given, is the filter size.
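
The compact layer notation can be unpacked programmatically. The sketch below parses the structure string into a layer list, reading the first number as the filter count and the second (when present) as the filter size. The helper names `parse_fcnn` and `upsampling_factor` are hypothetical, and the per-layer upsampling stride of 2 is an assumption.

```python
import re

SPEC = "CR(64,9)-CR(32,7)-TR(32)-CR(16,5)-TR(16)-C(1,1)"

def parse_fcnn(spec):
    """Parse tokens such as 'CR(64,9)' or 'TR(32)' into layer descriptions."""
    layers = []
    for token in spec.split("-"):
        m = re.fullmatch(r"([CTR]+)\((\d+)(?:,(\d+))?\)", token.strip())
        if m is None:
            raise ValueError(f"unrecognized layer token: {token!r}")
        ops, filters, ksize = m.groups()
        layers.append({
            # C = convolution, T = fractionally-strided convolution
            "type": "conv" if ops[0] == "C" else "frac_strided_conv",
            "relu": "R" in ops,                      # R = ReLU after the layer
            "filters": int(filters),
            "kernel": int(ksize) if ksize else None,  # not given for T layers
        })
    return layers

def upsampling_factor(layers, stride=2):
    """Total upsampling, assuming each T layer upsamples by `stride`."""
    factor = 1
    for layer in layers:
        if layer["type"] == "frac_strided_conv":
            factor *= stride
    return factor

layers = parse_fcnn(SPEC)
factor = upsampling_factor(layers)
```

Under the stride-2 assumption the two T layers together upsample by a factor of 4, which would undo a fourfold downsampling in the earlier stages.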

Inspired by GANs, an adversarial loss is introduced: the method improves the quality of density maps by minimizing a weighted combination of pixel-wise Euclidean loss and adversarial loss.
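
A minimal sketch of such a weighted objective is given below. The weights `lambda_e` and `lambda_a`, the mean-squared form of the Euclidean term, and the generator-side term `-log D(G(x))` are assumptions chosen for illustration; the paper's exact formulation and weight values may differ.

```python
import math

def euclidean_loss(pred, target):
    """Mean squared pixel-wise difference between two equally sized 2-D maps."""
    n = sum(len(row) for row in pred)
    return sum((p - t) ** 2
               for pred_row, target_row in zip(pred, target)
               for p, t in zip(pred_row, target_row)) / n

def adversarial_loss(d_score, eps=1e-12):
    """Generator-side GAN loss -log D(G(x)); d_score is the discriminator output in (0, 1]."""
    return -math.log(max(d_score, eps))

def total_loss(pred, target, d_score, lambda_e=1.0, lambda_a=1.0):
    """Weighted combination of the pixel-wise and adversarial terms (weights assumed)."""
    return lambda_e * euclidean_loss(pred, target) + lambda_a * adversarial_loss(d_score)

# Hypothetical 2 x 2 maps: one pixel is off by 1.
pred = [[1.0, 0.0], [0.0, 0.0]]
target = [[0.0, 0.0], [0.0, 0.0]]
fooled = total_loss(pred, target, d_score=1.0)   # discriminator fully fooled
caught = total_loss(pred, target, d_score=0.5)   # discriminator undecided
```

The Euclidean term alone rewards blurry averages; the adversarial term adds a penalty whenever the discriminator can tell the predicted map from a real one, which pushes the output toward sharper maps.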

5 Experimental results
ShanghaiTech Part A

UCF CC 50 dataset

WorldExpo’10 dataset