CVPR 2019 -- Distilled Person Re-identification: Towards a More Scalable System (Reading Notes)

Paper link: https://www.researchgate.net/publication/332267080_Distilled_Person_Re-Identification_Towards_a_More_Scalable_System

1. Introduction

The paper considers making person re-identification (Re-ID) more scalable from three aspects:

  1. Reducing label cost – use fewer labels
    Existing supervised Re-ID methods require a large number of labeled identities, whereas a scalable Re-ID system should be able to learn from unlabeled data together with only a limited amount of labeled data.

  2. Reducing extension cost – reuse existing knowledge
    When extending to a new scenario, existing Re-ID methods apply transfer learning, which requires auxiliary source-domain data for pre-training or joint learning. A pre-trained model may also fail to fit different user-specified requirements.

  3. Reducing testing computation cost – use lightweight models
    Existing Re-ID methods are built on large networks such as ResNet-50.
    The paper proposes a Multi-teacher Adaptive Similarity Distillation Framework: given only a few labeled identities in the target domain, it transfers knowledge from multiple teacher models to a lightweight student model without accessing any source-domain data. It further proposes a Log-Euclidean Similarity Distillation Loss and integrates an Adaptive Knowledge Aggregator that selects effective teacher models to transfer target-adaptive knowledge.

2. Contributions

(1) A Log-Euclidean Similarity Distillation Loss for knowledge distillation in Re-ID.
(2) An Adaptive Knowledge Aggregator that aggregates effective knowledge from multiple teacher models into a lightweight student model.
(3) The two are combined into the Multi-teacher Adaptive Similarity Distillation Framework, which simultaneously reduces label cost, extension cost, and testing computation cost.

3. Similarity Knowledge Distillation

Many knowledge distillation methods align soft labels (class probabilities), but this is unsuitable for Re-ID: Re-ID is an open-set recognition problem, i.e., the identities in the training and test sets do not overlap, so class probabilities do not transfer. The knowledge distilled here is instead the pairwise similarity between samples.

3.1. Construction of Similarity Matrices


Properties of Student Similarity Matrix.

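The key property needed for Section 3.2 is that the similarity matrix is symmetric positive definite, so that its matrix logarithm is well-defined. A minimal sketch of one such construction, assuming a Gaussian (RBF) kernel on L2-normalized features plus a small ridge (the kernel choice, `sigma`, and `eps` are assumptions, not necessarily the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def similarity_matrix(features, sigma=1.0, eps=1e-5):
    """Pairwise similarity matrix for a batch of n feature vectors (n, d)."""
    f = F.normalize(features, dim=1)             # L2-normalize the features
    dist2 = torch.cdist(f, f).pow(2)             # pairwise squared Euclidean distances
    A = torch.exp(-dist2 / (2 * sigma ** 2))     # Gaussian kernel: symmetric, PSD
    # a small ridge keeps A strictly positive definite, so log(A) exists
    return A + eps * torch.eye(f.size(0), device=f.device)
```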

3.2. Log-Euclidean Similarity Distillation

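The distillation loss is the Log-Euclidean distance between the teacher and student similarity matrices, roughly L = (1/n²)·‖log A_t − log A_s‖_F². A minimal PyTorch sketch continuing the code above, computing the matrix logarithm via symmetric eigendecomposition (the exact normalization is an assumption):

```python
def matrix_log(A):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, V = torch.linalg.eigh(A)                        # eigendecomposition of A
    log_w = torch.log(w.clamp_min(1e-12))              # guard against tiny eigenvalues
    return V @ torch.diag_embed(log_w) @ V.transpose(-2, -1)

def log_euclidean_loss(A_teacher, A_student):
    """Log-Euclidean distance between teacher and student similarity matrices."""
    n = A_student.size(-1)
    diff = matrix_log(A_teacher) - matrix_log(A_student)
    return diff.pow(2).sum() / (n * n)                 # averaged squared Frobenius norm
```

Because the matrix logarithm maps SPD matrices into a flat Euclidean space, similarity structures can then be compared with an ordinary Frobenius norm.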

4. Learning to Learn from Multiple Teachers

4.1. Multi-teacher Adaptive Aggregated Distillation

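With m teachers, the aggregated objective is presumably a convex combination of per-teacher distillation losses, L = Σᵢ αᵢ·L_LE(A_tᵢ, A_s), with the weights αᵢ lying on the probability simplex. A minimal sketch reusing `log_euclidean_loss` from above:

```python
def aggregated_distillation_loss(teacher_sims, student_sim, alpha):
    """Weighted sum of per-teacher Log-Euclidean losses; alpha lies on the simplex."""
    losses = torch.stack([log_euclidean_loss(A_t, student_sim) for A_t in teacher_sims])
    return (alpha * losses).sum()
```

Parametrizing alpha as `torch.softmax(theta, dim=0)` is one plausible way to keep the weights positive and summing to 1.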

4.2. Adaptive Knowledge Aggregation

Validation Empirical Risk

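The few labeled target-domain identities serve as a validation set on which the student is scored. A minimal sketch, assuming a simple pairwise verification risk (the paper's exact risk formulation may differ): positive pairs should have high similarity, negative pairs low similarity.

```python
def validation_empirical_risk(student_sim, labels):
    """Pairwise verification risk on the labeled validation batch.

    student_sim: (n, n) student similarity matrix on the validation batch.
    labels:      (n,) identity labels; each identity appears at least twice.
    """
    same = labels.unsqueeze(0) == labels.unsqueeze(1)                  # positive-pair mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos = student_sim[same & ~eye]                                     # same identity, i != j
    neg = student_sim[~same]                                           # different identities
    # positives should be close to 1, negatives close to 0
    return (1 - pos).mean() + neg.mean()
```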
Adaptive Knowledge Aggregator
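The aggregator updates the teacher weights α so that the validation empirical risk decreases. A sketch of one plausible update, assuming a first-order (DARTS-style) hypergradient approximation with α parametrized as softmax(θ); the paper derives its own update rule, so treat everything below as an assumption:

```python
def aggregator_step(theta, student, per_teacher_losses, val_risk, eta=1e-3, lr=0.1):
    """One update of the teacher-weight logits theta (alpha = softmax(theta)).

    First-order approximation: after a student step
        w' = w - eta * sum_i alpha_i * grad L_i(w),
    the hypergradient of the validation risk is roughly
        dR_val/dalpha_i ~= -eta * <grad R_val(w), grad L_i(w)>.
    """
    params = [p for p in student.parameters() if p.requires_grad]
    g_val = torch.autograd.grad(val_risk, params, retain_graph=True)
    hyper = []
    for L_i in per_teacher_losses:
        g_i = torch.autograd.grad(L_i, params, retain_graph=True)
        hyper.append(-eta * sum((gv * gi).sum() for gv, gi in zip(g_val, g_i)))
    hyper = torch.stack(hyper)
    alpha = torch.softmax(theta, dim=0)
    # chain the hypergradient through the softmax onto theta
    grad_theta = alpha * (hyper - (alpha * hyper).sum())
    with torch.no_grad():
        theta -= lr * grad_theta
    return torch.softmax(theta, dim=0)                 # updated teacher weights
```

Intuitively, a teacher whose distillation gradient points in the same direction as the validation-risk gradient gets its weight increased, which matches the paper's goal of selecting target-adaptive teachers.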

5. Experiments

Market-1501 and DukeMTMC serve as the target-domain datasets. Five teacher models T_1–T_5 are trained on the labeled datasets MSMT17, CUHK03, VIPeR, DukeMTMC, and Market-1501.
The teachers use the PCB model. The student uses the lightweight MobileNetV2 with an additional convolutional layer that reduces the final feature map to 256 channels; it is pre-trained on ImageNet and not trained on any Re-ID dataset. Input images are resized to 384×128, and the feature map from the last convolutional layer is used as the feature vector. In each batch, to compute the validation empirical risk, two images are sampled per identity from the labeled data so that positive pairs are guaranteed, as in the sketch below.
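A minimal sketch of such an identity-balanced sampler (all names are hypothetical):

```python
import random

def sample_validation_batch(id_to_images, num_ids, k=2):
    """Sample `k` images for each of `num_ids` identities, so the batch is
    guaranteed to contain positive pairs for the validation risk."""
    ids = random.sample(list(id_to_images), num_ids)
    batch, labels = [], []
    for label, pid in enumerate(ids):
        batch.extend(random.sample(id_to_images[pid], k))
        labels.extend([label] * k)
    return batch, labels
```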
Ablation Studies
Evaluation of Knowledge Distillation.

Hinton et al.'s method scores lower than all the others because it distills soft labels designed for closed-set classification, which does not suit the open-set Re-ID problem. PKT distills knowledge via probability distributions and is also less effective for Re-ID than the proposed method.
Effect of the Learned Teacher Weights α_i
Individual Teacher.

For Market-1501, teachers T1 (MSMT17), T2 (CUHK03), and T4 (DukeMTMC) are all effective.
For DukeMTMC, only teacher T1 (MSMT17) is effective, because DukeMTMC is more challenging and has more camera views than Market-1501.
On both datasets, T3 (VIPeR) is the weakest teacher, since its training set is the smallest and therefore provides the least information.

w/ and w/o Learning α_i
The teacher weights learned by the Adaptive Knowledge Aggregator reflect each teacher's effectiveness; for the weakest teacher T3, the learned weight is close to 0. On Market-1501, learning the weights brings only a limited improvement for Ours (semi) and Ours (unsupervised), while on DukeMTMC the improvement is more pronounced: distance fusion with uniform weights beats the individual teachers on Market-1501 but is less effective on DukeMTMC. Distance fusion also increases the testing computation cost, so it does not scale as well as the proposed method.

Comparison with Ensemble and Task Weighting.
The proposed method outperforms ensembling, joint training, and task weighting: its testing computation cost is lower than that of the ensemble, and its training time is shorter than that of joint training.

The Number of Validation IDs.
The number of validation identities is varied from 0 to 50; performance is good with 5–50 IDs. Going from 1 ID down to 0 IDs causes a clear performance drop on DukeMTMC, which demonstrates the importance of the validation empirical risk.

Different Student Model Architectures.

Finetuning with Our Method as Initialization.
When more labeled data becomes available, the MobileNetV2 student model can serve as the initialization for fine-tuning. Fine-tuning from the proposed method with only 20% of the IDs achieves results comparable to fine-tuning from ImageNet pre-training with 100% of the IDs.