【Distill 系列:二】CVPR 2019 Distilling Object Detectors with Fine-grained Feature Imitation
Key insight: detectors care more about local regions near objects.
motivation
- FitNets: Hints for Thin Deep Nets — hint learning applies an L2 loss between teacher and student intermediate feature maps
- Analysis: applying the hint to the whole feature map works poorly, because detectors care more about local regions that overlap with ground-truth objects, while classification models pay more attention to global context.
- This paper therefore does not apply hint learning to the entire feature map; it distills only the anchors near gt boxes, via a fine-grained feature imitation mask
Red boxes: anchors with the largest IOU against a gt box
Green boxes: anchors whose IOU with a gt box exceeds the lower threshold
method
- For each gt box, compute its IOU with all anchors, forming a W×H×K IOU map m (W and H are the feature map's width and height; K is the number of preset anchors)
- Then find the largest IOU value M = max(m) and multiply it by a threshold factor ψ (a hyperparameter, default ψ = 0.5) to obtain the filter threshold F = ψ·M
- Filter the IOU map with F: an OR operation over the K anchors keeps locations with IOU greater than F, giving a W×H mask
- Loop over all gt boxes, OR-combining their masks, to obtain the fine-grained imitation mask I
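The mask construction above can be sketched as follows (a minimal NumPy sketch; the per-anchor IOU maps are assumed to be computed elsewhere, and `imitation_mask` is a hypothetical helper name):

```python
import numpy as np

def imitation_mask(iou_maps, psi=0.5):
    """iou_maps: list of W x H x K arrays, one per gt box (IOU of each
    of the K preset anchors at each location with that gt box).
    Returns the W x H fine-grained imitation mask I as a boolean array."""
    mask = np.zeros(iou_maps[0].shape[:2], dtype=bool)
    for m in iou_maps:
        M = m.max()            # largest IOU for this gt box
        F = psi * M            # filter threshold F = psi * M
        # OR over the K anchors: keep locations where any anchor exceeds F
        mask |= (m > F).any(axis=2)
    return mask
```

Note the threshold is relative (ψ·M per gt box) rather than a fixed IOU cutoff, so small or hard objects still contribute imitation regions.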
a full convolution adaptation layer
(added after the student model, before computing the distance metric between the student's and teacher's feature responses)
- the teacher's and student's channel counts do not necessarily match
- even when they match, directly forcing the student to approximate the teacher's features brings little benefit
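A minimal sketch of the adaptation function f_adap, here taken to be a 1×1 convolution, i.e. a per-location linear map projecting the student's C_s channels onto the teacher's C_t channels (the channel sizes below are hypothetical; any full-convolution layer that preserves spatial resolution would serve):

```python
import numpy as np

def f_adap(s, weight):
    """s: C_s x W x H student feature map; weight: C_t x C_s kernel of a
    1x1 convolution. Returns a C_t x W x H map channel-aligned with the
    teacher's feature map."""
    # contract the channel axis of s against the input-channel axis of weight
    return np.tensordot(weight, s, axes=([1], [0]))

rng = np.random.default_rng(0)
C_s, C_t, W, H = 256, 512, 38, 50          # hypothetical sizes
weight = rng.standard_normal((C_t, C_s))
s = rng.standard_normal((C_s, W, H))       # student feature map
adapted = f_adap(s, weight)                # shape (C_t, W, H)
```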
loss
We train the student model to minimize the following objective:
- s: the student model's guided feature map
- t: the corresponding teacher's feature map
For each near-object anchor location (i, j) on the feature map of width W and height H, together with all estimated near-object anchor locations (the imitation mask I), the distillation objective is to minimize:

L_imitation = 1/(2·Np) · Σ_{i=1..W} Σ_{j=1..H} Σ_{c=1..C} I_{ij} · (f_adap(s)_{ijc} − t_{ijc})²

- I is the imitation mask
- Np = Σ_{ij} I_{ij} is the number of positive points in the mask
- f_adap(·) is the adaptation function
Then the overall training loss of the student model is:

L = L_gt + λ · L_imitation

where L_gt is the detector's original ground-truth detection loss and λ is a balancing weight.
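The imitation term can be sketched directly from its definition (NumPy, channel-first C×W×H layout; `imitation_loss` is a hypothetical helper name, and the λ-weighted combination with the detection loss is shown only as a comment since L_gt comes from the detector itself):

```python
import numpy as np

def imitation_loss(adapted_s, t, I):
    """adapted_s: C x W x H student features after f_adap;
    t: C x W x H teacher features; I: W x H binary imitation mask."""
    Np = I.sum()                    # number of positive points in the mask
    if Np == 0:
        return 0.0                  # no near-object locations for this image
    sq = (adapted_s - t) ** 2       # per-element squared distance
    # mask out locations far from objects, then normalize by 2 * Np
    return float((I[None] * sq).sum() / (2 * Np))

# Overall loss (lambda is a hyperparameter balancing the two terms):
# L = L_gt + lam * imitation_loss(f_adap(s), t, I)
```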
result
Experiments use Faster R-CNN as the base detector.