【Distill 系列:二】CVPR 2019 Distilling Object Detectors with Fine-grained Feature Imitation
Key insight: detectors care more about local regions near objects.
motivation
- FitNets: Hints for Thin Deep Nets — hint learning applies an L2 loss between teacher and student intermediate feature maps
- Analysis: applying the hint to the whole feature map works poorly, because detectors care more about local regions that overlap with ground-truth objects, while classification models pay more attention to global context.
- This paper therefore does not apply hint learning to the entire feature map; it distills only the anchors near gt boxes, via a fine-grained feature imitation mask
Red boxes: anchors with the largest IOU against a gt box
Green boxes: anchors whose IOU with a gt box exceeds the lower threshold
method
- For each gt box, compute its IOU with all anchors, forming a W×H×K IOU map m (W and H are the feature map's width and height; K is the number of preset anchors)
- Then find the largest IOU value M = max(m) and multiply it by a threshold factor ψ (a hyperparameter, default ψ = 0.5) to obtain the filter threshold F = ψ·M
- Filter the IOU map with F: an OR operation over the K anchors keeps locations with IOU greater than F, giving a W×H mask
- Loop over all gt boxes, OR-combining their masks, to obtain the fine-grained imitation mask I
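The mask construction above can be sketched as follows (a minimal NumPy sketch; the per-anchor IOU maps are assumed to be computed elsewhere, and `imitation_mask` is a hypothetical helper name):

```python
import numpy as np

def imitation_mask(iou_maps, psi=0.5):
    """iou_maps: list of W x H x K arrays, one per gt box (IOU of each
    of the K preset anchors at each location with that gt box).
    Returns the W x H fine-grained imitation mask I as a boolean array."""
    mask = np.zeros(iou_maps[0].shape[:2], dtype=bool)
    for m in iou_maps:
        M = m.max()            # largest IOU for this gt box
        F = psi * M            # filter threshold F = psi * M
        # OR over the K anchors: keep locations where any anchor exceeds F
        mask |= (m > F).any(axis=2)
    return mask
```

Note the threshold is relative (ψ·M per gt box) rather than a fixed IOU cutoff, so small or hard objects still contribute imitation regions.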
a full convolution adaptation layer
(added after the student model, before computing the distance metric between the student's and teacher's feature responses)
- the teacher's and student's channel counts do not necessarily match
- even when they match, directly forcing the student to approximate the teacher's features brings little benefit
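A minimal sketch of the adaptation function f_adap, here taken to be a 1×1 convolution, i.e. a per-location linear map projecting the student's C_s channels onto the teacher's C_t channels (the channel sizes below are hypothetical; any full-convolution layer that preserves spatial resolution would serve):

```python
import numpy as np

def f_adap(s, weight):
    """s: C_s x W x H student feature map; weight: C_t x C_s kernel of a
    1x1 convolution. Returns a C_t x W x H map channel-aligned with the
    teacher's feature map."""
    # contract the channel axis of s against the input-channel axis of weight
    return np.tensordot(weight, s, axes=([1], [0]))

rng = np.random.default_rng(0)
C_s, C_t, W, H = 256, 512, 38, 50          # hypothetical sizes
weight = rng.standard_normal((C_t, C_s))
s = rng.standard_normal((C_s, W, H))       # student feature map
adapted = f_adap(s, weight)                # shape (C_t, W, H)
```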
loss
We train the student model to minimize the following objective:
- s: the student model's guided feature map
- t: the corresponding teacher's feature map
For each near-object anchor location (i, j) on the feature map of width W and height H, together with all estimated near-object anchor locations (the imitation mask I), the distillation objective is to minimize:

L_imitation = 1/(2·Np) · Σ_{i=1..W} Σ_{j=1..H} Σ_{c=1..C} I_{ij} · (f_adap(s)_{ijc} − t_{ijc})²

- I is the imitation mask
- Np = Σ_{ij} I_{ij} is the number of positive points in the mask
- f_adap(·) is the adaptation function
Then the overall training loss of the student model is:

L = L_gt + λ · L_imitation

where L_gt is the detector's original ground-truth detection loss and λ is a balancing weight.
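The imitation term can be sketched directly from its definition (NumPy, channel-first C×W×H layout; `imitation_loss` is a hypothetical helper name, and the λ-weighted combination with the detection loss is shown only as a comment since L_gt comes from the detector itself):

```python
import numpy as np

def imitation_loss(adapted_s, t, I):
    """adapted_s: C x W x H student features after f_adap;
    t: C x W x H teacher features; I: W x H binary imitation mask."""
    Np = I.sum()                    # number of positive points in the mask
    if Np == 0:
        return 0.0                  # no near-object locations for this image
    sq = (adapted_s - t) ** 2       # per-element squared distance
    # mask out locations far from objects, then normalize by 2 * Np
    return float((I[None] * sq).sum() / (2 * Np))

# Overall loss (lambda is a hyperparameter balancing the two terms):
# L = L_gt + lam * imitation_loss(f_adap(s), t, I)
```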
result
Experiments use Faster R-CNN as the base detector.