Learning Deep Features for Discriminative Localization

Global Average Pooling (GAP), proposed in NIN [2], replaces the parameter-heavy fully connected layers (FC6, FC7) found in networks such as AlexNet and VGG, which helps prevent overfitting and improves generalization. Compared with FC layers, GAP is also more robust to spatial translations and easier to interpret. Beyond these advantages, this paper makes a further point: GAP can be used to effectively localize the objects of interest.
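As a quick illustration of what GAP does (a minimal NumPy sketch with made-up shapes and random tensors, not code from the paper or from NIN), each channel of the final feature map is collapsed to a single average, so the classifier on top only needs one weight per channel per class:

```python
import numpy as np

# Hypothetical last-conv feature map: C channels over an H x W spatial grid.
C, H, W, num_classes = 512, 7, 7, 1000
feature_map = np.random.rand(C, H, W).astype(np.float32)

# Global Average Pooling: one value per channel, no trainable parameters.
gap = feature_map.mean(axis=(1, 2))                      # shape (C,)

# The classifier on top of GAP needs only num_classes x C weights,
# versus FC layers that act on the flattened C*H*W feature map.
w = np.random.rand(num_classes, C).astype(np.float32)
scores = w @ gap                                         # softmax inputs, shape (num_classes,)
print(gap.shape, scores.shape)                           # (512,) (1000,)
```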

Class Activation Mapping (CAM) Generation

Let $f_k(x, y)$ denote the activation of the $k$-th channel of the last convolutional layer at spatial location $(x, y)$, and let $F_k = \sum_{x,y} f_k(x, y)$ be the result of GAP for channel $k$. Then, for a given class $c$, the input to the softmax is $S_c = \sum_k w_k^c F_k$, where $w_k^c$ is the weight connecting channel $k$ to class $c$. Putting these together, we have

$$
S_c = \sum_k w_k^c \sum_{x,y} f_k(x, y) = \sum_{x,y} \sum_k w_k^c f_k(x, y)
$$
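A tiny NumPy check of this identity (the shapes and random tensors are chosen only for illustration):

```python
import numpy as np

K, H, W, C = 8, 7, 7, 5                      # channels, spatial size, number of classes
f = np.random.rand(K, H, W)                  # f_k(x, y)
w = np.random.rand(C, K)                     # w_k^c

lhs = w @ f.sum(axis=(1, 2))                 # sum_k w_k^c * sum_{x,y} f_k(x, y)
rhs = np.einsum('ck,kxy->c', w, f)           # sum_{x,y} sum_k w_k^c f_k(x, y)
assert np.allclose(lhs, rhs)                 # the two orders of summation agree
```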

Defining $M_c(x, y) = \sum_k w_k^c f_k(x, y)$, the map $M_c(x, y)$ reflects how important the activation at $(x, y)$ is for class $c$. This yields a $7 \times 7$ Class Activation Map; upsampling the CAM to the input image size and overlaying it on the original image gives the final visualization, as shown below:
[Figure: CAM upsampled and overlaid on the input image]
For reference code, see https://github.com/nicklhy/CAM/blob/master/cam.ipynb, although some of its mxnet usage is outdated and needs to be updated by hand.
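Since the mxnet calls in that notebook are dated, here is a framework-agnostic NumPy sketch of the same computation; the function name, the random tensors, and the nearest-neighbour upsampling are my own simplifications (in practice the feature map and weights come from a trained GAP network, and a bilinear resize such as cv2.resize is typically used):

```python
import numpy as np

def compute_cam(feature_map, class_weights, class_idx, out_size=224):
    """Minimal CAM sketch.

    feature_map   : (K, h, w) activations of the last conv layer (e.g. 512 x 7 x 7)
    class_weights : (num_classes, K) softmax weights sitting on top of GAP
    class_idx     : class c to visualize
    """
    K, h, w = feature_map.shape
    # M_c(x, y) = sum_k w_k^c * f_k(x, y)
    cam = np.tensordot(class_weights[class_idx], feature_map, axes=1)   # (h, w)

    # Normalize to [0, 1] for display.
    cam -= cam.min()
    cam /= cam.max() + 1e-8

    # Nearest-neighbour upsampling to the input resolution (7 -> 224 when scale = 32).
    scale = out_size // h
    cam = np.kron(cam, np.ones((scale, scale)))
    return cam   # overlay this heatmap on the input image with some transparency

# Example with made-up tensors.
cam = compute_cam(np.random.rand(512, 7, 7), np.random.rand(1000, 512), class_idx=281)
print(cam.shape)   # (224, 224)
```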

Experimental Results

Classification

[Figure: classification results from the paper]

Localization

[Figures: localization results from the paper]

Fine-grained Recognition

[Figures: fine-grained recognition results from the paper]

References

To be continued.