Paper Reading Notes: Learning Deep Features for Discriminative Localization
Introduction
Task
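Image classification and localization (detection) based on weakly supervised learning
Related work:
- Weakly supervised object localization
- Visualizing CNNs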
Method
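Class Activation Mapping (CAM)
The CAM technique shows, in detail yet concisely, how to use a CNN for object localization (detection) and for visualization. The principle is simple and rests mainly on global average pooling (GAP).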
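- First, take the feature maps of the last convolutional layer, $f_k(x, y)$, where $f_k$ is the feature map of channel $k$, the number of channels is $K$, and each map has spatial size $H \times W$.
- Second, apply global average pooling to each feature map to get $F_k = \frac{1}{HW} \sum_{x,y} f_k(x, y)$.
- Third, apply an FC layer to get the class score $S_c = \sum_k w_k^c F_k$, which is used to compute the softmax cross-entropy loss and train the network.
- Finally, the class activation map for each class $c$ is obtained from the FC weights: $M_c(x, y) = \sum_k w_k^c f_k(x, y)$. $M_c$ has the same resolution as $f_k$, and it can be upsampled to the size of the original image to get the final map.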
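The steps above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the nested-list layout (`feature_maps[k][y][x]`, `weights[c][k]`), the omission of a bias term, and the nearest-neighbour upsampling are all choices made here for brevity.

```python
def global_average_pool(feature_maps):
    """Average each H x W feature map into one number per channel."""
    return [sum(map(sum, fmap)) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]


def class_scores(pooled, weights):
    """FC layer without bias: one weighted sum of pooled values per class."""
    return [sum(w * f for w, f in zip(w_c, pooled)) for w_c in weights]


def class_activation_map(feature_maps, weights, c):
    """Weight every channel's map by the FC weight of class c and sum them.

    The result has the same H x W resolution as the input feature maps.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(weights[c][k] * feature_maps[k][y][x]
                 for k in range(len(feature_maps)))
             for x in range(width)]
            for y in range(height)]


def upsample_nearest(cam, factor):
    """Nearest-neighbour upsampling by an integer factor (an assumption;
    any interpolation that reaches the input image size would do)."""
    return [[cam[y // factor][x // factor]
             for x in range(len(cam[0]) * factor)]
            for y in range(len(cam) * factor)]


# Toy example: K = 2 channels of size 2 x 2, and 2 classes.
feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
weights = [
    [1.0, 0.0],  # class 0
    [0.5, 2.0],  # class 1
]

pooled = global_average_pool(feature_maps)            # [2.5, 0.5]
scores = class_scores(pooled, weights)                # [2.5, 2.25]
cam = class_activation_map(feature_maps, weights, 1)  # [[0.5, 3.0], [1.5, 4.0]]
heatmap = upsample_nearest(cam, 2)                    # 4 x 4 map
```

Note that the spatial mean of the class activation map equals the class score (here 2.25 for class 1), which is why the map can be read as each location's contribution to that score.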
Experiments
Classification
Compared with the original networks (VGG, GoogLeNet, etc.), using GAP causes a small drop of 1%-2% in classification performance.
Localization
Compared with fully supervised methods, CAM still shows a large gap in localization performance; after all, this method does not use any bounding-box annotations.
Conclusion
- Importantly, CNNs trained only for classification can learn to localize objects without using any bounding-box annotations.
- The class activation mapping method transfers easily to other tasks, for example captioning and VQA.