READING NOTE: Learning Spatial Regularization with Image-level Supervisions for Multi-label ...
TITLE: Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification
AUTHOR: Feng Zhu, Hongsheng Li, Wanli Ouyang, Nenghai Yu, Xiaogang Wang
ASSOCIATION: University of Science and Technology of China, University of Sydney, The Chinese University of *
FROM: arXiv:1702.05891
CONTRIBUTIONS
- An end-to-end deep neural network for multi-label image classification is proposed, which exploits both semantic and spatial relations of labels by training learnable convolutions on the attention maps of labels. Such relations are learned with only image-level supervisions. Investigation and visualization of the learned models show that the model effectively captures semantic and spatial relations of labels.
- The proposed algorithm has great generalization capability and works well on data with different types of labels.
METHOD
The proposed Spatial Regularization Net (SRN) takes visual features from the main net as inputs and learns to regularize spatial relations between labels. Such relations are exploited based on the learned attention maps for the multiple labels. Label confidences from both main net and SRN are aggregated to generate final classification confidences. The whole network is a unified framework and is trained in an end-to-end manner.
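As a rough illustration of the data flow described above, the following pure-Python sketch shows how a per-label attention map could be normalized over spatial locations, used to pool a confidence map, and how main-net and SRN confidences could then be combined. The function names, the spatial softmax, and the weighted average with weight alpha are assumptions made for illustration; the note itself only states that the two confidences are aggregated.

```python
import math


def spatial_softmax(scores):
    """Normalize one label's attention scores so they sum to 1 over locations."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_pooled_confidence(attention_scores, confidence_map):
    """Attention-weighted sum of per-location confidences for one label.

    Both inputs are flattened spatial maps of equal length (assumed layout).
    """
    weights = spatial_softmax(attention_scores)
    return sum(w * c for w, c in zip(weights, confidence_map))


def aggregate(main_conf, srn_conf, alpha=0.5):
    """Blend main-net and SRN confidences per label, then squash to (0, 1).

    alpha is a hypothetical mixing weight, not a value taken from the note.
    """
    return [sigmoid(alpha * m + (1.0 - alpha) * s)
            for m, s in zip(main_conf, srn_conf)]
```

For example, `attention_pooled_confidence([0.0, 0.0], [2.0, 4.0])` weights both locations equally, so it returns the mean of the two confidences.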
The scheme of SRN is illustrated in the following figure.
To train the network,
- Finetune only the main net on the target dataset. Both f_cnn and f_cls are learned with cross-entropy loss for classification.
- Fix f_cnn and f_cls. Train f_att and conv1 with cross-entropy loss for classification.
- Train f_sr with cross-entropy loss for classification by fixing all other sub-networks.
- The whole network is jointly finetuned with joint loss.
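The four training stages above can be summarized as a schedule of which sub-networks are updated and which are frozen at each step. The sketch below is only a bookkeeping aid under that reading: the stage descriptions, `SCHEDULE`, and `trainable_flags` are hypothetical names introduced here, and no actual training is performed.

```python
# Sub-network names as used in the note.
SUBNETS = ("f_cnn", "f_cls", "f_att", "conv1", "f_sr")

# (stage description, sub-networks updated in that stage); all others are frozen.
SCHEDULE = [
    ("finetune main net", {"f_cnn", "f_cls"}),
    ("train attention branch", {"f_att", "conv1"}),
    ("train spatial regularization", {"f_sr"}),
    ("joint finetuning", set(SUBNETS)),
]


def trainable_flags(stage_index):
    """Return {sub-network name: True if it is updated in the given stage}."""
    _, trainable = SCHEDULE[stage_index]
    return {name: name in trainable for name in SUBNETS}


if __name__ == "__main__":
    for index, (description, _) in enumerate(SCHEDULE):
        updated = [n for n, on in trainable_flags(index).items() if on]
        print(f"stage {index + 1}: {description} -> update {', '.join(updated)}")
```

In a framework such as PyTorch, flags like these would typically be applied by setting `requires_grad` on the parameters of each sub-network before building the optimizer for that stage.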
The main network follows the structure of ResNet-101 and is finetuned on the target dataset. The output of Attention Map and Confidence Map has