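1. Paper Overview

This is a CVPR 2019 paper. I have been reading about attention models lately and came across it. The first-listed affiliation is the Institute of Automation, Chinese Academy of Sciences.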

Paper download link: http://openaccess.thecvf.com/content_CVPR_2019/papers/Fu_Dual_Attention_Network_for_Scene_Segmentation_CVPR_2019_paper.pdf

Citation: J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang and H. Lu. "Dual Attention Network for Scene Segmentation." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2019.

Project repository: https://github.com/junfu1115/DANet/

2. Reading Guide

There are few write-ups of this paper online. I recommend the following:

[1] (CVPR2019) Image Semantic Segmentation (18): DANet, a scene segmentation network integrating dual-path attention

Here is the paper's abstract:

In this paper, we address the scene segmentation task by capturing rich contextual dependencies based on the self-attention mechanism. Unlike previous works that capture contexts by multi-scale feature fusion, we propose a Dual Attention Network (DANet) to adaptively integrate local features with their global dependencies. Specifically, we append two types of attention modules on top of dilated FCN, which model the semantic interdependencies in spatial and channel dimensions respectively. The position attention module selectively aggregates the feature at each position by a weighted sum of the features at all positions. Similar features would be related to each other regardless of their distances. Meanwhile, the channel attention module selectively emphasizes interdependent channel maps by integrating associated features among all channel maps. We sum the outputs of the two attention modules to further improve feature representation which contributes to more precise segmentation results. We achieve new state-of-the-art segmentation performance on three challenging scene segmentation datasets, i.e., Cityscapes, PASCAL Context and COCO Stuff dataset. In particular, a Mean IoU score of 81.5% on Cityscapes test set is achieved without using coarse data.

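To capture contextual information, earlier methods generally rely on multi-scale feature fusion. This paper instead builds on the self-attention mechanism and proposes the DANet architecture. In this architecture, two attention modules are appended on top of a dilated FCN: a position attention module and a channel attention module, which model semantic interdependencies in the spatial and channel dimensions respectively.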

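3. The Paper in Detail

FCN-based methods are currently a good fit for scene segmentation, including scenes where objects vary in scale, lighting, and so on. One line of work uses multi-scale context fusion; another uses recurrent neural networks. Both have drawbacks. Multi-scale context fusion can capture objects at different scales, but it cannot model the relationships between objects from a global view. Recurrent neural networks can capture those global relationships, but their effectiveness depends heavily on what the long-term memory manages to learn. The paper proposes DANet to address these problems.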

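In DANet, the authors introduce self-attention to capture feature dependencies in the spatial dimension and the channel dimension separately, adding two parallel attention modules on top of the FCN: a position attention module and a channel attention module. The position attention module uses self-attention to capture the spatial dependency between any two positions of the feature map. The channel attention module uses a similar self-attention mechanism to capture the dependency between any two channels.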
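To make "any two positions" and "any two channels" concrete, here is a minimal NumPy sketch of the two attention maps. It is an illustration under simplifying assumptions, not the authors' implementation: the learned convolutional projections that would normally produce the query and key features are omitted, and the function names `position_affinity` and `channel_affinity` are made up for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def position_affinity(feat):
    """Attention between every pair of spatial positions.

    feat has shape (C, H, W). The result has shape (N, N) with N = H * W;
    row j holds the weights that position j assigns to all positions i.
    Assumption: the raw features serve as both query and key.
    """
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)   # (C, N)
    energy = flat.T @ flat          # (N, N) pairwise dot products
    return softmax(energy, axis=-1)


def channel_affinity(feat):
    """Attention between every pair of channels, shape (C, C)."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)   # (C, N)
    energy = flat @ flat.T          # (C, C) pairwise dot products
    return softmax(energy, axis=-1)


rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 3, 5))   # toy feature map: C=4, H=3, W=5
pos = position_affinity(feat)
chn = channel_affinity(feat)
print(pos.shape, chn.shape)             # (15, 15) (4, 4)
```

The shapes show the difference between the two modules: the position map grows with the square of the number of pixels, while the channel map depends only on the number of channels.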

The main contributions of the paper are:

(1) We propose a novel Dual Attention Network (DANet) with self-attention mechanism to enhance the discriminant ability of feature representations for scene segmentation. (DANet, a network built on self-attention, is proposed.)

(2) A position attention module is proposed to learn the spatial interdependencies of features and a channel attention module is designed to model channel interdependencies. It significantly improves the segmentation results by modeling rich contextual dependencies over local features. (Position and channel attention modules are proposed to improve the segmentation results.)

(3) We achieve new state-of-the-art results on three popular benchmarks including Cityscapes dataset, PASCAL Context dataset and COCO Stuff dataset. (State-of-the-art results are reported on three datasets.)

The network architecture used by the authors is shown below:

[Figure: overall DANet network architecture]

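The authors modify ResNet slightly: the downsampling operations are removed and dilated convolutions are used in the last two ResNet blocks, so the resulting feature map is 1/8 the size of the input image. The output of the modified ResNet is then fed into the two attention modules. In the position attention module, a convolution layer first produces the features, and new features are then generated in three steps: (1) compute a spatial attention matrix; (2) multiply the attention matrix with the original features (matrix product); (3) add the product to the original features element-wise. The channel attention module works in a similar way. Finally, the outputs of the two modules are fused; according to the abstract, they are summed.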
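The three steps, and the final sum of the two branches, can be sketched in NumPy as follows. This is a simplified illustration rather than the authors' code: the convolution layers before and after each module are left out, the raw features stand in for the learned query, key, and value projections, and the scale factors `alpha` and `beta` as well as the function names are assumptions introduced for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def position_attention(feat, alpha=1.0):
    """Position attention on a (C, H, W) feature map (projections omitted)."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)                  # (C, N)
    attn = softmax(flat.T @ flat, axis=-1)         # step 1: (N, N) attention matrix
    out = flat @ attn.T                            # step 2: weighted sum over positions
    return (alpha * out + flat).reshape(c, h, w)   # step 3: element-wise sum


def channel_attention(feat, beta=1.0):
    """Channel attention on a (C, H, W) feature map."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)                  # (C, N)
    attn = softmax(flat @ flat.T, axis=-1)         # step 1: (C, C) attention matrix
    out = attn @ flat                              # step 2: weighted sum over channels
    return (beta * out + flat).reshape(c, h, w)    # step 3: element-wise sum


def dual_attention(feat, alpha=1.0, beta=1.0):
    """Fuse the two branches by element-wise sum."""
    return position_attention(feat, alpha) + channel_attention(feat, beta)


rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 3, 5))   # toy stand-in for the backbone output
fused = dual_attention(feat)
print(fused.shape)                      # (4, 3, 5)
```

Because step 3 adds the original features back, each module keeps the local feature and adds global context on top of it. With the hypothetical scale factors set to 0, each branch simply returns its input.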

The detailed operation of the two modules is illustrated below:

[Figure: detailed diagrams of the position attention module and the channel attention module]