Paper Notes: Attention Correctness in Neural Image Captioning

This paper is all about attention. It contributes two things: a supervised training mechanism for attention, and a new evaluation metric, Attention Correctness.

Supervised attention model

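In the attention mechanism, $\alpha_{ti}$ denotes how much attention is paid at time $t$ to visual region $a_i$. It is defined as follows, where $h_{t-1}$ is the previous hidden state of the decoder and $L$ is the number of visual regions.

$$e_{ti} = f_{\mathrm{attn}}(a_i, h_{t-1}), \qquad \alpha_{ti} = \frac{\exp(e_{ti})}{\sum_{k=1}^{L} \exp(e_{tk})}$$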
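As a minimal sketch of how such weights are obtained, assuming the usual soft-attention form in which the weights at one time step are a softmax over per-region scores, the following turns scores into weights that sum to 1. The function name and the score values are made up for illustration.

```python
import math

def attention_weights(scores):
    """Softmax over the L region scores of one time step."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for L = 4 visual regions at one time step.
alpha_t = attention_weights([2.0, 1.0, 0.1, -1.0])
print(alpha_t, sum(alpha_t))
```

The region with the highest score receives the largest weight, and the weights form a probability distribution over the regions.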

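When training these weights, we could use samples that already carry bounding-box labels to correct $\alpha_{ti}$ so that it behaves better, but nobody had used them this way before. (Put simply: does the region that attention picks out automatically match the ground-truth box marked in the annotation?)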

For the word $y_t$ at time $t$, the weights derived from the bounding box are $\beta_{ti}$ (they mark the annotated ground-truth region). The authors do something neat here: over the $L$ visual regions, the weights are made to sum to 1.

$$\sum_{i=1}^{L} \beta_{ti} = 1$$

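This way, $\alpha_t$ and $\beta_t$ can both be treated as probability distributions, and cross entropy (a measure of how similar two distributions are) can be used to compare them, that is, to measure how well the region attention focuses on agrees with the annotated region. In addition, for words that have no alignment with the image, the authors set the cross-entropy term directly to 0. (This does not mean the similarity is 0; it is simply a constant assigned to such words, and another constant would work as well.)

$$L_{t,\mathrm{attn}} = \begin{cases} -\sum_{i=1}^{L} \beta_{ti} \log \alpha_{ti} & \text{if } \beta_t \text{ exists for } y_t \\ 0 & \text{otherwise} \end{cases}$$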
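The cross-entropy term described above can be sketched as follows. This is an illustrative implementation, not the authors' code: the function name, the `eps` guard against `log(0)`, and the use of `None` to mark a word without alignment are all assumptions.

```python
import math

def attention_loss(alpha_t, beta_t, eps=1e-12):
    """Cross entropy between ground-truth beta_t and predicted alpha_t.

    Returns 0.0 when beta_t is None, i.e. the word has no alignment.
    """
    if beta_t is None:
        return 0.0
    return -sum(b * math.log(a + eps) for a, b in zip(alpha_t, beta_t))

alpha_t = [0.7, 0.1, 0.1, 0.1]
loss_right = attention_loss(alpha_t, [1.0, 0.0, 0.0, 0.0])  # attention on the annotated region
loss_wrong = attention_loss(alpha_t, [0.0, 0.0, 0.0, 1.0])  # attention elsewhere
loss_none = attention_loss(alpha_t, None)                   # word without alignment
print(loss_right, loss_wrong, loss_none)
```

The loss is small when attention concentrates on the annotated region and grows when it falls elsewhere.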

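The objective (loss) to optimize is therefore as follows. The first part is the likelihood term for the caption words; the second is the attention cross-entropy term, weighted by a coefficient $\lambda$.

$$L = -\sum_{t} \log p(y_t \mid y_1, \dots, y_{t-1}, I) + \lambda \sum_{t} L_{t,\mathrm{attn}}$$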
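To make the two parts of the objective concrete, here is a small sketch that adds the caption negative log-likelihood to the weighted attention cross entropy. The function name, the weight `lam`, and the toy numbers are assumptions for illustration only.

```python
import math

def total_loss(word_probs, alphas, betas, lam=1.0, eps=1e-12):
    """Caption negative log-likelihood plus weighted attention cross entropy.

    word_probs[t]: probability the model assigns to the ground-truth word y_t.
    alphas[t]:     predicted attention over the L regions at step t.
    betas[t]:      ground-truth attention at step t, or None if y_t is unaligned.
    lam:           hypothetical weight balancing the two terms.
    """
    caption_nll = -sum(math.log(p + eps) for p in word_probs)
    attn = 0.0
    for alpha_t, beta_t in zip(alphas, betas):
        if beta_t is not None:  # unaligned words contribute 0
            attn -= sum(b * math.log(a + eps) for a, b in zip(alpha_t, beta_t))
    return caption_nll + lam * attn

# Two-word toy caption: the first word is unaligned, the second is aligned.
loss = total_loss(
    word_probs=[0.5, 0.25],
    alphas=[[0.5, 0.5], [0.9, 0.1]],
    betas=[None, [1.0, 0.0]],
)
print(loss)
```

Setting `lam=0.0` recovers the plain caption likelihood objective, which makes the role of the attention term easy to isolate.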

The remaining task is how to construct a useful $\beta_t$ from annotated images.

Strong supervision

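This requires alignment annotations made specifically for the caption text, and such data is very hard to obtain. For a 224×224 image, first construct a binary map

$$\hat{\beta}_t = \{\hat{\beta}_{t\hat{i}}\}, \quad \hat{i} = 1, \dots, 224 \times 224$$

by setting every pixel inside the bounding box that corresponds to $y_t$ to 1 and every other pixel to 0:

$$\hat{\beta}_{t\hat{i}} = \begin{cases} 1 & \text{if pixel } \hat{i} \text{ lies inside the bounding box of } y_t \\ 0 & \text{otherwise} \end{cases}$$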

This map is then resized to the same size as $\alpha_t$ and normalized, which gives the $\beta_t$ we need.
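The construction above (binary mask, resize, normalize) can be sketched in plain Python. The notes do not say which resize method is used, so block averaging is an assumption here, as are the 14×14 target grid, the function name, and the example box.

```python
def bbox_to_beta(bbox, image_size=224, grid=14):
    """Turn a pixel bounding box into a normalized grid x grid attention target.

    bbox is (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = bbox
    # Binary mask: 1 inside the box, 0 elsewhere.
    mask = [[1.0 if (x0 <= x < x1 and y0 <= y < y1) else 0.0
             for x in range(image_size)] for y in range(image_size)]
    # Resize by averaging each cell x cell block (one simple choice of resize).
    cell = image_size // grid
    small = [[sum(mask[gy * cell + dy][gx * cell + dx]
                  for dy in range(cell) for dx in range(cell)) / (cell * cell)
              for gx in range(grid)] for gy in range(grid)]
    total = sum(sum(row) for row in small)
    if total == 0:
        return None  # empty box: no usable supervision for this word
    # Normalize so the target sums to 1 and can be read as a distribution.
    return [[v / total for v in row] for row in small]

# Hypothetical box covering pixels x in [32, 96) and y in [48, 112).
beta_t = bbox_to_beta((32, 48, 96, 112))
print(sum(sum(row) for row in beta_t))
```

Flattening the returned grid row by row gives a vector over the $L$ regions that can be compared with the attention weights by cross entropy.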

Weak supervision

Ground-truth alignment labels are very expensive to obtain, so bounding-box labels from object classification can be used for this task instead. For every object bounding box in the image, first take the object's category label; the similarity between $y_t$ and each label can then be computed using word vectors.
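A rough sketch of the matching step: compare the caption word with each object's category label by cosine similarity of word vectors and keep the boxes whose label is close enough. How the similarity is turned into a target is not spelled out in these notes, so the threshold rule is an assumption, and the embeddings below are toy values rather than real word vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def matching_boxes(word, objects, embeddings, threshold=0.5):
    """Return boxes whose category label is similar enough to the caption word.

    objects:    list of (category_label, bbox) pairs from the object annotation.
    embeddings: dict mapping words to vectors (toy values here).
    threshold:  hypothetical cut-off; the notes do not state a value.
    """
    if word not in embeddings:
        return []
    return [bbox for label, bbox in objects
            if label in embeddings
            and cosine(embeddings[word], embeddings[label]) >= threshold]

toy = {"puppy": [0.9, 0.1, 0.0], "dog": [1.0, 0.0, 0.0], "car": [0.0, 0.1, 1.0]}
objects = [("dog", (10, 20, 100, 120)), ("car", (130, 40, 220, 110))]
print(matching_boxes("puppy", objects, toy))
```

The selected boxes can then be turned into an attention target in the same way as a manually aligned box.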

Attention Correctness

This is a metric for judging how good the attention weights $\alpha_t$ are.

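First, the $\alpha_t$ matrix is resized to the size of the training image and then normalized to give $\hat{\alpha}_t$. The sum of the weights that fall inside the corresponding bounding box is then taken as the final score:

$$AC(y_t) = \sum_{\hat{i} \in R_t} \hat{\alpha}_{t\hat{i}}$$

where $R_t$ is the set of pixels inside the ground-truth bounding box for $y_t$.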
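The score can be sketched as follows: upsample the attention grid to image size, normalize, and sum what falls inside the box. Nearest-neighbour upsampling is an assumption (the resize method is not specified here), and the function name and the tiny 2×2 example are purely illustrative.

```python
def attention_correctness(alpha_grid, bbox, image_size=224):
    """Share of upsampled, normalized attention that falls inside the bounding box.

    alpha_grid: grid x grid attention weights for one word.
    bbox:       (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
    """
    grid = len(alpha_grid)
    cell = max(image_size // grid, 1)
    x0, y0, x1, y1 = bbox
    total = 0.0
    inside = 0.0
    for y in range(image_size):
        gy = min(y // cell, grid - 1)  # nearest-neighbour lookup (assumed resize)
        for x in range(image_size):
            gx = min(x // cell, grid - 1)
            a = alpha_grid[gy][gx]
            total += a
            if x0 <= x < x1 and y0 <= y < y1:
                inside += a
    # Dividing by the total is the normalization step.
    return inside / total if total else 0.0

# Toy 2x2 attention map on a 4x4 "image"; the box covers the top-left quadrant.
alpha = [[0.7, 0.1], [0.1, 0.1]]
ac = attention_correctness(alpha, (0, 0, 2, 2), image_size=4)
print(ac)
```

A score near 1 means almost all attention lies inside the annotated box. Uniform attention scores the fraction of the image the box covers, which is a useful baseline to compare against.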

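This metric applies at test time, which raises a problem: the generated sentence is usually not identical to the ground-truth sentence, so the bounding boxes annotated for the ground truth cannot be used directly. Two strategies are therefore designed: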

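Ground Truth Caption: at each step the input is not the word generated at the previous timestep but the word of the ground-truth caption at the previous timestep, which forces the model to output the ground-truth caption. (As a result, this test procedure can only be used on its own to measure Attention Correctness.)
Generated Caption: first run part-of-speech tagging, then find the noun phrases that the generated sentence and the ground-truth caption have in common, and evaluate on those. For example, for "A dog jumping over a hurdle" and "A cat jumping over a hurdle", the overlapping noun phrase is "a hurdle".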
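The matching step of the Generated Caption strategy can be illustrated with a small sketch. It assumes the noun phrases have already been extracted by a part-of-speech tagger or chunker (that step is not shown), and the function name is hypothetical.

```python
def overlapping_noun_phrases(generated_nps, ground_truth_nps):
    """Noun phrases found in both captions, compared case-insensitively.

    Results keep the order in which they appear in the generated caption.
    """
    truth = {phrase.lower() for phrase in ground_truth_nps}
    return [phrase for phrase in generated_nps if phrase.lower() in truth]

# Noun phrases assumed to come from an external tagger/chunker:
# generated:    "A dog jumping over a hurdle"
# ground truth: "A cat jumping over a hurdle"
generated = ["A dog", "a hurdle"]
ground_truth = ["A cat", "a hurdle"]
shared = overlapping_noun_phrases(generated, ground_truth)
print(shared)
```

Only the shared phrases have a usable ground-truth bounding box, so Attention Correctness is computed on those words alone.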

References:
Liu C, Mao J, Sha F, et al. Attention Correctness in Neural Image Captioning[J]. 2016.
