Paper Notes: "Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning"

CVPR 2018: https://arxiv.org/abs/1711.07613

The paper addresses visual dialog, where the goal is to produce more human-like responses. An example:

(Figure: an example dialog from the paper.)

To achieve this, the paper abandons the earlier practice of predicting responses with a plain MLE (maximum likelihood estimation) objective, the approach commonly used in machine translation and VQA. Training in that straightforward way tends to produce safe but generic, repetitive responses.

Instead, the paper combines a GAN with reinforcement learning and trains two sub-modules: a generator that produces a response from the image and the dialog history, and a discriminator that distinguishes human responses from machine responses. The discriminator's output serves as a reward for the generator.

The main framework:

(Figure: overall framework of the model.)


Part 1: Sequential co-attention generator

Response generation still uses an encoder-decoder architecture, but instead of simply encoding the image, the history, and the question separately and concatenating them, the model focuses on specific image regions and dialog fragments. A CNN extracts image features V, and LSTMs extract question features Q and history features U; a sequential co-attention mechanism then produces attention weights over each. The co-attention encoder is shown below:

(Figure: the sequential co-attention encoder.)

The encoder takes the feature sequences V, U, and Q as input and outputs the corresponding attended features v, u, and q, computed by Eqs. 1-3 of the paper. There, x stands for any one of the three feature sequences, g1 and g2 denote the two attention guidances (the attended features coming from the other modalities), W_x, W_{g1}, and W_{g2} are learnable parameters, h is the hidden size of the attention module, and M is the length of the feature sequence. The final joint representation combines the three attended features v, u, and q.
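As a rough sketch of what Eqs. 1-3 look like (reconstructed from the parameter names above; the weight vector w_h and the exact notation are my assumptions, not a verbatim copy of the paper), a single co-attention step over a sequence X = {x_1, ..., x_M} with guidances g_1 and g_2 is:

```latex
H_i     = \tanh\big(W_x x_i + W_{g_1} g_1 + W_{g_2} g_2\big), \quad i = 1, \dots, M \\
\alpha  = \operatorname{softmax}\big(w_h^{\top} H\big), \quad \alpha \in \mathbb{R}^{M} \\
\hat{x} = \sum_{i=1}^{M} \alpha_i \, x_i
```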

The whole generation process is then the decoder producing the answer word by word, conditioned on this joint representation.
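A minimal PyTorch-style sketch of one such co-attention block (module names, dimensions, and the forward signature are illustrative assumptions, not the authors' code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttention(nn.Module):
    """Attend over a feature sequence x, guided by two fixed vectors g1 and g2."""
    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.w_x = nn.Linear(feat_dim, hidden_dim, bias=False)
        self.w_g1 = nn.Linear(feat_dim, hidden_dim, bias=False)
        self.w_g2 = nn.Linear(feat_dim, hidden_dim, bias=False)
        self.w_h = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, x, g1, g2):
        # x: (B, M, feat_dim); g1, g2: (B, feat_dim)
        h = torch.tanh(self.w_x(x) + (self.w_g1(g1) + self.w_g2(g2)).unsqueeze(1))
        alpha = F.softmax(self.w_h(h), dim=1)   # (B, M, 1) attention weights
        return (alpha * x).sum(dim=1)           # attended feature: (B, feat_dim)
```

The block is applied to V, U, and Q in turn, each attended output acting as a guidance for the next step, and the resulting v, u, and q are combined and handed to the decoder.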


Part 2: Discriminative model with attention memories

The discriminator judges whether a response was produced by a human or by the machine, using a binary softmax classifier. Its inputs are v, u, Q, and A: the question-answer pair (Q, A) is encoded by an LSTM into a vector u_QA, which is embedded together with v and u through a fully connected layer and then classified.

The probability of the response being recognized as human is the corresponding softmax output of this classifier.
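A hedged sketch of this discriminator; the layer sizes and fusion details are assumptions, and only the overall structure (LSTM-encoded Q-A vector u_QA, fused with v and u through a fully connected layer, then a 2-way softmax) follows the description above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """Classifies an answer as human-written or machine-generated."""
    def __init__(self, vocab_size: int, embed_dim: int, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.qa_lstm = nn.LSTM(embed_dim, feat_dim, batch_first=True)  # encodes the (Q, A) pair
        self.fuse = nn.Linear(3 * feat_dim, hidden_dim)                # joins v, u, and u_QA
        self.cls = nn.Linear(hidden_dim, 2)                            # {machine, human}

    def forward(self, v, u, qa_tokens):
        # v, u: attended image / history features (B, feat_dim); qa_tokens: (B, T)
        _, (h_n, _) = self.qa_lstm(self.embed(qa_tokens))
        u_qa = h_n[-1]                                                 # (B, feat_dim)
        fused = torch.tanh(self.fuse(torch.cat([v, u, u_qa], dim=-1)))
        return F.softmax(self.cls(fused), dim=-1)                      # probabilities for both classes
```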


Part 3: Adversarial REINFORCE with an intermediate reward

This probability that the generated response is human-like is treated as a reward, and the REINFORCE algorithm is used to maximize the generator's expected reward (Eq. 7 of the paper).

Using the likelihood-ratio trick, Eq. 7 is approximated by Eq. 8, where p is the probability of the generated word, a_k is the k-th token of the response, and b is a baseline value that reduces the variance of the gradient estimate.
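For reference, the likelihood-ratio (REINFORCE) gradient with a baseline takes the standard form below; this is a reconstruction of the general shape of Eq. 8, with Â = (a_1, ..., a_K) the sampled response and D(·) the discriminator's human-probability, not the paper's exact notation:

```latex
\nabla_{\theta} J(\theta) \;\approx\;
\big( D(V, U, Q, \hat{A}) - b \big)\,
\nabla_{\theta} \sum_{k=1}^{K} \log p_{\theta}\big(a_k \mid V, U, Q, a_{1:k-1}\big)
```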

Intermediate-step rewards: the reward above only scores the final, complete sequence, and every action that contributed to it shares that single reward. The paper instead proposes rewards for intermediate steps. For example, given the question 'Are they adults or babies?', the human response is 'I would say they are adults' while the machine response is 'I can't tell'. The REINFORCE model above would assign one low reward to the whole machine response, whereas the paper argues that each token should receive its own reward: 'I' deserves a high score, while 'can't' and 'tell' deserve low ones.

However, the discriminator evaluates complete sequences rather than partial ones, so the paper applies Monte Carlo (MC) search with a roll-out (generator) policy: the remaining tokens are sampled according to the generator policy and the current state. From the current state to the end of the sequence, the roll-out policy is executed N times; each completed sequence is fed to the discriminator, and the average score is used as the reward for the action of generating token a_k.
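A minimal sketch of the Monte Carlo roll-out reward; `generator.rollout` and `discriminator.prob_human` are hypothetical helpers, and the function only illustrates the "complete the prefix N times and average the discriminator scores" idea described above:

```python
import torch

def intermediate_reward(generator, discriminator, v, u, q, prefix_tokens, num_rollouts=5):
    """Reward for the action that produced the last token of `prefix_tokens`.

    The partial answer is completed `num_rollouts` times with the roll-out
    (generator) policy; every finished sequence is scored by the discriminator,
    and the average human-probability is used as the reward.
    """
    scores = []
    with torch.no_grad():
        for _ in range(num_rollouts):
            # Sample the remaining tokens from the generator policy (hypothetical API).
            full_answer = generator.rollout(v, u, q, prefix=prefix_tokens)
            scores.append(discriminator.prob_human(v, u, q, full_answer))
    return torch.stack(scores).mean()
```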


With this intermediate reward, the gradient keeps the same form as Eq. 8, except that every token a_k is weighted by its own reward instead of a single sequence-level reward.

The paper additionally updates the generator with a teacher forcing strategy, periodically training it on the ground-truth human responses. The overall training procedure:

(Figure: Algorithm 1, the full adversarial training procedure.)
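Putting the pieces together, a high-level sketch of one training iteration under these assumptions (all helper methods such as `sample_with_log_probs`, `mle_loss`, and `prob_human` are hypothetical; the authoritative procedure is the paper's Algorithm 1):

```python
def train_step(generator, discriminator, g_optim, d_optim, batch):
    v, u, q, human_answer = batch

    # 1) Generator update: policy gradient with per-token Monte Carlo rewards.
    sampled, log_probs = generator.sample_with_log_probs(v, u, q)
    rewards = [intermediate_reward(generator, discriminator, v, u, q, sampled[:k + 1])
               for k in range(len(sampled))]
    baseline = sum(rewards) / len(rewards)           # simple average baseline, for illustration
    g_loss = -sum(lp * (r - baseline) for lp, r in zip(log_probs, rewards))

    # 2) Teacher forcing: also fit the generator to the human answer with MLE.
    g_loss = g_loss + generator.mle_loss(v, u, q, human_answer)
    g_optim.zero_grad(); g_loss.backward(); g_optim.step()

    # 3) Discriminator update: human answers are positives, sampled answers negatives.
    d_loss = (-discriminator.prob_human(v, u, q, human_answer).log()
              - (1 - discriminator.prob_human(v, u, q, sampled)).log())
    d_optim.zero_grad(); d_loss.backward(); d_optim.step()
```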

Experiments

Dataset: VisDial.


CoAtt-G-MLE: no adversarial learning, trained with the MLE objective

CoAtt-GAN-w/o Rinte: adversarial learning, gradients computed with only the global (sequence-level) reward

CoAtt-GAN-w/ Rinte: with the intermediate reward

CoAtt-GAN-w/ Rinte-TF: with teacher forcing added
