Paper Notes: ReCoSa: Detecting the Relevant Contexts with Self-Attention for Multi-turn Dialogue Generation

Motivation & Novelty

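In multi-turn dialogue generation, the semantics of the context strongly influence what the model generates next. The models most widely used for this task today are hierarchical recurrent models (e.g., HRED). They treat every context turn the same way, without distinguishing relevant turns from irrelevant ones, and this hurts performance.
The paper proposes a new model, ReCoSa, for generation in multi-turn dialogue.
It introduces self-attention to weight the context turns by their relevance.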

Background Knowledge

From Seq2Seq to Encoder-decoder

HRED Hierarchical Neural Network Models


Examples

The problem the authors set out to solve in this paper
The model the authors propose in this paper

Encoder


Self-Attention

This is the attention part of the model. The self-attention used here can capture long-range dependencies, and it can be computed in parallel across positions.
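To make the idea concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It assumes the simplest setting, Q = K = V = X with no learned projections, so it illustrates the mechanism rather than the paper's actual implementation; the function names and the toy input are mine.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X (assumption:
    no learned projection matrices, to keep the sketch short)."""
    d = len(X[0])
    outputs, weights = [], []
    for q in X:
        # Every position is scored against every other position,
        # no matter how far apart they are in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w_j * v[i] for w_j, v in zip(w, X)) for i in range(d)])
    return outputs, weights

# Toy input: positions 0 and 2 are identical, position 1 is different.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(X)
```

Each iteration of the outer loop depends only on `X`, not on the result for any other position, which is why the computation can run in parallel. In the toy input, position 0 attends to position 2 exactly as strongly as to itself, even though they are not adjacent.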

Context Self-Attention

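Because self-attention relates every position to every other position directly, it can capture relations between words in different context turns, including turns that are far apart.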

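The authors combine several attention heads (multi-head attention); the exact formula is given in the figure above. Note that the context attention representation initially carries no superscript. The superscript appears only after the representation has passed through a linear layer.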
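For reference, this is the standard multi-head attention formulation from the Transformer, which I assume is what the paper's figure follows. The symbols $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$, $W^{O}$, $d_k$ and the head count $H$ are the usual Transformer notation, not taken from these notes.

```latex
\begin{aligned}
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \\
\mathrm{head}_i &= \mathrm{Attention}\!\left(Q W_i^{Q},\; K W_i^{K},\; V W_i^{V}\right) \\
\mathrm{MultiHead}(Q, K, V) &= \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_H)\, W^{O}
\end{aligned}
```

Under this reading, the multiplication by $W^{O}$ in the last line would be the linear layer after which the superscript appears.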

Response Representation Encoder

Given the response Y
For each word y_t, its word vector is:

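The first plus sign here is not a typo: it denotes fusing the word vector with the position vector. Concatenation and element-wise addition are closely related rather than strictly identical. Once a linear layer follows, concatenation gives W[w; p] = W_1 w + W_2 p, so plain addition is the special case in which both blocks are the identity.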
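The relationship between concatenation and addition can be checked numerically. The sketch below uses made-up 2-dimensional vectors and matrices (all values are hypothetical, chosen only for illustration) and compares "concatenate, then project" with "project each part, then add".

```python
def matvec(W, x):
    # Plain matrix-vector product on nested lists.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

w = [1.0, 2.0]    # hypothetical word embedding
p = [0.5, -1.0]   # hypothetical position embedding

W1 = [[1.0, 0.0], [0.0, 1.0]]   # block of the linear layer acting on w
W2 = [[2.0, 1.0], [0.0, 3.0]]   # block of the linear layer acting on p
W = [r1 + r2 for r1, r2 in zip(W1, W2)]   # block matrix [W1 | W2]

concat = w + p   # Python list concatenation, i.e. the vector [w; p]
concat_then_project = matvec(W, concat)
project_then_add = [a + b for a, b in zip(matvec(W1, w), matvec(W2, p))]
```

Both routes give the same vector. If `W1` and `W2` were both the identity matrix, the result would be exactly `w` plus `p` element-wise, which is the sense in which addition is a special case of concatenation followed by a linear layer.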

Context-Response Attention Decoder

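In this decoding step, the query, key, and value vectors are drawn from different representations rather than from a single sequence. Otherwise the formula is the basic Seq2Seq decoding procedure.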
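As a rough sketch of what "different Q, K, V" means here: my understanding, which should be checked against the paper, is that the queries come from the response representation while the keys and values come from the context representation. The function name `attend` and the toy vectors below are hypothetical.

```python
import math

def attend(queries, keys, values):
    """Scaled dot-product attention where queries and keys/values
    may come from different sequences (cross-attention)."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs

# Hypothetical toy data: three context-turn vectors, two response positions.
context = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
response = [[1.0, 0.0], [0.0, 2.0]]

# Assumption: Q = response representation, K = V = context representation.
decoded = attend(response, context, context)
```

The output has one vector per response position, and each vector is a weighted average of the context vectors, so the decoder can focus on whichever context turns are most relevant to the word being generated.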

Experiments

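The experiments evaluate the model on both a Chinese and an English dataset. For PPL (perplexity), lower is better; for BLEU, higher is better.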
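To see why the two metrics point in opposite directions, here is a small sketch. `perplexity` follows the usual definition (the exponential of the average negative log-probability of the reference tokens). `clipped_precision` is only the modified n-gram precision that BLEU is built from, not full BLEU (no brevity penalty, no averaging over n). The function names and example inputs are mine, not from the paper.

```python
import math
from collections import Counter

def perplexity(token_probs):
    """exp of the average negative log-probability the model assigns
    to each reference token; lower means the model is less 'surprised'."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

def clipped_precision(candidate, reference, n=1):
    """Modified n-gram precision: candidate n-gram counts are clipped
    by how often the n-gram occurs in the reference."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / max(sum(cand.values()), 1)

confident = perplexity([0.5, 0.5])   # model assigns high probability
unsure = perplexity([0.1, 0.1])      # model assigns low probability
precision = clipped_precision("the the the".split(), "the cat".split())
```

A model that assigns higher probability to the reference tokens gets a lower perplexity, while a generated response that shares more n-grams with the reference gets a higher precision and therefore a higher BLEU.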
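Finally, this paper is a piece of work on dialogue generation, and its code is available on GitHub.
The above is what I presented at our group meeting. If anything is incorrect, corrections are welcome!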
