Paper reading: NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE

Authors:

Dzmitry Bahdanau
Jacobs University Bremen, Germany
KyungHyun Cho, Yoshua Bengio
Université de Montréal

Abstract:

Neural machine translation aims to build a single neural network that can be jointly tuned to maximize translation performance. The models recently proposed for neural machine translation often belong to the encoder-decoder family: they encode a source sentence into a fixed-length vector, from which a decoder generates the translation.

In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend it by allowing the model to automatically (soft-)search for the parts of the source sentence that are relevant to predicting a target word, without having to form these parts explicitly as hard segments.

With this new approach, we achieve a translation performance on English-to-French translation comparable to the existing state-of-the-art phrase-based system.

Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.

Background:

Soft alignment vs. hard alignment

The strength of the soft-alignment, opposed to a hard-alignment, is evident, for instance, from Fig. 3 (d). Consider the source phrase [the man] which was translated into [l’ homme]. Any hard alignment will map [the] to [l’] and [man] to [homme]. This is not helpful for translation, as one must consider the word following [the] to determine whether it should be translated into [le], [la], [les] or [l’]. Our soft-alignment solves this issue naturally by letting the model look at both [the] and [man], and in this example, we see that the model was able to correctly translate [the] into [l’]. We observe similar behaviors in all the presented cases in Fig. 3. An additional benefit of the soft alignment is that it naturally deals with source and target phrases of different lengths, without requiring a counter-intuitive way of mapping some words to or from nowhere ([NULL]) (see, e.g., Chapters 4 and 5 of Koehn, 2010).
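
To make the contrast concrete, here is a minimal sketch of how a soft alignment spreads probability mass over several source words while a hard alignment commits to exactly one. This is not the paper's trained model; the alignment scores are invented for illustration:

```python
# A toy soft vs. hard alignment over one target position. The alignment
# scores below are invented for illustration, not produced by any model.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

source = ["the", "man", "is", "here"]
scores = np.array([2.0, 1.6, -1.0, -1.5])   # hypothetical scores e_ij

soft = softmax(scores)                       # distribution over all source words
hard = np.eye(len(source))[scores.argmax()]  # all mass on the single best word

for word, s, h in zip(source, soft, hard):
    print(f"{word:5s} soft={s:.2f} hard={h:.0f}")
# The soft weights let the decoder consult both [the] and [man] when picking
# among [le], [la], [les], [l'], while the hard alignment only sees [the].
```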

Results analysis:

We see that the performance of RNNencdec dramatically drops as the length of the sentences increases. On the other hand, both RNNsearch-30 and RNNsearch-50 are more robust to the length of the sentences. RNNsearch-50, especially, shows no performance deterioration even with sentences of length 50 or more. This superiority of the proposed model over the basic encoder–decoder is further confirmed by the fact that the RNNsearch-30 even outperforms RNNencdec-50 (see Table 1).

We list the translation performances measured in BLEU score. It is clear from the table that in all the cases, the proposed RNNsearch outperforms the conventional RNNencdec. More importantly, the performance of the RNNsearch is as high as that of the conventional phrase-based translation system (Moses), when only the sentences consisting of known words are considered. This is a significant achievement, considering that Moses uses a separate monolingual corpus (418M words) in addition to the parallel corpora we used to train the RNNsearch and RNNencdec.

Alignment methods:

Handwriting synthesis is a task where the model is asked to generate handwriting for a given sequence of characters. In his work, Graves (2013) used a mixture of Gaussian kernels to compute the weights of the annotations, where the location, width, and mixture coefficient of each kernel were predicted from an alignment model. More specifically, his alignment was restricted so that the predicted location increases monotonically.
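
As a rough sketch of that mechanism, consider a single Gaussian kernel with invented parameters (not Graves's full mixture): the window location kappa is incremented by an exponentiated offset, which is what forces it to move monotonically over the character positions:

```python
# A single-kernel caricature of Graves-style Gaussian window attention.
# beta (width) and the log-offset are fixed here; in the real model they
# are predicted by the network at every step.
import numpy as np

def gaussian_window(kappa, beta, num_chars):
    u = np.arange(num_chars)
    return np.exp(-beta * (kappa - u) ** 2)  # weight per character position u

kappa = 0.0
for t in range(4):
    kappa += np.exp(0.2)   # exponentiated offset: kappa can only move forward
    phi = gaussian_window(kappa, beta=1.0, num_chars=10)
    print(f"t={t}  kappa={kappa:.2f}  window peaks at u={phi.argmax()}")
```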

Our approach, on the other hand, requires computing the annotation weight of every word in the source sentence for each word in the translation. This drawback is not severe for translation, where most input and output sentences are only 15–40 words; however, it may limit the applicability of the proposed scheme to other tasks.
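
A quick back-of-the-envelope look at that cost: the soft search computes one weight per (source word, target word) pair, so the work per sentence grows as T_x × T_y. The lengths below are illustrative:

```python
# Alignment weights needed per sentence: one score for every (source, target)
# word pair. Sentence lengths are illustrative.
for tx, ty in [(15, 15), (40, 40), (500, 500)]:
    print(f"T_x={tx:4d}  T_y={ty:4d}  ->  {tx * ty:7d} weights")
```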

Core method: Chapter 3 of the paper (in full).

Dataset: Chapter 4 of the paper.

The classic formulation of an NMT translation model is the encoder-decoder architecture, with recurrent neural networks commonly serving as both the encoder and the decoder. Roughly, such a model first feeds the input sequence of the source sentence into the encoder, extracts the representation of the last hidden state, and uses it as the input to the decoder, which then generates the target words one by one. Broadly speaking, this process repeatedly feeds the output at time step t-1 back in as the input at time step t, decoding in a loop until a stop symbol is emitted. In this way, NMT overcomes the local-translation problem of traditional phrase-based methods: it can capture long-distance dependencies in a language and produce more fluent translations.

However, this design also has serious drawbacks. First, RNNs are forgetful: information from earlier steps is gradually weakened, and eventually lost, as it propagates through many time steps. Second, no alignment is performed during decoding, so while decoding each element the focus is dispersed across the entire sequence. LSTMs and GRUs can mitigate the first problem to some extent; the second is exactly the problem Bahdanau et al. set out to address.
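
The attention mechanism Bahdanau et al. add computes, for each target position, a context vector as a weighted sum of all encoder annotations. Below is a minimal numpy sketch of that one step, with random weights standing in for trained parameters and illustrative dimensions:

```python
# One step of additive (Bahdanau-style) attention: e_ij = v^T tanh(W s + U h_j),
# alpha = softmax(e), c_i = sum_j alpha_ij h_j. Weights are random stand-ins.
import numpy as np
rng = np.random.default_rng(0)

Tx, n, m = 6, 8, 8                  # source length, annotation dim, state dim
H = rng.normal(size=(Tx, n))        # encoder annotations h_1 .. h_Tx
s_prev = rng.normal(size=m)         # previous decoder state s_{i-1}
W = rng.normal(size=(m, m))
U = rng.normal(size=(m, n))
v = rng.normal(size=m)

e = np.array([v @ np.tanh(W @ s_prev + U @ h) for h in H])  # alignment scores
alpha = np.exp(e - e.max()); alpha /= alpha.sum()           # attention weights
c = alpha @ H                       # context vector: expected annotation

print("attention weights:", np.round(alpha, 3))
print("context vector shape:", c.shape)
```

Because the weights alpha are recomputed for every target word, the decoder's focus is no longer dispersed over the whole sequence: each output step attends to the source positions most relevant to it.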