[Chinese-English] [Andrew Ng Course Quizzes] Course 5 - Sequence Models - Week 3 Quiz - Sequence Models & Attention Mechanism
- Consider using the following encoder-decoder model for machine translation:
This model is a “conditional language model” in the sense that the encoder portion (shown in green) models the probability of the input sentence x.
- True
- False
- In beam search, if you increase the beam width B, which of the following would you expect to be true?
- Beam search will run more slowly.
- Beam search will use up more memory.
- Beam search will generally find better solutions (i.e., do a better job of maximizing P(y|x)).
- Beam search will converge after fewer steps.
- In machine translation, if we carry out beam search without using sentence normalization, the algorithm will tend to output overly short translations.
- True
- False
- Suppose you are building an RNN-based speech recognition system that converts audio clips into transcripts. Your algorithm uses beam search to try to find the value of y that maximizes P(y|x). On a dev set example, given an input audio clip, your algorithm outputs the transcript “I’m building an A Eye system in Silly con Valley.”, whereas a human gives the much superior transcript “I’m building an AI system in Silicon Valley.”
According to your model, P(y*|x) = 1.09 × 10^-7 and P(ŷ|x) = 1.00 × 10^-7.
So, would you expect increasing the beam width to help correct this example?
- No, because P(y*|x) ≤ P(ŷ|x) says this one is on the RNN; we can’t let the search algorithm take the blame.
- No, because P(y*|x) > P(ŷ|x) says this one is on the search algorithm; why should the RNN take the blame?
- Yes, because P(y*|x) ≤ P(ŷ|x) says it is all the RNN’s fault; we can’t wrongly accuse the search algorithm.
- Yes, because P(y*|x) > P(ŷ|x) says the fault lies entirely with the search algorithm; we can’t punish the RNN for it~
Blogger’s note: being cheeky like this feels great~ (~ ̄▽ ̄)~
- Continuing with the example from Question 4, suppose you spend a few more weeks working on your algorithm and now find that, for the vast majority of examples on which the algorithm makes a mistake, P(y*|x) > P(ŷ|x). This suggests you should focus your attention on improving the search algorithm, right?
- Yep~
- Nope
- Recall the attention model for machine translation:
In addition, here is the formula for c^<t>: c^<t> = Σ_{t′} α^<t,t′> a^<t′>.
Which of the following statements about α^<t,t′> are correct?
- We expect α^<t,t′> to be generally larger for values of a^<t′> that are highly relevant to the value the network should output for y^<t>. (Note the superscripts.)
- We expect α^<t,t′> to be generally larger for values of a^<t> that are highly relevant to the value the network should output for y^<t′>. (Note the superscripts.)
- Σ_t α^<t,t′> = 1. (Note the sum is over t.)
- Σ_{t′} α^<t,t′> = 1. (Note the sum is over t′.)
- The network learns where to “pay attention” by learning the values e^<t,t′>, which are computed with a small neural network:
Among the inputs to this small network, we cannot replace s^<t-1> with s^<t>. This is because s^<t> depends on α^<t,t′>, which in turn depends on e^<t,t′>; so at the time we need to evaluate this network, we have not yet computed s^<t>.
- True
- False
- Compared with the encoder-decoder model of Question 1 (which does not use an attention mechanism), we expect the attention model to have its greatest advantage when:
- The input sequence length is large.
- The input sequence length is small.
- Under the CTC model, identical characters not separated by the “blank” character (_) are collapsed. Under the CTC model, what does the following string collapse to? __c_oo_o_kk___b_ooooo__oo__kkk
- cokbok
- cookbook
- cook book
- coookkboooooookkk
- In trigger word detection, x^<t> is:
- Features of the audio (such as spectrogram features) at time t.
- The t-th input word, represented as either a one-hot vector or a word embedding.
- Whether the trigger word is being said at time t.
- Whether someone has just finished saying the trigger word at time t.
Sequence models & Attention mechanism
- Consider using this encoder-decoder model for machine translation.

This model is a “conditional language model” in the sense that the encoder portion (shown in green) is modeling the probability of the input sentence x.
- [ ] True
- [x] False (The model estimates P(y|x), the probability of the translation given the input; nothing in it models P(x).)
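Why False, written out: the decoder (not the encoder) is the language-model part, and it is conditioned on the encoder’s representation of x, so the network as a whole models the translation probability, not the source-sentence probability:

```latex
% What the encoder-decoder network models:
P\left(y^{\langle 1\rangle},\dots,y^{\langle T_y\rangle}\mid x\right)
  = \prod_{t=1}^{T_y} P\left(y^{\langle t\rangle}\mid x,\, y^{\langle 1\rangle},\dots,y^{\langle t-1\rangle}\right)
% It does not model P(x), the probability of the input sentence itself.
```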
- In beam search, if you increase the beam width B, which of the following would you expect to be true? Check all that apply.
- Beam search will run more slowly.
- Beam search will use up more memory.
- Beam search will generally find better solutions (i.e. do a better job maximizing P(y|x)).
- Beam search will converge after fewer steps.
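A minimal sketch of one beam-search decoding step, which makes the first two options concrete: every step expands B kept prefixes over the whole vocabulary and retains only the B best, so both the work per step and the number of stored prefixes grow with B. The scorer `log_prob_fn` is a hypothetical stand-in for the RNN and `vocab` is a toy vocabulary; neither comes from the course code.

```python
import heapq

def beam_search_step(beams, log_prob_fn, vocab, B):
    """One decoding step: expand each kept prefix over the vocabulary,
    then retain only the B highest-scoring candidates.

    beams:       list of (log_prob, prefix) pairs, at most B entries.
    log_prob_fn: log P(token | prefix); stands in for the RNN here.
    """
    candidates = []
    for score, prefix in beams:
        for token in vocab:  # B * |V| expansions per step: slower as B grows
            candidates.append((score + log_prob_fn(prefix, token),
                               prefix + [token]))
    # A larger B also means more prefixes kept alive in memory.
    return heapq.nlargest(B, candidates, key=lambda c: c[0])

# Toy usage: a uniform scorer over a three-word vocabulary, B = 2.
beams = beam_search_step([(0.0, ["<s>"])],
                         lambda prefix, token: -1.0,
                         ["a", "b", "c"], B=2)
```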
- In machine translation, if we carry out beam search without using sentence normalization, the algorithm will tend to output overly short translations.
- True
- False
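The reasoning behind the answer: every conditional probability factor is at most 1, so each additional word can only lower the product, and un-normalized beam search therefore prefers short outputs. The lectures fix this by dividing the log-probability by the output length, softened with an exponent alpha (alpha around 0.7 is given as a typical choice):

```latex
% Un-normalized objective: each extra word multiplies in a factor <= 1,
% so shorter translations score higher.
\arg\max_{y}\ \prod_{t=1}^{T_y} P\left(y^{\langle t\rangle}\mid x,\, y^{\langle 1\rangle},\dots,y^{\langle t-1\rangle}\right)

% Length-normalized objective used instead (\alpha \approx 0.7):
\arg\max_{y}\ \frac{1}{T_y^{\alpha}} \sum_{t=1}^{T_y} \log P\left(y^{\langle t\rangle}\mid x,\, y^{\langle 1\rangle},\dots,y^{\langle t-1\rangle}\right)
```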
- Suppose you are building a speech recognition system, which uses an RNN model to map from an audio clip x to a text transcript y. Your algorithm uses beam search to try to find the value of y that maximizes P(y|x).
On a dev set example, given an input audio clip, your algorithm outputs the transcript “I’m building an A Eye system in Silly con Valley.”, whereas a human gives a much superior transcript “I’m building an AI system in Silicon Valley.”.
According to your model, P(y*|x) = 1.09 × 10^-7 and P(ŷ|x) = 1.00 × 10^-7.
Would you expect increasing the beam width B to help correct this example?
- No, because P(y*|x) ≤ P(ŷ|x) indicates the error should be attributed to the RNN rather than to the search algorithm.
- No, because P(y*|x) > P(ŷ|x) indicates the error should be attributed to the search algorithm rather than to the RNN.
- Yes, because P(y*|x) ≤ P(ŷ|x) indicates the error should be attributed to the RNN rather than to the search algorithm.
- Yes, because P(y*|x) > P(ŷ|x) indicates the error should be attributed to the search algorithm rather than to the RNN.
- Continuing the example from Q4, suppose you work on your algorithm for a few more weeks, and now find that for the vast majority of examples on which your algorithm makes a mistake, P(y*|x) > P(ŷ|x). This suggests you should focus your attention on improving the search algorithm.
- True
- False
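The decision rule behind Q4 and Q5 as a small sketch; the function name and the handling of exact ties are my own choices, but the two branches follow the lecture’s error analysis for beam search:

```python
def attribute_error(p_human, p_search):
    """Error analysis for beam search.

    p_human:  P(y*|x), the model's score for the human transcript y*.
    p_search: P(y-hat|x), the model's score for beam search's output.
    """
    if p_human > p_search:
        # The RNN prefers y*, yet beam search failed to surface it:
        # the search is at fault, so a larger beam width B may help.
        return "search algorithm"
    # Otherwise the RNN scores the worse transcript at least as high:
    # the RNN is at fault, and increasing B will not fix this example.
    return "RNN"

# With Q4's figures, the blame lands on the search algorithm:
print(attribute_error(1.09e-7, 1.00e-7))  # -> "search algorithm"
```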
- Consider the attention model for machine translation.
Further, here is the formula for c^<t>: c^<t> = Σ_{t′} α^<t,t′> a^<t′>.
Which of the following statements about α^<t,t′> are true? Check all that apply.
- We expect α^<t,t′> to be generally larger for values of a^<t′> that are highly relevant to the value the network should output for y^<t>. (Note the indices in the superscripts.)
- We expect α^<t,t′> to be generally larger for values of a^<t> that are highly relevant to the value the network should output for y^<t′>. (Note the indices in the superscripts.)
- Σ_t α^<t,t′> = 1. (Note the summation is over t.)
- Σ_{t′} α^<t,t′> = 1. (Note the summation is over t′.)
- The network learns where to “pay attention” by learning the values e^<t,t′>, which are computed using a small neural network:
We can’t replace s^<t-1> with s^<t> as an input to this neural network. This is because s^<t> depends on α^<t,t′>, which in turn depends on e^<t,t′>; so at the time we need to evaluate this network, we haven’t computed s^<t> yet.
- True
- False
- Compared to the encoder-decoder model shown in Question 1 of this quiz (which does not use an attention mechanism), we expect the attention model to have the greatest advantage when:
- The input sequence length is large.
- The input sequence length is small.
- Under the CTC model, identical repeated characters not separated by the “blank” character (_) are collapsed. Under the CTC model, what does the following string collapse to? __c_oo_o_kk___b_ooooo__oo__kkk
- cokbok
- cookbook
- cook book
- coookkboooooookkk
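The collapsing rule from Q9 as a sketch (the function name is my own): merge adjacent repeated characters first, then delete the blanks. Running it on the quiz string reproduces the expected answer.

```python
from itertools import groupby

def ctc_collapse(output, blank="_"):
    """Merge adjacent repeated characters, then remove blank characters."""
    merged = "".join(char for char, _ in groupby(output))
    return merged.replace(blank, "")

print(ctc_collapse("__c_oo_o_kk___b_ooooo__oo__kkk"))  # -> cookbook
```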
- In trigger word detection, x^<t> is:
- Features of the audio (such as spectrogram features) at time t.
- The t-th input word, represented as either a one-hot vector or a word embedding.
- Whether the trigger word is being said at time t.
- Whether someone has just finished saying the trigger word at time t.
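For context on the last two options: in the course’s trigger word detector, the target y^<t> is set to 1 for a short stretch of steps right after the trigger word ends, not just a single step. A sketch of that label construction (the step counts here are illustrative, loosely following the programming assignment):

```python
import numpy as np

def make_targets(Ty, trigger_end_steps, ones_len=50):
    """y^<t> = 1 for `ones_len` steps right after each trigger word ends."""
    y = np.zeros(Ty, dtype=np.float32)
    for end in trigger_end_steps:
        y[end:min(end + ones_len, Ty)] = 1.0
    return y

# e.g. a clip with Ty = 1375 output steps, trigger word ending at step 700:
y = make_targets(1375, [700])
```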