[Andrew Ng Course Quiz] Course 5 - Sequence Models - Week 1 Quiz - Recurrent Neural Networks
-
Recurrent Neural Networks
-
Suppose your training examples are sentences (sequences of words). Which of the following refers to the $j$-th word in the $i$-th training example?
- [x] $x^{(i)<j>}$
- [ ] $x^{<i>(j)}$
- [ ] $x^{(j)<i>}$
- [ ] $x^{<j>(i)}$
We index into the row first to get the training example (represented by the parentheses), then the column to get the word (represented by the angle brackets).
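A quick illustration of this indexing (a minimal sketch; the nested-list corpus is made up, and Python counts from 0 where the course counts from 1):

```python
corpus = [
    ["the", "cat", "sat"],               # training example i = 1
    ["dogs", "love", "long", "walks"],   # training example i = 2
]

i, j = 2, 3                  # the 3rd word of the 2nd training example
word = corpus[i - 1][j - 1]  # x^{(i)<j>}: pick the example, then the word
print(word)                  # -> "long"
```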
-
Consider this RNN:
*(Figure: a many-to-many RNN in which each input $x^{<t>}$ is paired with an output $\hat{y}^{<t>}$.)*
This specific type of architecture is appropriate when:
- [x] $T_x = T_y$
- [ ] $T_x < T_y$
- [ ] $T_x > T_y$
- [ ] $T_x = 1$
It is appropriate when every input should be matched to an output.
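A minimal numpy sketch of this many-to-many architecture (the parameter names follow the course's $W_{ax}, W_{aa}, W_{ya}$ convention, but the function itself is an illustrative assumption):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def rnn_forward(x_seq, a0, Wax, Waa, Wya, ba, by):
    """Many-to-many RNN with T_x = T_y: one prediction per input step."""
    a, y_hats = a0, []
    for x_t in x_seq:                            # t = 1 .. T_x
        a = np.tanh(Wax @ x_t + Waa @ a + ba)    # hidden state a<t>
        y_hats.append(softmax(Wya @ a + by))     # output y-hat<t>
    return y_hats                                # len(y_hats) == len(x_seq)
```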
-
To which of these tasks would you apply a many-to-one RNN architecture? (Check all that apply).
- [ ] Speech recognition (input an audio clip and output a transcript)
- [x] Sentiment classification (input a piece of text and output a 0/1 to denote positive or negative sentiment)
- [ ] Image classification (input an image and output a label)
- [x] Gender recognition from speech (input an audio clip and output a label indicating the speaker’s gender)
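Both correct answers read an entire sequence and emit a single label at the end. A sketch of that many-to-one pattern (same assumed conventions as the sketch above):

```python
import numpy as np

def rnn_many_to_one(x_seq, a0, Wax, Waa, Wya, ba, by):
    """Many-to-one RNN: consume all of x_seq, output one prediction."""
    a = a0
    for x_t in x_seq:
        a = np.tanh(Wax @ x_t + Waa @ a + ba)  # only the state is updated
    z = Wya @ a + by                           # single output at the last step
    return 1 / (1 + np.exp(-z))                # sigmoid, e.g. 0/1 sentiment
```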
-
You are training this RNN language model.
At the $t$-th time step, what is the RNN doing? Choose the best answer.
- [ ] Estimating $P(y^{<1>}, y^{<2>}, \dots, y^{<t-1>})$
- [ ] Estimating $P(y^{<t>})$
- [x] Estimating $P(y^{<t>} \mid y^{<1>}, y^{<2>}, \dots, y^{<t-1>})$
- [ ] Estimating $P(y^{<t>} \mid y^{<1>}, y^{<2>}, \dots, y^{<t>})$
Yes, in a language model we try to predict the next step based on the knowledge of all prior steps.
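In other words, the per-step conditionals are exactly the factors in the chain-rule decomposition of the sentence probability:

$$P(y^{<1>}, \dots, y^{<T_y>}) = \prod_{t=1}^{T_y} P\big(y^{<t>} \mid y^{<1>}, \dots, y^{<t-1>}\big)$$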
-
You have finished training a language model RNN and are using it to sample random sentences, as follows:
What are you doing at each time step $t$?
- [ ] (i) Use the probabilities output by the RNN to pick the highest-probability word for that time step as $\hat{y}^{<t>}$. (ii) Then pass the ground-truth word from the training set to the next time step.
- [ ] (i) Use the probabilities output by the RNN to randomly sample a chosen word for that time step as $\hat{y}^{<t>}$. (ii) Then pass the ground-truth word from the training set to the next time step.
- [ ] (i) Use the probabilities output by the RNN to pick the highest-probability word for that time step as $\hat{y}^{<t>}$. (ii) Then pass this selected word to the next time step.
- [x] (i) Use the probabilities output by the RNN to randomly sample a chosen word for that time step as $\hat{y}^{<t>}$. (ii) Then pass this selected word to the next time step.
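The sampling step itself is one line of numpy; a minimal sketch (the function name and surrounding loop are assumptions):

```python
import numpy as np

rng = np.random.default_rng()

def sample_step(probs):
    """probs: the softmax distribution over the vocabulary at this step.
    Sampling (instead of argmax) is what makes each generated sentence
    different; the chosen index is fed back as the next input x<t+1>."""
    return rng.choice(len(probs), p=probs)
```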
-
You are training an RNN, and find that your weights and activations are all taking on the value of NaN (“Not a Number”). Which of these is the most likely cause of this problem?
- [ ] Vanishing gradient problem.
- [x] Exploding gradient problem.
- [ ] ReLU activation function $g(\cdot)$ used to compute $g(z)$, where $z$ is too large.
- [ ] Sigmoid activation function $g(\cdot)$ used to compute $g(z)$, where $z$ is too large.
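The standard remedy is gradient clipping before each parameter update; a minimal sketch (the dict-of-arrays layout and the threshold of 5.0 are assumptions for illustration):

```python
import numpy as np

def clip_gradients(grads, threshold=5.0):
    """Clip every gradient entry into [-threshold, threshold] so a single
    exploding step cannot blow the weights up to NaN."""
    return {name: np.clip(g, -threshold, threshold) for name, g in grads.items()}
```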
-
Suppose you are training an LSTM. You have a 10,000-word vocabulary, and are using an LSTM with 100-dimensional activations $a$. What is the dimension of $\Gamma_u$ at each time step?
- [ ] 1
- [x] 100
- [ ] 300
- [ ] 10000
Correct, $\Gamma_u$ is a vector of dimension equal to the number of hidden units in the LSTM.
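A shape check in numpy makes this concrete (the weight layout $W_u[a^{<t-1>}, x^{<t>}]$ follows the course equations; the zero arrays are placeholders):

```python
import numpy as np

n_a, n_x = 100, 10000                  # hidden units, vocabulary size
Wu = np.zeros((n_a, n_a + n_x))        # update-gate weights
bu = np.zeros((n_a, 1))

a_prev = np.zeros((n_a, 1))            # a<t-1>
x_t = np.zeros((n_x, 1))               # one-hot word vector x<t>

z = Wu @ np.vstack([a_prev, x_t]) + bu
gamma_u = 1 / (1 + np.exp(-z))         # sigmoid gate
print(gamma_u.shape)                   # (100, 1): one value per hidden unit
```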
-
Here are the update equations for the GRU.
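In the course's notation ($*$ is elementwise multiplication and $\sigma$ the sigmoid), the full GRU is:

$$\begin{aligned}
\tilde{c}^{<t>} &= \tanh\big(W_c[\Gamma_r * c^{<t-1>}, x^{<t>}] + b_c\big)\\
\Gamma_u &= \sigma\big(W_u[c^{<t-1>}, x^{<t>}] + b_u\big)\\
\Gamma_r &= \sigma\big(W_r[c^{<t-1>}, x^{<t>}] + b_r\big)\\
c^{<t>} &= \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}\\
a^{<t>} &= c^{<t>}
\end{aligned}$$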
Alice proposes to simplify the GRU by always removing the $\Gamma_u$, i.e., setting $\Gamma_u = 1$. Betty proposes to simplify the GRU by removing the $\Gamma_r$, i.e., setting $\Gamma_r = 1$ always. Which of these models is more likely to work without vanishing gradient problems even when trained on very long input sequences?
- [ ] Alice's model (removing $\Gamma_u$), because if $\Gamma_r \approx 0$ for a timestep, the gradient can propagate back through that timestep without much decay.
- [ ] Alice's model (removing $\Gamma_u$), because if $\Gamma_r \approx 1$ for a timestep, the gradient can propagate back through that timestep without much decay.
- [x] Betty's model (removing $\Gamma_r$), because if $\Gamma_u \approx 0$ for a timestep, the gradient can propagate back through that timestep without much decay.
- [ ] Betty's model (removing $\Gamma_r$), because if $\Gamma_u \approx 1$ for a timestep, the gradient can propagate back through that timestep without much decay.
Yes. For the signal to backpropagate without vanishing, we need $c^{<t>}$ to be highly dependent on $c^{<t-1>}$.
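A numpy sketch of one step makes the mechanism visible (the parameter layout is assumed to match the equations above):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gru_step(c_prev, x_t, Wc, Wu, Wr, bc, bu, br):
    """One full-GRU step."""
    concat = np.vstack([c_prev, x_t])
    gamma_r = sigmoid(Wr @ concat + br)   # relevance gate
    gamma_u = sigmoid(Wu @ concat + bu)   # update gate
    c_tilde = np.tanh(Wc @ np.vstack([gamma_r * c_prev, x_t]) + bc)
    # When gamma_u ~ 0, c<t> ~ c<t-1>: the memory cell (and the gradient
    # flowing through it) passes this timestep almost unchanged, which is
    # why keeping Gamma_u (Betty's model) avoids vanishing gradients.
    return gamma_u * c_tilde + (1 - gamma_u) * c_prev
```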
-
Here are the equations for the GRU and the LSTM:
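The memory-cell updates are the pair that matters here (course notation; only these two lines are reproduced):

$$\text{GRU: } c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>} \qquad \text{LSTM: } c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + \Gamma_f * c^{<t-1>}$$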
From these, we can see that the Update Gate and Forget Gate in the LSTM play a role similar to ___ and ___ in the GRU. What should go in the blanks?
- [x] $\Gamma_u$ and $1 - \Gamma_u$
- [ ] $\Gamma_u$ and $\Gamma_r$
- [ ] $1 - \Gamma_u$ and $\Gamma_u$
- [ ] $\Gamma_r$ and $\Gamma_u$
-
You have a pet dog whose mood is heavily dependent on the current and past few days' weather. You've collected data for the past 365 days on the weather, which you represent as a sequence $x^{<1>}, \dots, x^{<365>}$. You've also collected data on your dog's mood, which you represent as $y^{<1>}, \dots, y^{<365>}$. You'd like to build a model to map from $x \rightarrow y$. Should you use a Unidirectional RNN or Bidirectional RNN for this problem?
- [ ] Bidirectional RNN, because this allows the prediction of mood on day $t$ to take into account more information.
- [ ] Bidirectional RNN, because this allows backpropagation to compute more accurate gradients.
- [x] Unidirectional RNN, because the value of $y^{<t>}$ depends only on $x^{<1>}, \dots, x^{<t>}$, but not on $x^{<t+1>}, \dots, x^{<365>}$.
- [ ] Unidirectional RNN, because the value of $y^{<t>}$ depends only on $x^{<t>}$, and not on other days' weather.
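A practical consequence: a unidirectional model can run online, emitting each day's prediction as the weather arrives; a sketch (`rnn_step` is a hypothetical single-step function returning the new state and a prediction):

```python
def predict_moods_online(weather_stream, rnn_step, a0):
    """Emit y<t> as soon as x<t> arrives. A bidirectional RNN could not
    run this way: its backward pass needs the still-unseen future inputs."""
    a, moods = a0, []
    for x_t in weather_stream:
        a, y_t = rnn_step(a, x_t)
        moods.append(y_t)
    return moods
```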