[coursera/SequenceModels/week1] Recurrent Neural Networks (summary & question)

1.1 Sequence Models

1.2 Notation

One-hot representation of words

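As a quick illustration (the toy vocabulary below is made up; the lecture uses a 10,000-word dictionary), each word x<t> is represented as a one-hot vector over the vocabulary:

```python
import numpy as np

# Hypothetical toy vocabulary for illustration only.
vocab = ["a", "aaron", "and", "harry", "potter", "zulu", "<UNK>"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return the one-hot column vector x<t> for a single word."""
    x = np.zeros((len(vocab), 1))
    x[word_to_index.get(word, word_to_index["<UNK>"])] = 1.0
    return x

x_t = one_hot("harry")  # all zeros except a 1 at the index of "harry"
```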

1.3 Recurrent Neural Network Model

Forward propagation

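A minimal numpy sketch of one forward step, following the lecture's equations a<t> = tanh(Waa a<t-1> + Wax x<t> + ba) and ŷ<t> = softmax(Wya a<t> + by); the sizes and random weights below are only illustrative, not the course's implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum(axis=0)

def rnn_step_forward(x_t, a_prev, Waa, Wax, Wya, ba, by):
    """a<t> = tanh(Waa a<t-1> + Wax x<t> + ba);  y_hat<t> = softmax(Wya a<t> + by)"""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)
    y_hat_t = softmax(Wya @ a_t + by)
    return a_t, y_hat_t

# Illustrative sizes: n_a hidden units, n_x input (vocabulary) size, n_y output size.
n_a, n_x, n_y = 5, 7, 7
rng = np.random.default_rng(0)
Waa = rng.standard_normal((n_a, n_a)) * 0.01
Wax = rng.standard_normal((n_a, n_x)) * 0.01
Wya = rng.standard_normal((n_y, n_a)) * 0.01
ba, by = np.zeros((n_a, 1)), np.zeros((n_y, 1))

a_prev = np.zeros((n_a, 1))              # a<0> is initialized to zeros
x_t = np.zeros((n_x, 1)); x_t[3] = 1.0   # a one-hot input word
a_t, y_hat_t = rnn_step_forward(x_t, a_prev, Waa, Wax, Wya, ba, by)
```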

1.4 Backpropagation Through Time

1.5 Different Types of RNNs

1.6 Language Model and Sequence Generation

RNN architecture

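Sequence generation samples one word at a time from the trained language model and feeds it back in as the next input. A rough, self-contained sketch with made-up toy weights (not the course's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, n_a = 7, 5            # toy sizes; the lecture uses a 10,000-word vocabulary
Waa = rng.standard_normal((n_a, n_a)) * 0.1
Wax = rng.standard_normal((n_a, vocab_size)) * 0.1
Wya = rng.standard_normal((vocab_size, n_a)) * 0.1
ba, by = np.zeros((n_a, 1)), np.zeros((vocab_size, 1))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum(axis=0)

def sample_sequence(max_len=10, eos_index=0):
    """Sample word indices one at a time, feeding each sampled word back in as the next input."""
    a_prev = np.zeros((n_a, 1))
    x_t = np.zeros((vocab_size, 1))              # x<1> is the zero vector
    indices = []
    for _ in range(max_len):
        a_prev = np.tanh(Waa @ a_prev + Wax @ x_t + ba)
        y_hat = softmax(Wya @ a_prev + by)       # estimate of P(next word | words so far)
        idx = rng.choice(vocab_size, p=y_hat.ravel())  # sample, rather than take the argmax
        indices.append(int(idx))
        if idx == eos_index:                     # stop if <EOS> is sampled
            break
        x_t = np.zeros((vocab_size, 1))
        x_t[idx] = 1.0                           # the sampled word becomes the next input
    return indices

print(sample_sequence())
```

Sampling from the softmax distribution (rather than always taking the most likely word) is what makes the generated sentences vary from run to run.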

1.7 Vanishing Gradients with RNNs

1.8 Gated Recurrent Unit

This prevents the vanishing-gradient problem: the update gate Γu can be very small (e.g. 0.000001), which makes c<t> ≈ c<t-1>, so the memory cell is carried forward almost unchanged across many time steps.

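A minimal sketch of the simplified GRU update from the lecture (update gate Γu only), to make the point above concrete; the weight names follow the course notation, and the numbers are purely illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step_simplified(c_prev, x_t, Wc, Wu, bc, bu):
    """c~<t> = tanh(Wc[c<t-1>, x<t>] + bc)
       Gu    = sigmoid(Wu[c<t-1>, x<t>] + bu)
       c<t>  = Gu * c~<t> + (1 - Gu) * c<t-1>"""
    concat = np.vstack([c_prev, x_t])
    c_tilde = np.tanh(Wc @ concat + bc)
    gamma_u = sigmoid(Wu @ concat + bu)
    return gamma_u * c_tilde + (1.0 - gamma_u) * c_prev

# If the update gate saturates near 0 (e.g. 1e-6), the memory cell barely changes:
c_prev, c_tilde, gamma_u = 0.8, -0.5, 1e-6
c_t = gamma_u * c_tilde + (1 - gamma_u) * c_prev
print(c_t)  # ~0.7999987, i.e. c<t> is essentially c<t-1>, so information (and gradient) is preserved
```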

1.9 Long Short-Term Memory (LSTM)

LSTM in pictures

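In place of the lecture diagrams, here is a minimal numpy sketch of one LSTM step using the gate equations from the lecture (update Γu, forget Γf, output Γo); the parameter names follow the course notation and the tiny usage at the end is purely illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t, p):
    """One LSTM time step (p holds Wc, Wu, Wf, Wo, bc, bu, bf, bo):
       c~<t> = tanh(Wc[a<t-1>, x<t>] + bc)
       Gu = sigmoid(Wu[a<t-1>, x<t>] + bu)   # update gate
       Gf = sigmoid(Wf[a<t-1>, x<t>] + bf)   # forget gate
       Go = sigmoid(Wo[a<t-1>, x<t>] + bo)   # output gate
       c<t> = Gu * c~<t> + Gf * c<t-1>
       a<t> = Go * tanh(c<t>)"""
    concat = np.vstack([a_prev, x_t])
    c_tilde = np.tanh(p["Wc"] @ concat + p["bc"])
    g_u = sigmoid(p["Wu"] @ concat + p["bu"])
    g_f = sigmoid(p["Wf"] @ concat + p["bf"])
    g_o = sigmoid(p["Wo"] @ concat + p["bo"])
    c_t = g_u * c_tilde + g_f * c_prev
    a_t = g_o * np.tanh(c_t)
    return a_t, c_t

# Tiny illustrative usage (n_a = 3 hidden units, n_x = 2 inputs):
rng = np.random.default_rng(0)
n_a, n_x = 3, 2
p = {name: rng.standard_normal((n_a, n_a + n_x)) * 0.1 for name in ("Wc", "Wu", "Wf", "Wo")}
p.update({name: np.zeros((n_a, 1)) for name in ("bc", "bu", "bf", "bo")})
a_t, c_t = lstm_step(np.zeros((n_a, 1)), np.zeros((n_a, 1)), rng.standard_normal((n_x, 1)), p)
```

Unlike the GRU, the LSTM has separate update and forget gates, so the old cell value c<t-1> can be retained even while new information is written in.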


1.10 Bidirectional RNN

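A rough sketch of the idea, assuming a generic recurrent step function (it could be a vanilla RNN, GRU, or LSTM cell): one pass reads the input left to right, another reads it right to left, and the prediction at time t uses both activations. The helper name and toy weights below are made up for illustration:

```python
import numpy as np

def brnn_activations(xs, step_fwd, step_bwd, n_a):
    """Return, for each time step t, the concatenation [a_fwd<t>; a_bwd<t>], where the
    forward pass runs left-to-right and the backward pass right-to-left over xs."""
    T = len(xs)
    a_fwd = [np.zeros((n_a, 1))]
    for t in range(T):                        # left -> right
        a_fwd.append(step_fwd(a_fwd[-1], xs[t]))
    a_bwd = [np.zeros((n_a, 1))]
    for t in reversed(range(T)):              # right -> left
        a_bwd.append(step_bwd(a_bwd[-1], xs[t]))
    a_bwd = a_bwd[1:][::-1]                   # re-align so a_bwd[t] corresponds to time t
    # y_hat<t> would then be computed from the concatenated activation at each step
    return [np.vstack([a_fwd[t + 1], a_bwd[t]]) for t in range(T)]

# Tiny usage with a vanilla-RNN cell as the step function (weights are illustrative):
rng = np.random.default_rng(0)
n_a, n_x, T = 4, 3, 5
make_step = lambda W: (lambda a, x: np.tanh(W @ np.vstack([a, x])))
W_f = rng.standard_normal((n_a, n_a + n_x)) * 0.1
W_b = rng.standard_normal((n_a, n_a + n_x)) * 0.1
xs = [rng.standard_normal((n_x, 1)) for _ in range(T)]
acts = brnn_activations(xs, make_step(W_f), make_step(W_b), n_a)  # each has shape (2*n_a, 1)
```

Because the backward pass needs the entire input sequence before it can run, a BRNN is not a good fit for strictly real-time applications such as live speech recognition.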

1.11 Deep RNNs

Week 1 of the course only gives a brief introduction to NLP and sequence models.

If you want to learn more, reading the relevant papers is a better way to go deeper.


Q&A:

1. Question 1

Suppose your training examples are sentences (sequences of words). Which of the following refers to the j-th word in the i-th training example?

x<i>(j)

x(j)<i>

x<j>(i)

2. Question 2

Consider this RNN:

(figure omitted)

This specific type of architecture is appropriate when:

Tx < Ty

Tx > Ty

Tx = 1

3. Question 3

To which of these tasks would you apply a many-to-one RNN architecture? (Check all that apply).

(answer options omitted)

4. Question 4

You are training this RNN language model.

(figure omitted)

At the t-th time step, what is the RNN doing? Choose the best answer.

Estimating P(y<1>, y<2>, …, y<t-1>)

Estimating P(y<t>)

Estimating P(y<t> | y<1>, y<2>, …, y<t>)

5. Question 5

You have finished training a language model RNN and are using it to sample random sentences, as follows:

(figure omitted)

What are you doing at each time step t?

(i) Use the probabilities output by the RNN to randomly sample a chosen word for that time-step as y^<t>. (ii) Then pass the ground-truth word from the training set to the next time-step.

(i) Use the probabilities output by the RNN to pick the highest probability word for that time-step as y^<t>. (ii) Then pass this selected word to the next time-step.

(i) Use the probabilities output by the RNN to randomly sample a chosen word for that time-step as y^<t>. (ii) Then pass this selected word to the next time-step.

6. Question 6

You are training an RNN, and find that your weights and activations are all taking on the value of NaN (“Not a Number”). Which of these is the most likely cause of this problem?

Exploding gradient problem.

ReLU activation function g(.) used to compute g(z), where z is too large.

Sigmoid activation function g(.) used to compute g(z), where z is too large.

7. Question 7

Suppose you are training an LSTM. You have a 10,000-word vocabulary, and are using an LSTM with 100-dimensional activations a<t>. What is the dimension of Γu at each time step?

1

300

10000

8. Question 8

Here are the update equations for the GRU.

(equations omitted)

Alice proposes to simplify the GRU by always removing the Γu, i.e., setting Γu = 1. Betty proposes to simplify the GRU by removing the Γr, i.e., setting Γr = 1 always. Which of these models is more likely to work without vanishing gradient problems even when trained on very long input sequences?

Alice’s model (removing Γu), because if Γr ≈ 0 for a timestep, the gradient can propagate back through that timestep without much decay.

Alice’s model (removing Γu), because if Γr ≈ 1 for a timestep, the gradient can propagate back through that timestep without much decay.

Betty’s model (removing Γr), because if Γu ≈ 1 for a timestep, the gradient can propagate back through that timestep without much decay.

9. Question 9

Here are the equations for the GRU and the LSTM:

(equations omitted)

From these, we can see that the Update Gate and Forget Gate in the LSTM play a role similar to _______ and ______ in the GRU. What should go in the blanks?

Γu and Γr

1 − Γu and Γu

Γr and Γu

10. Question 10

You have a pet dog whose mood is heavily dependent on the current and past few days’ weather. You’ve collected data for the past 365 days on the weather, which you represent as a sequence as x<1>, …, x<365>. You’ve also collected data on your dog’s mood, which you represent as y<1>, …, y<365>. You’d like to build a model to map from x → y. Should you use a Unidirectional RNN or Bidirectional RNN for this problem?

Bidirectional RNN, because this allows the prediction of mood on day t to take into account more information.

Unidirectional RNN, because the value of y<t> depends only on x<1>, …, x<t>, but not on x<t+1>, …, x<365>

Unidirectional RNN, because the value of y<t> depends only on x<t>, and not other days’ weather.