deeplearning.ai - Recurrent Neural Networks

Sequence Models
吴恩达 Andrew Ng

Why sequence models

Examples

Speech recognition, Music generation, Sentiment classification, DNA sequence analysis, Machine translation, Video activity recognition, Named entity recognition

Notation

  • $x^{(i)<t>}$: the $t$-th element of the $i$-th input example

  • $T_x^{(i)}$: the length of the $i$-th input example

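  • Build a vocabulary (a column vector of words) and represent each word as a one-hot vector that marks its position in the vocabulary

  • UNK: unknown word, a token that stands for any word not in the vocabulary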
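
The one-hot encoding with an UNK fallback can be sketched as follows. This is a minimal illustration, not the course's code: the five-word vocabulary, the `<UNK>` spelling, and the `one_hot` helper are all assumptions made for the example.

```python
# Hypothetical toy vocabulary; real vocabularies hold tens of thousands of words.
VOCAB = ["a", "and", "harry", "potter", "<UNK>"]
WORD_TO_INDEX = {word: i for i, word in enumerate(VOCAB)}


def one_hot(word):
    """Return the one-hot list for `word`, falling back to <UNK>."""
    index = WORD_TO_INDEX.get(word, WORD_TO_INDEX["<UNK>"])
    vector = [0] * len(VOCAB)
    vector[index] = 1
    return vector


# "hermione" is not in the vocabulary, so it maps to the <UNK> position.
sentence = ["harry", "and", "hermione"]
encoded = [one_hot(w) for w in sentence]
```

Each element of `encoded` plays the role of one $x^{<t>}$, and `len(encoded)` is $T_x$ for this example.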

Recurrent Neural Network Model

  • Inputs and outputs can have different lengths in different examples, so the input and output sizes are not fixed

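  • At each time step, the RNN passes its activation on to the next time step

  • The network scans the sequence from left to right

  • Every time step uses the same parameters $W_{ax}$, $W_{aa}$, $W_{ya}$

  • The prediction at a given time step uses only information from earlier in the sequence

  • BRNN: bidirectional recurrent neural network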

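  • $a^{<0>} = 0$, $a^{<1>} = g_1(W_{aa} a^{<0>} + W_{ax} x^{<1>} + b_a)$, $\hat{y}^{<1>} = g_2(W_{ya} a^{<1>} + b_y)$

  • Activation functions: $g_1$ is usually tanh; $g_2$ is usually sigmoid or softmax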
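
The forward pass described above can be sketched in NumPy. This is a rough illustration under stated assumptions: the layer sizes, the random weights, and the choice of softmax for $g_2$ are picked for the example, and `rnn_forward` is a hypothetical helper name.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()


def rnn_forward(xs, Waa, Wax, Wya, ba, by):
    """Run a basic RNN over a list of input vectors and return every y_hat."""
    a = np.zeros(Waa.shape[0])                 # a<0> = 0
    y_hats = []
    for x in xs:                               # scan left to right
        a = np.tanh(Waa @ a + Wax @ x + ba)    # g1 = tanh
        y_hats.append(softmax(Wya @ a + by))   # g2 = softmax (assumed here)
    return y_hats


# Assumed toy sizes: 4-dim inputs, 3 hidden units, 4 output classes.
rng = np.random.default_rng(0)
n_x, n_a, n_y = 4, 3, 4
Waa = rng.normal(size=(n_a, n_a))
Wax = rng.normal(size=(n_a, n_x))
Wya = rng.normal(size=(n_y, n_a))
ba, by = np.zeros(n_a), np.zeros(n_y)
xs = [np.eye(n_x)[i] for i in (0, 2, 1)]       # three one-hot inputs
y_hats = rnn_forward(xs, Waa, Wax, Wya, ba, by)
```

The same five parameter arrays are reused inside the loop at every time step, which is the parameter sharing noted in the list.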

Backpropagation through time

Different types of RNNs

Language model and sequence generation

  • Key terms: corpus (a large body of text), tokenize (split text into tokens), EOS (end-of-sentence token)

  • $\hat{y}^{<1>}$ gives, for each word in the vocabulary, the probability that it is the first word

  • Given the preceding words, the model predicts the next word
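
Because each step predicts the next word given the preceding ones, the probability of a whole sentence is the product of the per-step probabilities. A minimal sketch of that calculation follows; the three distributions are made-up numbers standing in for a trained model's outputs, and `sentence_log_prob` is a hypothetical helper.

```python
import math


def sentence_log_prob(step_probs, tokens):
    """Sum log P(token_t | earlier tokens) over the sentence.

    step_probs[t] stands for the model output y_hat<t+1>, given as a dict
    that maps each vocabulary word to its probability.
    """
    return sum(math.log(dist[tok]) for dist, tok in zip(step_probs, tokens))


# Made-up distributions; a trained model would compute these step by step.
step_probs = [
    {"cats": 0.5, "sleep": 0.25, "<EOS>": 0.25},
    {"cats": 0.25, "sleep": 0.5, "<EOS>": 0.25},
    {"cats": 0.125, "sleep": 0.125, "<EOS>": 0.75},
]
tokens = ["cats", "sleep", "<EOS>"]
prob = math.exp(sentence_log_prob(step_probs, tokens))  # 0.5 * 0.5 * 0.75
```

Summing logs rather than multiplying probabilities is a common choice to avoid underflow on long sentences.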

Sampling novel sequences

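  • After training a sequence model, one informal way to see what it has learned is to sample novel sequences from it

  • Character-level language model vs. word-level language model

  • Word-level language models capture long-range dependencies better; character-level models are somewhat weaker at this and are more expensive to train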
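
Sampling can be sketched as a loop that draws one token from the predicted distribution, feeds it back in, and stops at the end-of-sentence token. The lookup table below is a hypothetical stand-in for a trained model, and the `<START>` token and `max_len` cap are assumptions of this example.

```python
import random

# Hypothetical stand-in for a trained model: maps the previous token to a
# distribution over the next token. A real RNN would condition on its
# activation a<t>, which summarizes the whole prefix.
TOY_MODEL = {
    "<START>": {"cats": 0.6, "dogs": 0.4},
    "cats": {"sleep": 0.7, "<EOS>": 0.3},
    "dogs": {"sleep": 0.5, "<EOS>": 0.5},
    "sleep": {"<EOS>": 1.0},
}


def sample_sequence(model, rng, max_len=10):
    """Sample tokens one at a time until <EOS> or max_len tokens."""
    tokens, prev = [], "<START>"
    while len(tokens) < max_len:
        dist = model[prev]
        words = list(dist)
        # Draw the next token according to the predicted probabilities.
        prev = rng.choices(words, weights=[dist[w] for w in words])[0]
        if prev == "<EOS>":
            break
        tokens.append(prev)
    return tokens


sample = sample_sequence(TOY_MODEL, random.Random(0))
```

The same loop works for a character-level model if the tokens are characters; sequences then become much longer, which is one reason such models cost more to train.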

Vanishing gradients with RNNs

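  • Basic RNN models are not good at capturing very long-term dependencies.
  • Local influences: each output is affected mainly by inputs close to it in the sequence
  • Gradient clipping: a remedy for exploding gradients; when the gradient exceeds a threshold, rescale it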
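
One way to implement gradient clipping is to rescale by the L2 norm, as in the sketch below. The norm-based rule and the `clip_by_norm` name are assumptions of this example; clipping each element to a fixed range is another common variant.

```python
import math


def clip_by_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)    # norm 5, rescaled to norm 1
unchanged = clip_by_norm([0.3, 0.4], max_norm=1.0)  # norm 0.5, left as is
```

Rescaling keeps the gradient's direction and only shortens the step. It does nothing for vanishing gradients, which is what the gated units in the following sections target.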

Gated Recurrent Unit (GRU)

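  • $c$: memory cell. Candidate value: $\tilde{c}^{<t>} = \tanh(W_c [c^{<t-1>}, x^{<t>}] + b_c)$. In the GRU, $c^{<t>} = a^{<t>}$

  • $\Gamma_u = \sigma(W_u [c^{<t-1>}, x^{<t>}] + b_u)$: the update gate, whose value lies between 0 and 1

  • The gate decides when to update $c$: $c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}$, where $*$ is element-wise multiplication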
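
The three equations above translate almost line for line into NumPy. This sketch covers only the simplified, update-gate-only form given in these notes; the sizes, random weights, and the `gru_step` name are assumptions for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_step(c_prev, x, Wc, bc, Wu, bu):
    """One step of the simplified GRU (update gate only)."""
    concat = np.concatenate([c_prev, x])               # [c<t-1>, x<t>]
    c_tilde = np.tanh(Wc @ concat + bc)                # candidate memory
    gamma_u = sigmoid(Wu @ concat + bu)                # update gate in (0, 1)
    c = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev   # element-wise blend
    return c, gamma_u


rng = np.random.default_rng(0)
n_c, n_x = 3, 2
Wc = rng.normal(size=(n_c, n_c + n_x))
Wu = rng.normal(size=(n_c, n_c + n_x))
bc = np.zeros(n_c)
c_prev, x = rng.normal(size=n_c), rng.normal(size=n_x)

# A very negative gate bias drives gamma_u toward 0, so c is carried over.
c_keep, gate = gru_step(c_prev, x, Wc, bc, Wu, bu=np.full(n_c, -50.0))
```

When the gate stays near 0, the memory cell is copied almost unchanged across many steps, which is how the unit preserves long-range information.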

Long Short-Term Memory (LSTM)

  • Three gates: update, forget, output

  • Peephole connection: the gate values also depend on the previous memory cell $c^{<t-1>}$
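
A sketch of one LSTM step with the three gates is shown below. The notes do not spell out the equations, so this follows the commonly used formulation without peephole connections; the parameter names, sizes, and random weights are assumptions for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(a_prev, c_prev, x, params):
    """One LSTM step with update, forget, and output gates (no peepholes)."""
    concat = np.concatenate([a_prev, x])                      # [a<t-1>, x<t>]
    c_tilde = np.tanh(params["Wc"] @ concat + params["bc"])   # candidate memory
    gamma_u = sigmoid(params["Wu"] @ concat + params["bu"])   # update gate
    gamma_f = sigmoid(params["Wf"] @ concat + params["bf"])   # forget gate
    gamma_o = sigmoid(params["Wo"] @ concat + params["bo"])   # output gate
    c = gamma_u * c_tilde + gamma_f * c_prev                  # new memory cell
    a = gamma_o * np.tanh(c)                                  # new activation
    return a, c


rng = np.random.default_rng(0)
n_a, n_x = 3, 2
params = {}
for name in ("c", "u", "f", "o"):
    params["W" + name] = rng.normal(size=(n_a, n_a + n_x))
    params["b" + name] = np.zeros(n_a)
a, c = lstm_step(np.zeros(n_a), np.zeros(n_a), rng.normal(size=n_x), params)
```

Unlike the simplified GRU, the LSTM uses separate update and forget gates instead of $\Gamma_u$ and $1 - \Gamma_u$, and the activation $a^{<t>}$ is no longer equal to the memory cell $c^{<t>}$.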

Bidirectional RNN

  • Combines information from the past, the present, and the future

  • The forward pass runs partly from left to right and partly from right to left

  • For many natural language processing problems, a bidirectional RNN with LSTM units is the most commonly used model

  • Drawback: it needs the entire input sequence before it can make any prediction
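
The two scan directions can be sketched as two independent basic RNNs whose activations are concatenated at each time step. Plain tanh units are used instead of LSTM units to keep the example short; all names, sizes, and weights are assumptions.

```python
import numpy as np


def run_direction(xs, Waa, Wax, ba):
    """Return the activation at every time step for one scan direction."""
    a = np.zeros(Waa.shape[0])
    activations = []
    for x in xs:
        a = np.tanh(Waa @ a + Wax @ x + ba)
        activations.append(a)
    return activations


def brnn_activations(xs, fwd, bwd):
    """Concatenate forward and backward activations at each time step."""
    forward = run_direction(xs, *fwd)                # left to right
    backward = run_direction(xs[::-1], *bwd)[::-1]   # right to left, realigned
    return [np.concatenate([f, b]) for f, b in zip(forward, backward)]


def make_params(rng, n_a, n_x):
    return (rng.normal(size=(n_a, n_a)), rng.normal(size=(n_a, n_x)), np.zeros(n_a))


rng = np.random.default_rng(0)
n_a, n_x, T = 3, 2, 4
fwd, bwd = make_params(rng, n_a, n_x), make_params(rng, n_a, n_x)
xs = [rng.normal(size=n_x) for _ in range(T)]
states = brnn_activations(xs, fwd, bwd)

# Changing the last input leaves the forward half of the first state
# untouched, while the backward halves depend on the whole sequence.
xs_changed = xs[:-1] + [xs[-1] + 1.0]
states_changed = brnn_activations(xs_changed, fwd, bwd)
```

The backward scan cannot start until the final input is known, which is why the whole sequence must be available before any prediction is made.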

Deep RNNs

$a^{[l]<t>}$: the activation value of layer $l$ at time step $t$
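
With this notation, the activation of one layer in a stacked RNN can be written as below. This is the usual formulation and is offered as a sketch: the per-layer parameters $W_a^{[l]}$, $b_a^{[l]}$ and the convention $a^{[0]<t>} = x^{<t>}$ are assumptions, since the notes do not state them.

```latex
a^{[l]<t>} = g\left( W_a^{[l]} \left[ a^{[l]<t-1>},\; a^{[l-1]<t>} \right] + b_a^{[l]} \right),
\qquad a^{[0]<t>} = x^{<t>}
```

Each layer takes its own previous activation (from the left) and the activation of the layer below at the same time step.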
