RNN Variants: Dropout

Problem

To build a fixed-length representation of an arbitrarily long sequence, an RNN applies the input-to-hidden-state transition over and over again; as a consequence it is highly sensitive to perturbations of the hidden state, which is exactly the kind of perturbation that naive dropout on the recurrent connections introduces.


dropout

Mathematical formulation of dropout:

  • $y = f(W\,d(x))$, where $d(x) = \begin{cases} \text{mask} \odot x, & \text{train phase} \\ (1-p)\,x, & \text{otherwise} \end{cases}$
    Here $p$ is the dropout rate and mask is a binary vector whose entries are drawn from a Bernoulli distribution with keep probability $1-p$ (see the sketch below).
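A minimal NumPy sketch of this formulation (the function name dropout_forward and the explicit train flag are illustrative choices, not taken from any of the papers below):

```python
import numpy as np

def dropout_forward(x, p, train=True):
    """d(x) from the formula above: Bernoulli mask at train time, (1 - p) scaling at test time."""
    if train:
        # keep each unit with probability 1 - p
        mask = np.random.binomial(1, 1.0 - p, size=x.shape)
        return mask * x
    return (1.0 - p) * x
```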

RNNDrop

Instead of the conventional practice of sampling a different mask at every time step to drop hidden units, RNNDrop proposes a new strategy (shown in the figure below) with two characteristics: 1) it generates the dropout mask only once, at the beginning of each training sequence, and keeps it fixed through the sequence; 2) it drops both the non-recurrent and the recurrent connections. A minimal sketch is given below, after the reference.
[Figure: the RNNDrop masking scheme]
Reference: Moon T, Choi H, Lee H, et al. RNNDROP: A novel dropout for RNNs in ASR[C]// Automatic Speech Recognition and Understanding. IEEE, 2016: 65-70.
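A NumPy sketch of this per-sequence masking on a plain tanh RNN (the paper applies the idea to LSTMs for speech recognition; the function and variable names here are illustrative):

```python
import numpy as np

def rnndrop_forward(xs, Wx, Wh, b, p):
    """Vanilla-RNN forward pass with per-sequence dropout: the masks on the
    non-recurrent (input) and recurrent (hidden) connections are sampled once
    at the start of the sequence and reused at every time step."""
    h = np.zeros(Wh.shape[0])
    mask_x = np.random.binomial(1, 1.0 - p, size=xs.shape[1])  # fixed input mask
    mask_h = np.random.binomial(1, 1.0 - p, size=h.shape)      # fixed recurrent mask
    hs = []
    for x in xs:                                               # xs: (T, input_dim)
        h = np.tanh(Wx @ (mask_x * x) + Wh @ (mask_h * h) + b)
        hs.append(h)
    return np.stack(hs)
```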

recurrent dropout

  • Idea: apply dropout to the input/update-gate candidate of an LSTM/GRU cell, which prevents the loss of long-term memories built up in the states/cells.

A simple RNN and its dropout variant:
RNN: $h_t = f(W_h[x_t, h_{t-1}] + b_h)$
dropout: $h_t = f(W_h[x_t, d(h_{t-1})] + b_h)$, where $d(\cdot)$ is the dropout function
LSTM: $c_t = f_t \odot c_{t-1} + i_t \odot d(g_t)$
GRU: $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot d(g_t)$
In principle, the masks can be applied to any subset of the gates, cells, and states; a minimal sketch of the LSTM case follows the reference below.
Reference: Semeniuta S, Severyn A, Barth E. Recurrent Dropout without Memory Loss[J]. 2016.
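A NumPy sketch of one LSTM step with recurrent dropout applied to the update candidate $g_t$, matching the LSTM formula above (the fused weight matrix W and the i, f, o, g gate ordering are assumptions of this sketch, not prescribed by the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step_recurrent_dropout(x, h_prev, c_prev, W, b, p, train=True):
    """One LSTM step with dropout applied only to the update candidate g_t,
    i.e. c_t = f_t * c_{t-1} + i_t * d(g_t); the additive path through c_{t-1}
    is never masked, so long-term memories are preserved."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b              # fused affine map for all gates
    i, f, o = (sigmoid(z[k * H:(k + 1) * H]) for k in range(3))
    g = np.tanh(z[3 * H:4 * H])                          # update candidate g_t
    if train:
        g = np.random.binomial(1, 1.0 - p, size=H) * g   # d(g_t)
    else:
        g = (1.0 - p) * g
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c
```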


Dropout on vertical connections

For a multi-layer LSTM network, dropout is applied randomly to the vertical (between-layer) connections, i.e. it decides whether the hidden state of an LSTM unit in layer L is allowed to flow into the corresponding unit in layer L+1. A minimal sketch is given below, after the references.
[Figure] The dashed lines mark the connections to which random dropout is applied.
[Figure] Information flow after the dropout operation.
Reference: Zaremba W, Sutskever I, Vinyals O. Recurrent Neural Network Regularization[C]. ICLR 2015.
Code: https://github.com/wojzaremba/lstm
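A NumPy sketch of between-layer (vertical) dropout on a stack of plain tanh RNN layers; per-time-step masks are assumed, and the recurrent connection inside each layer is left untouched:

```python
import numpy as np

def stacked_rnn_vertical_dropout(xs, layers, p, train=True):
    """Stacked vanilla-RNN forward pass where dropout is applied only to the
    vertical connections (output of layer L -> input of layer L+1)."""
    inputs = xs                                                 # (T, dim) sequence
    for li, (Wx, Wh, b) in enumerate(layers):                   # layers: list of (Wx, Wh, b)
        h = np.zeros(Wh.shape[0])
        outs = []
        for x in inputs:
            if li > 0:                                          # mask only inter-layer inputs
                if train:
                    x = np.random.binomial(1, 1.0 - p, size=x.shape) * x
                else:
                    x = (1.0 - p) * x
            h = np.tanh(Wx @ x + Wh @ h + b)
            outs.append(h)
        inputs = np.stack(outs)                                 # feed this layer's outputs upward
    return inputs
```

For comparison, the dropout argument of PyTorch's nn.LSTM applies dropout in this same between-layer fashion, on the outputs of every layer except the last.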


Dropout based on variational inference

[Figure] Dashed lines indicate connections without dropout; solid lines of different colors indicate different dropout masks.
Conventional dropout for RNNs: uses different masks at different time steps.
Variational dropout: uses the same dropout mask at every time step, including on the recurrent layers.
Concrete implementation of variational dropout (as the solid-line colors in panel (b) of the figure indicate): sample a Bernoulli mask once for each connection matrix, then reuse that same mask at every subsequent time step; a minimal sketch is given below.
Reference: Gal Y, Ghahramani Z. A Theoretically Grounded Application of Dropout in Recurrent Neural Networks[C]. NIPS 2016.
Code: http://yarin.co/BRNN
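A NumPy sketch of variational dropout on a plain tanh RNN over a minibatch: one inverted-dropout mask per sequence and per connection (input and recurrent), sampled once and reused at every time step (the batched shapes and the inverted scaling are choices of this sketch, not taken from the paper):

```python
import numpy as np

def variational_rnn_forward(xs, Wx, Wh, b, p, train=True):
    """Vanilla-RNN forward over a minibatch xs of shape (T, B, D): the masks on
    the input and recurrent connections are sampled once per sequence and kept
    identical across all time steps."""
    T, B, D = xs.shape
    H = Wh.shape[0]
    h = np.zeros((B, H))
    if train:
        mask_x = np.random.binomial(1, 1.0 - p, size=(B, D)) / (1.0 - p)
        mask_h = np.random.binomial(1, 1.0 - p, size=(B, H)) / (1.0 - p)
    else:
        mask_x, mask_h = 1.0, 1.0
    hs = []
    for t in range(T):
        h = np.tanh((mask_x * xs[t]) @ Wx.T + (mask_h * h) @ Wh.T + b)
        hs.append(h)
    return np.stack(hs)
```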

Zoneout


  • $c_t = d_t^c \odot c_{t-1} + (1 - d_t^c) \odot (f_t \odot c_{t-1} + i_t \odot g_t)$

  • $h_t = d_t^h \odot h_{t-1} + (1 - d_t^h) \odot \big(o_t \odot \tanh(f_t \odot c_{t-1} + i_t \odot g_t)\big)$
    where $d_t^c$ and $d_t^h$ are random binary (0/1) vectors, the zoneout masks (see the sketch at the end of this section).

Reference: Krueger D, Maharaj T, Kramár J, et al. Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations[C]. ICLR 2017.
Code: http://github.com/teganmaharaj/zoneout
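A NumPy sketch of one zoneout LSTM step matching the formulas above (the fused weight matrix, the gate ordering, and the separate zoneout probabilities p_c and p_h for cells and hidden states are illustrative assumptions; the expected mask is used at test time, as in the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def zoneout_lstm_step(x, h_prev, c_prev, W, b, p_c, p_h, train=True):
    """One LSTM step with zoneout: with probability p_c / p_h a cell / hidden
    unit keeps (copies) its previous value instead of taking the new one."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o = (sigmoid(z[k * H:(k + 1) * H]) for k in range(3))
    g = np.tanh(z[3 * H:4 * H])
    c_new = f * c_prev + i * g                      # ordinary LSTM cell update
    h_new = o * np.tanh(c_new)
    if train:
        d_c = np.random.binomial(1, p_c, size=H)    # 1 -> unit is "zoned out" (kept)
        d_h = np.random.binomial(1, p_h, size=H)
    else:
        d_c, d_h = p_c, p_h                         # expected masks at test time
    c = d_c * c_prev + (1.0 - d_c) * c_new
    h = d_h * h_prev + (1.0 - d_h) * h_new
    return h, c
```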