Attention Notes

$X\in \mathbb{R}^{l_s\times d_m},\ Y\in \mathbb{R}^{l_t \times d_m},\ W_a, W_b\in \mathbb{R}^{d_m\times d_h}$
$E=\tanh(X\cdot W_a+b_1)\in\mathbb{R}^{l_s\times d_h}$
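A minimal NumPy sketch of the source projection. It assumes $b_1$ is a length-$d_h$ bias vector broadcast over the $l_s$ rows, and it computes the product as $X\cdot W_a$ so that a $(l_s\times d_m)$ input and a $(d_m\times d_h)$ weight give the stated $(l_s\times d_h)$ result. The sizes are arbitrary illustrative values, not from the notes.

```python
import numpy as np

def encode_source(X, W_a, b_1):
    """E = tanh(X @ W_a + b_1): (l_s, d_m) -> (l_s, d_h).

    Assumption: b_1 has shape (d_h,) and is broadcast across the l_s rows.
    """
    return np.tanh(X @ W_a + b_1)

rng = np.random.default_rng(0)
l_s, d_m, d_h = 5, 8, 4  # hypothetical sizes for illustration
X = rng.standard_normal((l_s, d_m))
W_a = rng.standard_normal((d_m, d_h))
b_1 = np.zeros(d_h)
E = encode_source(X, W_a, b_1)
print(E.shape)  # (5, 4)
```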
$Q=Y\cdot W_b\in \mathbb{R}^{l_t\times d_h}$
$A=E\cdot Q^T\in \mathbb{R}^{l_s\times l_t}$
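The query projection and the score matrix can be sketched as follows. As above, the product is taken as $Y\cdot W_b$ so the shapes match $(l_t\times d_h)$. Entry $A_{ij}$ is then the dot product between source position $i$'s encoding and target position $j$'s query. All sizes and random inputs are illustrative.

```python
import numpy as np

def attention_scores(E, Y, W_b):
    """Q = Y @ W_b with shape (l_t, d_h); A = E @ Q.T with shape (l_s, l_t)."""
    Q = Y @ W_b
    # A[i, j] = <E[i], Q[j]>: score of source position i for target position j
    return E @ Q.T

rng = np.random.default_rng(0)
l_s, l_t, d_m, d_h = 5, 3, 8, 4  # hypothetical sizes
E = np.tanh(rng.standard_normal((l_s, d_h)))  # stand-in for the encoded source
Y = rng.standard_normal((l_t, d_m))
W_b = rng.standard_normal((d_m, d_h))
A = attention_scores(E, Y, W_b)
print(A.shape)  # (5, 3)
```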
$Y'=\mathrm{LSTM}(A)\in \mathbb{R}^{l_t\times d_h}$
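The notes do not say which axis of $A$ the LSTM runs over. To get an output of shape $(l_t\times d_h)$, this sketch assumes the LSTM reads $A$ one column per time step: $l_t$ steps, each with an $l_s$-dimensional input. It implements a plain single-layer LSTM cell by hand. The gate order (input, forget, candidate, output), the zero initial state, and the small random weights are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(inputs, W, U, b):
    """Single-layer LSTM over inputs of shape (T, d_in); returns (T, d_h).

    W: (d_in, 4*d_h), U: (d_h, 4*d_h), b: (4*d_h,). Gate order i, f, g, o (assumed).
    """
    T = inputs.shape[0]
    d_h = U.shape[0]
    h = np.zeros(d_h)  # hidden state
    c = np.zeros(d_h)  # cell state
    outputs = np.empty((T, d_h))
    for t in range(T):
        z = inputs[t] @ W + h @ U + b
        i = sigmoid(z[:d_h])             # input gate
        f = sigmoid(z[d_h:2 * d_h])      # forget gate
        g = np.tanh(z[2 * d_h:3 * d_h])  # candidate cell update
        o = sigmoid(z[3 * d_h:])         # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs[t] = h
    return outputs

rng = np.random.default_rng(0)
l_s, l_t, d_h = 5, 3, 4  # hypothetical sizes
A = rng.standard_normal((l_s, l_t))
W = rng.standard_normal((l_s, 4 * d_h)) * 0.1
U = rng.standard_normal((d_h, 4 * d_h)) * 0.1
b = np.zeros(4 * d_h)
Y_prime = lstm_forward(A.T, W, U, b)  # A.T: one time step per target position
print(Y_prime.shape)  # (3, 4)
```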
$\mathcal{L}=-\sum_{i,j} Y_{ij}\log Y'_{ij}$ (cross-entropy; $Y$ and $Y'$ must have the same shape)
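A small sketch of the cross-entropy loss. It assumes each row of $Y$ is a target distribution, such as a one-hot vector. It also assumes $Y'$ has been normalized row-wise with a softmax so that $\log Y'$ is defined; raw LSTM outputs can be negative. For the shapes to match, this requires $d_h=d_m$ or an extra output projection, which the notes do not specify. The `eps` term is an illustrative guard against $\log 0$.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(Y, Y_prime, eps=1e-12):
    """L = -sum_{i,j} Y[i, j] * log(Y_prime[i, j])."""
    return -np.sum(Y * np.log(Y_prime + eps))

Y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])  # one-hot targets
logits = np.zeros((2, 3))
Y_prime = softmax(logits)        # uniform rows: 1/3 each
loss = cross_entropy(Y, Y_prime)
print(loss)                      # approximately 2 * ln(3)
```

With uniform predictions, each of the two target rows contributes $-\log(1/3)=\ln 3$ to the loss.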