Attention Notes

Category: Articles • 2024-01-27 09:47:40

Let $X\in \mathbb{R}^{l_s\times d_m}$ be the source sequence, $Y\in \mathbb{R}^{l_t\times d_m}$ the target sequence, and $W_a, W_b\in \mathbb{R}^{d_m\times d_h}$ learned projection matrices. The attention computation is:

$$E=\tanh(X W_a + b_1)\in\mathbb{R}^{l_s\times d_h}$$

$$Q=Y W_b\in \mathbb{R}^{l_t\times d_h}$$

$$A=E Q^{\mathsf{T}}\in \mathbb{R}^{l_s\times l_t}$$

$$Y'=\mathrm{LSTM}(A)\in \mathbb{R}^{l_t\times d_h}$$

$$\mathcal{L}=-Y\log Y'$$
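The score computation above can be sketched in NumPy. This is a minimal illustration, not the notes' actual implementation: the sequence lengths and dimensions are hypothetical placeholders, the matrices are randomly initialized, and a softmax over the source axis is added to turn the raw scores $A$ into attention weights (the notes stop at the unnormalized scores).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: l_s source length, l_t target length,
# d_m model dim, d_h hidden dim (none are fixed in the notes).
l_s, l_t, d_m, d_h = 5, 3, 8, 4

X = rng.standard_normal((l_s, d_m))    # source representations
Y = rng.standard_normal((l_t, d_m))    # target representations
W_a = rng.standard_normal((d_m, d_h))  # source projection
W_b = rng.standard_normal((d_m, d_h))  # target projection
b_1 = rng.standard_normal(d_h)

E = np.tanh(X @ W_a + b_1)             # (l_s, d_h)
Q = Y @ W_b                            # (l_t, d_h)
A = E @ Q.T                            # (l_s, l_t) alignment scores

# Softmax over the source axis: each target position gets a
# distribution over source positions (subtract max for stability).
weights = np.exp(A - A.max(axis=0, keepdims=True))
weights /= weights.sum(axis=0, keepdims=True)

print(E.shape, Q.shape, A.shape)       # (5, 4) (3, 4) (5, 3)
```

Each column of `weights` sums to 1, so weighting the source states by a column gives the usual attention-pooled context vector for that target position.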