Reasoning with Sarcasm by Reading In-between



Method overview:

This paper proposes two new models: SIARN (Single-dimensional Intra-Attention Recurrent Network) and MIARN (Multi-dimensional Intra-Attention Recurrent Network).

First, a definition: the relation score $s_{i,j}$ measures how strongly the words $w_i$ and $w_j$ are semantically associated. The two models differ only in how this score is computed: SIARN models a single intrinsic relation per word pair, so $s_{i,j}$ is a scalar, while MIARN models multiple ($k$) intrinsic relations per word pair, producing a $k$-dimensional vector that is then fused into a scalar.

The model consists of three components: Single/Multi-dimensional Intra-Attention, an LSTM, and a Prediction Layer.

- Single/Multi-dimensional Intra-Attention: uses word-pair information to build the sentence's Intra-Attentive Representation.
- LSTM: uses the sentence's sequential information to build its Compositional Representation.
- Prediction Layer: fuses the two representations and makes a binary prediction.

Algorithms of each component:

Single/Multi-dimensional Intra-Attention

Single-dimensional:

$$s_{i,j} = W_a([w_i; w_j]) + b_a \implies s_{i,j} \in R \quad \text{(a scalar)}$$

where $W_a \in R^{2n \times 1}$ and $b_a \in R$.
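To make the shapes concrete, here is a minimal PyTorch sketch of the single-dimensional score; the paper releases no code, so all variable names, the embedding size `n`, and the use of `nn.Linear` are illustrative assumptions:

```python
import torch
import torch.nn as nn

n = 100                     # word embedding dimension (assumed value)
W_a = nn.Linear(2 * n, 1)   # computes W_a [w_i; w_j] + b_a in one step

def siarn_scores(words: torch.Tensor) -> torch.Tensor:
    """words: (l, n) embeddings of one sentence -> (l, l) score matrix s."""
    l = words.size(0)
    w_i = words.unsqueeze(1).expand(l, l, n)  # row i holds w_i
    w_j = words.unsqueeze(0).expand(l, l, n)  # column j holds w_j
    pairs = torch.cat([w_i, w_j], dim=-1)     # (l, l, 2n): every [w_i; w_j]
    return W_a(pairs).squeeze(-1)             # (l, l): scalar s_{i,j} per pair
```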

Multi-dimensional:

$$\hat{s}_{i,j} = W_q([w_i; w_j]) + b_q \implies \hat{s}_{i,j} \in R^k \quad \text{(a $k$-dimensional vector)}$$

where $W_q \in R^{2n \times k}$ and $b_q \in R^k$;

$$s_{i,j} = W_p(\mathrm{ReLU}(\hat{s}_{i,j})) + b_p$$

where $W_p \in R^{k \times 1}$ and $b_p \in R$.

Substituting the first equation into the second:

$$s_{i,j} = W_p(\mathrm{ReLU}(W_q([w_i; w_j]) + b_q)) + b_p$$

where $W_q \in R^{2n \times k}$, $b_q \in R^k$, $W_p \in R^{k \times 1}$, $b_p \in R$.
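The multi-dimensional variant changes only how each pair is scored. Continuing the sketch above (reusing `torch`, `nn`, and `n` from it; the value of `k` is again an assumption):

```python
k = 10                      # number of intrinsic relations (hyperparameter, assumed)
W_q = nn.Linear(2 * n, k)   # computes W_q [w_i; w_j] + b_q
W_p = nn.Linear(k, 1)       # computes W_p ReLU(s_hat) + b_p

def miarn_scores(words: torch.Tensor) -> torch.Tensor:
    """words: (l, n) -> (l, l) scores via k-dimensional pair relations."""
    l = words.size(0)
    w_i = words.unsqueeze(1).expand(l, l, n)
    w_j = words.unsqueeze(0).expand(l, l, n)
    pairs = torch.cat([w_i, w_j], dim=-1)     # (l, l, 2n)
    s_hat = torch.relu(W_q(pairs))            # (l, l, k): multi-dim relations
    return W_p(s_hat).squeeze(-1)             # (l, l): fused back to scalars
```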

Thus, for a sentence of length $l$ we obtain a symmetric score matrix $s \in R^{l \times l}$. Applying row-wise max-pooling to $s$ (taking the maximum of each row) and normalizing the result yields the attention vector $a \in R^l$:

$$a = \mathrm{softmax}(\max_{\text{row}}(s))$$
With the weight vector $a$, a weighted sum over the sentence's word embeddings gives the Intra-Attentive Representation $v_a \in R^n$:

$$v_a = \sum_{i=1}^{l} a_i w_i$$
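Both pooling steps fit in a few lines, under the same assumptions as the sketches above (`scores` can come from either `siarn_scores` or `miarn_scores`):

```python
def intra_attentive(words: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
    """words: (l, n), scores: (l, l) -> Intra-Attentive Representation v_a in R^n."""
    a = torch.softmax(scores.max(dim=1).values, dim=0)  # row-wise max-pool, then softmax: (l,)
    return (a.unsqueeze(-1) * words).sum(dim=0)         # v_a = sum_i a_i * w_i
```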

LSTM

The LSTM output at each time step, $h_i \in R^d$, can be written as:

$$h_i = \mathrm{LSTM}(w, i), \quad \forall i \in [1, \dots, l]$$

The paper takes the output of the last LSTM time step as the Compositional Representation $v_c \in R^d$:

$$v_c = h_l$$

Here $d$ is the number of LSTM hidden units and $l$ is the maximum sentence length.
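A corresponding sketch for the compositional branch; the hidden size `d` is an assumed value, and `nn.LSTM` stands in for whatever LSTM variant the authors used:

```python
d = 128                                       # LSTM hidden units (assumed value)
lstm = nn.LSTM(input_size=n, hidden_size=d)   # expects input of shape (seq_len, batch, n)

def compositional(words: torch.Tensor) -> torch.Tensor:
    """words: (l, n) -> Compositional Representation v_c = h_l in R^d."""
    outputs, _ = lstm(words.unsqueeze(1))  # (l, 1, d): a batch holding one sentence
    return outputs[-1, 0]                  # h_l, the output of the last time step
```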

Prediction Layer

The Intra-Attentive Representation $v_a \in R^n$ and the Compositional Representation $v_c \in R^d$ are fused into a single vector $v \in R^d$, from which the binary prediction $\hat{y} \in R^2$ is produced:

$$v = \mathrm{ReLU}(W_z([v_a; v_c]) + b_z)$$

$$\hat{y} = \mathrm{softmax}(W_f v + b_f)$$

where $W_z \in R^{(d+n) \times d}$, $b_z \in R^d$, $W_f \in R^{d \times 2}$, and $b_f \in R^2$.
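Putting the two representations together, again as an illustrative sketch rather than the authors' implementation:

```python
W_z = nn.Linear(d + n, d)   # fuses [v_a; v_c] into v in R^d
W_f = nn.Linear(d, 2)       # maps v to two class logits

def predict(v_a: torch.Tensor, v_c: torch.Tensor) -> torch.Tensor:
    """v_a: (n,), v_c: (d,) -> class probabilities y_hat in R^2."""
    v = torch.relu(W_z(torch.cat([v_a, v_c])))  # v = ReLU(W_z [v_a; v_c] + b_z)
    return torch.softmax(W_f(v), dim=-1)        # y_hat = softmax(W_f v + b_f)
```

For a sentence `words`, the full MIARN forward pass would then be `predict(intra_attentive(words, miarn_scores(words)), compositional(words))`.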

Training objective:

The model minimizes the binary cross-entropy loss with L2 regularization (weight $\lambda$):

$$J(\theta) = -\sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right] + \lambda \lVert \theta \rVert_2^2$$
Parameters to be learned: $\theta = \{W_p, b_p, W_q, b_q, W_z, b_z, W_f, b_f\}$

Hyperparameters: $k$, $n$, $d$, $\lambda$
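Finally, a hedged sketch of the objective above. It operates on pre-softmax logits via `F.cross_entropy` for numerical stability, the value of `lam` is arbitrary, and in practice the L2 term is often folded into the optimizer's `weight_decay` instead:

```python
import torch.nn.functional as F

lam = 1e-5   # L2 regularization weight lambda (assumed value)
theta = [p for m in (W_p, W_q, W_z, W_f, lstm) for p in m.parameters()]

def loss_fn(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """logits: (N, 2) pre-softmax scores, labels: (N,) class indices in {0, 1}."""
    ce = F.cross_entropy(logits, labels)      # binary cross-entropy term
    l2 = sum(p.pow(2).sum() for p in theta)   # ||theta||_2^2 over learnable weights
    return ce + lam * l2
```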