Summary of the paper "Aspect Level Sentiment Classification with Deep Memory Network"

Aspect Level Sentiment Classification with Deep Memory Network

Source: Tang, D., Qin, B., & Liu, T. (2016). Aspect level sentiment classification with deep memory network. arXiv preprint arXiv:1605.08900.

Original post: http://blog.csdn.net/rxt2012kc/article/details/73770408

advantages

Neural models are of growing interest for their capacity to learn text representation from data without careful engineering of features, and to capture semantic relations between aspect and context words in a more scalable way than feature based SVM.

disadvantages

Despite these advantages, conventional neural models like long short-term memory (LSTM) (Tang et al., 2015a) capture context information in an implicit way, and are incapable of explicitly exhibiting important context clues of an aspect.

Standard LSTM works in a sequential way and manipulates each context word with the same operation, so that it cannot explicitly reveal the importance of each context word.

cross-entropy

As every component is differentiable, the entire model could be efficiently trained end-to-end with gradient descent, where the loss function is the cross-entropy error of sentiment classification.

The aspect here is a single word; if the aspect is a multi-word expression, it is handled as follows:

For the case where aspect is multi word expression like “battery life”, aspect representation is an average of its constituting word vectors (Sun et al., 2015).
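As a rough illustration of this averaging (the embedding table and the function name `aspect_vector` are hypothetical, not from the paper), a minimal numpy sketch might look like:

```python
import numpy as np

# Hypothetical embedding table: word -> d-dimensional vector (d = 300 here).
embedding = {"battery": np.random.randn(300), "life": np.random.randn(300)}

def aspect_vector(aspect_words, embedding):
    """Average the constituent word vectors of a (possibly multi-word) aspect."""
    return np.mean([embedding[w] for w in aspect_words], axis=0)

v_aspect = aspect_vector(["battery", "life"], embedding)  # shape: (300,)
```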

dataset

  • laptop and restaurant datasets

We apply the proposed approach to laptop and restaurant datasets from SemEval 2014 (Pontiki et al., 2014).

steps

  • input

Given a sentence s = {w1, w2, …, wi, …, wn} and the aspect word wi, we map each word into its embedding vector. These word vectors are separated into two parts, aspect representation and context representation. If aspect is a single word like “food” or “service”, aspect representation is the embedding of aspect word.

Context word vectors {e1, e2, …, ei−1, ei+1, …, en} are stacked and regarded as the external memory m ∈ ℝ^(d×(n−1)), where n is the sentence length.
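A minimal sketch of this input step, assuming a hypothetical `embedding` lookup and the helper name `build_memory` (not from the paper): the aspect word is excluded and the remaining context vectors are stacked column-wise into the memory matrix.

```python
import numpy as np

def build_memory(sentence, aspect_index, embedding):
    """Return the external memory m (shape d x (n-1)) and the aspect vector.
    Context word vectors exclude the aspect word at aspect_index."""
    context = [w for i, w in enumerate(sentence) if i != aspect_index]
    m = np.stack([embedding[w] for w in context], axis=1)  # d x (n-1)
    v_aspect = embedding[sentence[aspect_index]]           # d
    return m, v_aspect
```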

  • step1

In the first computational layer (hop 1), we regard aspect vector as the input to adaptively select important evidences from memory m through attention layer.

The output of attention layer and the linear transformation of aspect vector are summed and the result is considered as the input of next layer (hop 2).

It is helpful to note that the parameters of attention and linear layers are shared in different hops. Therefore, the model with one layer and the model with nine layers have the same number of parameters.
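A minimal sketch of the multi-hop loop under these assumptions (the `attention` helper is sketched in the attention-model section below; `W_linear`, `W_att`, `b_att` are shared across hops, which is why the parameter count does not grow with the number of hops):

```python
import numpy as np

def multi_hop(m, v_aspect, W_linear, W_att, b_att, hops=3):
    """Each hop: attend over memory m, add a linear transformation of the
    current vector, and feed the sum into the next hop. Parameters are shared."""
    x = v_aspect
    for _ in range(hops):
        attended = attention(m, x, W_att, b_att)  # d-dim weighted sum of memory
        x = attended + W_linear @ x               # input to the next hop
    return x  # final representation fed to the sentiment classifier
```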

attention model

The basic idea of attention mechanism is that it assigns a weight/importance to each lower position when computing an upper level representation (Bahdanau et al., 2015).

In this work, we use attention model to compute the representation of a sentence with regard to an aspect.

Furthermore, the importance of a word should be different if we focus on different aspect. Let us again take the example of “great food but the service was dreadful!”. The context word “great” is more important than “dreadful” for aspect “food”. On the contrary, “dreadful” is more important than “great” for aspect “service”.

The weight of each word is computed as g_i = tanh(W_att[m_i; v_aspect] + b_att), where g_i is a 1×1 scalar. Collecting these gives a 1×k vector [g_1, g_2, …, g_k], which is passed through a softmax to obtain each word's weight. Each memory vector is then multiplied by its weight and the results are summed, yielding a d×1 vector as the output of the attention model.
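A minimal numpy sketch of this content-attention step (the names `attention`, `W_att`, `b_att` are assumptions for illustration; `W_att` is a length-2d weight vector and `b_att` a scalar bias):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention(m, v_aspect, W_att, b_att):
    """Score each memory column m_i against the aspect/query vector with
    g_i = tanh(W_att [m_i; v_aspect] + b_att), softmax the scores, and
    return the weighted sum of memory columns (a d-dim vector)."""
    k = m.shape[1]
    scores = np.array([np.tanh(W_att @ np.concatenate([m[:, i], v_aspect]) + b_att)
                       for i in range(k)])
    alpha = softmax(scores)   # weight of each context word
    return m @ alpha          # d-dim attention output
```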

location attention

Such location information is helpful for an attention model because intuitively a context word closer to the aspect should be more important than a farther one.

In this work, we define the location of a context word as its absolute distance with the aspect in the original sentence sequence.

v_i = 1 − l_i/n, where l_i denotes the location of the word and n the length of the sentence.
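A minimal sketch of this location weighting (the helper name and the way positions are passed in are assumptions): each memory column is scaled by its v_i before attention.

```python
import numpy as np

def location_weighted_memory(m, context_positions, aspect_position, n):
    """Scale memory column i by v_i = 1 - l_i / n, where l_i is the absolute
    distance between context word i and the aspect in the sentence."""
    v = np.array([1.0 - abs(p - aspect_position) / n for p in context_positions])
    return m * v  # broadcasts over rows: column i is multiplied by v_i
```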

the need for multiple hops

Multiple computational layers allow the deep memory network to learn representations of text with multiple levels of abstraction. Each layer/hop retrieves important context words, and transforms the representation at previous level into a representation at a higher, slightly more abstract level. With the composition of enough such transformations, very complex functions of sentence representation towards an aspect can be learned.

cross entropy

The model is trained in a supervised manner by minimizing the cross entropy error of sentiment classification.
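For reference, a minimal sketch of that objective (shapes and names are assumptions; `probs` are the model's softmax outputs, `gold` the one-hot sentiment labels):

```python
import numpy as np

def cross_entropy(probs, gold):
    """Cross-entropy error over a batch: probs and gold have shape
    (batch, num_classes); gold is one-hot."""
    eps = 1e-12  # numerical stability
    return -np.sum(gold * np.log(probs + eps))
```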

share the same parameters

It is helpful to note that the parameters of attention and linear layers are shared in different hops. Therefore, the model with one layer and the model with nine layers have the same number of parameters.