Notes on Question Answering with Subgraph Embeddings
Source: EMNLP 2014
Original paper
Motivation
Goal:
Our main motivation is to provide a system for open QA able to be trained as
long as it has access to: (1) a training set of questions paired with answers and
(2) a KB providing a structure among answers.
Assumption:
The answers to questions are entities in the knowledge graph, and questions themselves mention entities from the knowledge graph. When no entity can be identified, a plain string matching method is used.
Datasets
- WebQuestions (This dataset is built using Freebase as the KB and contains 5,810 question-answer pairs.) serves as the evaluation benchmark. Because this dataset is small, data from other sources is also used for training.
- Freebase
Freebase [3] is a huge and freely available database of general facts; data is organized as triplets (subject, type1.type2.predicate, object), where two entities subject and object (identified by mids) are connected by the relation type type1.type2.predicate.
Triples are converted into questions of the form “What is the predicate of the type2 subject?” (see the sketch after this list). Not all of the KB is used: only facts whose entities appear in WebQuestions or ClueWeb are kept, and entities occurring fewer than 5 times are removed as well.
An example is “What is the nationality of the person barack obama ?” (united states).
- ClueWeb: “we also created questions using ClueWeb extractions provided by [10]. Using string matching, we ended up with 2M extractions structured as (subject, “text string”, object) with both subject and object linked to Freebase.”
- Paraphrases. A dataset in which questions are grouped into clusters of paraphrases (questions in the same cluster are rephrasings of each other).
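As an illustration of the template above, here is a minimal sketch of the triple-to-question conversion; `triple_to_question` is a hypothetical helper for illustration, not the authors' code:

```python
# Hypothetical sketch: turn a Freebase triple into a template question.
def triple_to_question(subject: str, relation: str, obj: str) -> tuple[str, str]:
    type1, type2, predicate = relation.split(".")   # e.g. people.person.nationality
    question = f"What is the {predicate.replace('_', ' ')} of the {type2} {subject} ?"
    return question, obj

# -> ("What is the nationality of the person barack obama ?", "united states")
print(triple_to_question("barack obama", "people.person.nationality", "united states"))
```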
Method
Let $q$ denote a question and $a$ a candidate answer. The score function $S(q, a)$ generates a high score if $a$ is the correct answer to the question $q$, and a low score otherwise. Note that both $q$ and $a$ are represented as a combination of the embeddings of their individual words and/or symbols; hence, learning $S$ essentially involves learning these embeddings:

$$S(q, a) = f(q)^{\top} g(a)$$
Concretely, there is a matrix $W \in \mathbb{R}^{k \times N}$, where $k$ is the embedding size and $N = N_W + N_S$ is the dictionary size, with $N_W$ the total number of words and $N_S$ the total number of entities and relation types. The $i$-th column of $W$ is the embedding of the $i$-th element (word, entity or relation type) in the dictionary.
The function $f(q) = W\phi(q)$ maps a question into the embedding space $\mathbb{R}^k$, where $\phi(q) \in \mathbb{N}^N$ is a sparse indicator vector whose entries count how many times each word appears in the question (semantic relations between words and word order are ignored; how much does this hurt?). In effect, $q$ is represented as the sum of the embeddings of its words.
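A minimal numpy sketch of this bag-of-words embedding, with toy dimensions and made-up word ids:

```python
import numpy as np

# f(q) = W @ phi(q): the question embedding is the sum of its words'
# embedding columns. Sizes and word ids below are toy values.
k, N = 64, 10_000                         # embedding size, dictionary size
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(k, N))    # one column per word/entity/relation

def embed_question(word_ids):
    """f(q) = W @ phi(q), with phi(q) a bag-of-words count vector."""
    phi = np.zeros(N)
    for i in word_ids:
        phi[i] += 1.0                     # word order is lost here
    return W @ phi                        # == sum of the selected columns

f_q = embed_question([12, 87, 403])       # toy word ids for a question
```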
Representing Candidate Answers
The answer is represented by $g(a) = W\psi(a)$, where $\psi(a) \in \mathbb{N}^N$ is a sparse vector computed in one of three ways:
1. Single Entity. The answer is represented as a single entity from Freebase; $\psi(a)$ is a 1-of-$N_S$ coded vector.
2. Path Representation. The answer is represented as a path from the entity mentioned in the question to the answer entity. The authors only consider 1- or 2-hop paths, so $\psi(a)$ is a 3-of-$N_S$ or 4-of-$N_S$ coded vector (intermediate entities on the path are not represented).
3. Subgraph Representation. Same as 2, but additionally encodes the subgraph of the answer entity: the $C$ entities and $D$ relation types connected to it. To keep path symbols distinguishable from subgraph symbols, the entity/relation dictionary is doubled, i.e. $N_S = 2 \times (\#\text{entities} + \#\text{relation types})$, with one copy used for the path and the other for the subgraph (sketched below). $\psi(a)$ is then a $(3+C+D)$- or $(4+C+D)$-of-$N_S$ coded vector.
This choice rests on the hypothesis:
Our hypothesis is that including more information about the answer in its representation will lead to improved results.
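A sketch of how the three $\psi(a)$ encodings could be built under the dictionary-doubling scheme above; sizes and ids are toy values, not the authors' code:

```python
import numpy as np

# First half of the entity/relation dictionary holds "path" symbols,
# the second half holds "subgraph" symbols.
N_W = 10_000                  # number of words
half = 5_000                  # entities + relation types (one copy)
N_S = 2 * half                # doubled dictionary
N = N_W + N_S

def psi_single_entity(answer_id):
    psi = np.zeros(N)
    psi[N_W + answer_id] = 1.0            # 1-of-N_S
    return psi

def psi_path(path_ids):
    """path_ids = [question entity, relation type(s), answer entity]: 3 or 4 ids."""
    psi = np.zeros(N)
    for i in path_ids:
        psi[N_W + i] = 1.0                # 3-of-N_S or 4-of-N_S
    return psi

def psi_subgraph(path_ids, subgraph_ids):
    """Path symbols plus the answer's C neighbor entities and D relation types."""
    psi = psi_path(path_ids)
    for i in subgraph_ids:
        psi[N_W + half + i] = 1.0         # second (parallel) copy of the dictionary
    return psi
```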
Training and Loss Function
The parameters to be learned are exactly $W$, i.e. the embeddings of the words, entities and relation types. Training minimizes a margin-based ranking loss of the form $\sum_i \max\{0,\, m - S(q_i, a_i) + S(q_i, \hat{a}_i)\}$, where the hatted $\hat{a}_i$ denotes a negative sample. Where do the negatives come from? They are constructed: half of the time as another path connected to the question entity, and the other half sampled at random.
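A sketch of this ranking loss and the 50/50 negative sampling described above; the margin value and the helpers are assumptions for illustration:

```python
import numpy as np

def ranking_loss(f_q, g_pos, g_neg, m=0.1):
    """max(0, m - S(q, a) + S(q, a_neg)) with S(q, a) = f(q) . g(a);
    m is a hyperparameter (a small value like 0.1 assumed here)."""
    return max(0.0, m - float(f_q @ g_pos) + float(f_q @ g_neg))

def sample_negative(paths_from_question_entity, all_answers, rng):
    # Half the time: another (incorrect) path from the question entity;
    # the other half: an answer drawn at random from the whole KB.
    if rng.random() < 0.5 and paths_from_question_entity:
        return paths_from_question_entity[rng.integers(len(paths_from_question_entity))]
    return all_answers[rng.integers(len(all_answers))]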
Multitask Training of Embeddings
The authors also perform multitask training with the Paraphrases dataset mentioned above, using the same procedure but scoring question pairs with $S_{prp}(q_1, q_2) = f(q_1)^{\top} f(q_2)$. The goal is to make questions in the same paraphrase cluster score higher similarity.
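A corresponding sketch of the paraphrase term, assuming inputs are embedding vectors produced by `embed_question` from the earlier sketch:

```python
# Two questions in the same paraphrase cluster should outscore a question
# drawn from another cluster, under the same margin ranking loss.
def paraphrase_loss(f_q1, f_q2, f_q_neg, m=0.1):
    return max(0.0, m - float(f_q1 @ f_q2) + float(f_q1 @ f_q_neg))
```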
Inference
At test time, the answer to a question is obtained straightforwardly via:

$$\hat{a} = \arg\max_{a' \in \mathcal{A}(q)} S(q, a')$$

where $\mathcal{A}(q)$ is the candidate answer set.
Each question in the test set contains exactly one identifiable Freebase entity. All entities directly connected to that entity form a first candidate answer set ($C_1$ in the paper). Since considering every entity within 2 hops of the question entity yields far too many candidates, the authors use a beam search: only the top-10 1-hop candidate paths are expanded to their 2-hop entities. The resulting candidate set is denoted $C_2$ and is used by default.
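A sketch of candidate generation and argmax inference as described above; `two_hop_from` and `embed_answer` are hypothetical stand-ins for KB access and $g(a)$:

```python
import numpy as np

def build_candidates(one_hop, two_hop_from, f_q, embed_answer, beam=10):
    """All 1-hop neighbors, plus 2-hop expansions of the top-`beam` 1-hop candidates."""
    ranked = sorted(one_hop, key=lambda a: float(f_q @ embed_answer(a)), reverse=True)
    cands = list(one_hop)
    for a in ranked[:beam]:               # expand only the best-scoring 1-hop candidates
        cands.extend(two_hop_from(a))
    return cands

def predict(f_q, candidates, embed_answer):
    """a_hat = argmax over a' in A(q) of S(q, a')."""
    scores = [float(f_q @ embed_answer(a)) for a in candidates]
    return candidates[int(np.argmax(scores))]
```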
Experiment
Omitted.
Our results also verify our hypothesis of Section 3.1, that a richer representation for answers (using the local subgraph) can store more pertinent information.
Conclusions and Thoughts
The paper achieved state-of-the-art results at the time while requiring almost no hand-crafted features, and without relying on lexical mapping tables, part-of-speech tagging, or dependency parses.
Contribution: enriching the representation of answer information, which greatly improved the performance of deep-learning-based knowledge base question answering.
Thought: if a richer answer representation improves results, would a richer question representation improve results as well?