(2020) Deep Joint Entity Disambiguation with Local Neural Attention: Paper Notes

paper: Deep Joint Entity Disambiguation with Local Neural Attention

https://github.com/dalab/deep-ed

Contents:

Abstract

Introduction

Contributions and Related Work

Learning Entity Embeddings

Local Model with Neural Attention

Document-Level Deep Model

Experiments

Abstract

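Key contribution: a deep learning model for document-level entity disambiguation, built from three components:
1. Entity embeddings
2. A context-aware attention mechanism
3. Joint inference for disambiguation (i.e., a joint probabilistic model)
Results: the best performance at the time, at moderate computational cost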

Introduction

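What is entity linking? Entity linking maps a mention in text to an entity that already exists in a knowledge base. If no entity matches, the mention is labeled NIL.
Entity linking has three modules (this paper focuses on candidate entity ranking):
1. Candidate entity generation
2. Candidate entity ranking
3. Unlinkable mention prediction
How are candidate entities ranked?
1. Local entity disambiguation: uses the text within a context window around the mention
2. Global entity disambiguation: uses the coherence among entities across the whole document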

Contributions and Related Work

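This section briefly lists the techniques the model relies on.

Entity Embedding
- Builds on word2vec word embeddings and on earlier entity-embedding work (not named in these notes), with some differences
- Difference: entity-entity co-occurrence counts are not required; embeddings are learned from each entity's page and from the context around hyperlink annotations
- The model uses pre-trained word and entity embeddings
Context Attention (local model)
- Focuses on the context words that are informative for the disambiguation decision
Collective Disambiguation (global model)
- Uses a conditional random field (CRF); because the problem is NP-hard, inference is approximated with loopy belief propagation (LBP) [admittedly the hardest part to follow]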

Local Model with Neural Attention

This section walks through the local model.

[Figure: architecture of the local model]

Orange boxes: the learnable matrices $\mathbf{A}$ and $\mathbf{B}$
Red boxes, from left to right: word embeddings and entity embeddings

Overall procedure (best read alongside the figure):

Mention-entity prior: $\hat{p}(e \mid m)$
Candidate entity set generated for each mention: $\Gamma(m)$; set of context words for each mention: $c$
Score every context word: $u(w)=\max _{e \in \Gamma(m)} \mathbf{x}_{e}^{\top} \mathbf{A} \mathbf{x}_{w}$
Prune to the $R$ highest-scoring words: $\bar{c}=\{w \in c \mid u(w) \in \operatorname{topR}(\mathbf{u})\}$
Softmax attention weights over the kept words: $\beta(w)=\left\{\begin{array}{ll}\frac{\exp [u(w)]}{\sum_{v \in \bar{c}} \exp [u(v)]} & \text { if } w \in \bar{c} \\ 0 & \text { otherwise }\end{array}\right.$
Score of an entity against the mention's context: $\Psi(e, c)=\sum_{w \in \bar{c}} \beta(w) \mathbf{x}_{e}^{\top} \mathbf{B} \mathbf{x}_{w}$
Finally, combine with the prior to get the local score: $\Psi(e, m, c)=f(\Psi(e, c), \log \hat{p}(e \mid m))$, where $f$ is a small feed-forward network in the paper
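The attention steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: embeddings are plain lists, `bilinear` and `local_context_scores` are hypothetical helper names, the toy inputs are made up, and the final combination with the prior through $f$ is left out because $f$ is a learned network.

```python
import math

def bilinear(x, M, y):
    """Return x^T M y for list vectors x, y and a list-of-rows matrix M."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def local_context_scores(cand_embs, word_embs, A, B, R):
    """Return (psi, beta): Psi(e, c) per candidate and attention weights per kept word."""
    # u(w): best match between word w and any candidate entity, under A.
    u = [max(bilinear(x_e, A, x_w) for x_e in cand_embs) for x_w in word_embs]
    # Hard pruning: keep the indices of the R highest-scoring context words.
    kept = sorted(range(len(u)), key=lambda i: u[i], reverse=True)[:R]
    # Softmax over the kept words only; pruned words implicitly get weight 0.
    top = max(u[i] for i in kept)
    exp_u = {i: math.exp(u[i] - top) for i in kept}
    z = sum(exp_u.values())
    beta = {i: v / z for i, v in exp_u.items()}
    # Psi(e, c): attention-weighted sum of entity-word similarities, under B.
    psi = [sum(beta[i] * bilinear(x_e, B, word_embs[i]) for i in kept)
           for x_e in cand_embs]
    return psi, beta

# Toy inputs (made up for illustration): 2-d embeddings, identity A and B.
I2 = [[1.0, 0.0], [0.0, 1.0]]
cands = [[1.0, 0.0], [0.0, 1.0]]
words = [[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
psi, beta = local_context_scores(cands, words, I2, I2, R=2)
print(psi)   # the first candidate gets the higher context score
print(beta)  # only the two kept words have an attention weight
```

With $R=2$ the all-zero third word is pruned, so it contributes nothing to either candidate's score.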

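Loss function:

A max-margin objective: the score of the correct entity $e^{*}$ should be as high as possible and the scores of wrong candidates as low as possible, separated by a margin $\gamma$.

$\theta^{*}=\arg \min _{\theta} \sum_{D \in \mathcal{D}} \sum_{m \in D} \sum_{e \in \Gamma(m)} g(e, m)$
$g(e, m):=\left[\gamma-\Psi\left(e^{*}, m, c\right)+\Psi(e, m, c)\right]_+$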
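For a single mention, the hinge term $g(e, m)$ can be computed as follows. This is a minimal sketch, assuming the candidate scores $\Psi(e, m, c)$ are already available as a list; `max_margin_loss` is a hypothetical name and the numbers are made up. The sum is taken literally over all of $\Gamma(m)$, so the gold entity itself contributes the constant $\gamma$, which does not affect the gradient.

```python
def max_margin_loss(scores, gold_index, gamma):
    """Sum of [gamma - score(gold) + score(e)]_+ over all candidates e of one mention."""
    gold = scores[gold_index]
    return sum(max(0.0, gamma - gold + s) for s in scores)

# Toy scores (made up): candidate 0 is the gold entity.
scores = [2.0, 0.5, 1.9]
loss = max_margin_loss(scores, gold_index=0, gamma=0.5)
print(loss)
```

Candidate 1 is already more than $\gamma$ below the gold score and contributes nothing; candidate 2 is inside the margin and is penalized.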

Document-Level Deep Model

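Mentions in a document: $\mathbf{m}=m_{1}, m_{2}, \ldots, m_{n}$; context words of each mention: $\mathbf{c}=c_{1}, c_{2}, \ldots, c_{n}$
Define a joint probability distribution over entity assignments $\mathbf{e} \in \Gamma\left(m_{1}\right) \times \ldots \times \Gamma\left(m_{n}\right)$, then choose an entity for each mention from its marginal probability
- Marginal probability: obtained by summing (or integrating) the joint distribution over all other variables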
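To make marginalization concrete, here is a small worked example over two mentions. It is a sketch only: the joint table and the entity labels are invented for illustration, and `marginals` is a hypothetical helper, not something from the paper or its code.

```python
from collections import defaultdict

def marginals(joint):
    """Sum a joint table {(e_1, ..., e_n): prob} down to one marginal per mention."""
    n = len(next(iter(joint)))
    result = [defaultdict(float) for _ in range(n)]
    for assignment, prob in joint.items():
        for i, entity in enumerate(assignment):
            # Adding prob here sums over every other mention's choice.
            result[i][entity] += prob
    return [dict(m) for m in result]

# Invented joint distribution over (entity of mention 1, entity of mention 2).
joint = {
    ("Paris_France", "Seine"): 0.5,
    ("Paris_France", "Texas_State"): 0.1,
    ("Paris_Texas", "Seine"): 0.1,
    ("Paris_Texas", "Texas_State"): 0.3,
}
per_mention = marginals(joint)
# Pick the most probable entity for each mention from its own marginal.
choice = [max(m, key=m.get) for m in per_mention]
print(per_mention)
print(choice)
```

Enumerating the full joint table like this is only feasible for tiny examples, because its size is the product of all candidate-set sizes.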

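The CRF model

Definition: $g(\mathbf{e}, \mathbf{m}, \mathbf{c})=\sum_{i=1}^{n} \Psi_{i}\left(e_{i}\right)+\sum_{i<j} \Phi\left(e_{i}, e_{j}\right)$. The goal is to maximize $g(\mathbf{e}, \mathbf{m}, \mathbf{c})$, i.e., to find the highest-scoring joint assignment of entities.
- where: $\Phi\left(e, e^{\prime}\right)=\frac{2}{n-1} \mathbf{x}_{e}^{\top} \mathbf{C} \mathbf{x}_{e^{\prime}}$
Because training and prediction with this CRF are NP-hard (the cost grows exponentially), the paper uses loopy belief propagation (LBP), which gives an approximate solution to the original problem at lower cost.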
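The score $g$ and the cost of maximizing it exactly can be illustrated with a brute-force search. This is a sketch under assumptions: `global_score` and `brute_force_map` are hypothetical names, the unary scores and embeddings are made up, and $\mathbf{C}$ is set to the identity. The exhaustive search is shown only to make the exponential blow-up visible; it is not what the paper does.

```python
from itertools import product

def bilinear(x, M, y):
    """Return x^T M y for list vectors x, y and a list-of-rows matrix M."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def global_score(assignment, psi, embs, C):
    """g(e, m, c) for one joint assignment; assignment[i] indexes a candidate of mention i."""
    n = len(assignment)  # the 2 / (n - 1) factor requires n >= 2
    total = sum(psi[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            x_i = embs[i][assignment[i]]
            x_j = embs[j][assignment[j]]
            total += 2.0 / (n - 1) * bilinear(x_i, C, x_j)
    return total

def brute_force_map(psi, embs, C):
    """Exhaustively maximize g; the search space is the product of all candidate-set sizes."""
    options = product(*(range(len(p)) for p in psi))
    return max(options, key=lambda a: global_score(a, psi, embs, C))

# Toy document (made up): 2 mentions with 2 candidates each, 2-d entity embeddings.
psi = [[1.0, 0.0], [0.6, 0.5]]
embs = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
C = [[1.0, 0.0], [0.0, 1.0]]
best = brute_force_map(psi, embs, C)
print(best, global_score(best, psi, embs, C))
```

In this toy case the local scores alone would pick candidate 0 for the second mention, but the pairwise term makes candidate 1 the better joint choice.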
For the belief network and message passing mentioned in the paper, and in particular how nodes exchange messages, the following article is a reasonably clear reference:
- https://zhuanlan.zhihu.com/p/38172096

[Figure: architecture of the global model]

Orange boxes: local scores
Red boxes: global scores

Message update rule

$\begin{aligned} m_{i \rightarrow j}^{t+1}(e)=\max _{e^{\prime} \in \Gamma\left(m_{i}\right)} &\left\{\Psi_{i}\left(e^{\prime}\right)+\Phi\left(e, e^{\prime}\right)\right.\\ &\left.+\sum_{k \neq j} \bar{m}_{k \rightarrow i}^{t}\left(e^{\prime}\right)\right\} \end{aligned}$
- Vote of mention $i$ for mention $j$: $\begin{aligned} \bar{m}_{i \rightarrow j}^{t}(e)=& \log \left[\delta \cdot \operatorname{softmax}\left(m_{i \rightarrow j}^{t}(e)\right)\right.\\ &\left.+(1-\delta) \cdot \exp \left(\bar{m}_{i \rightarrow j}^{t-1}(e)\right)\right] \end{aligned}$
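The two update equations can be sketched as a short loop in plain Python. Treat this as an illustration under assumptions rather than the paper's implementation: scores are given as precomputed tables, messages start uniform (an assumption, the notes do not say how they are initialized), `lbp_messages` is a hypothetical name, and the toy numbers and the values of `T` and `delta` are chosen only for the example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    z = sum(exps)
    return [v / z for v in exps]

def lbp_messages(psi, phi, T, delta):
    """Run T rounds of damped max-product message passing.

    psi[i][a]       : unary score Psi_i of candidate a of mention i
    phi[i][j][a][b] : pairwise score Phi between candidate a of mention i
                      and candidate b of mention j (unused when i == j)
    Returns mbar, where mbar[i][j][b] is the log-domain message i -> j.
    """
    n = len(psi)
    # Assumption: uniform initial messages, i.e. log(1 / |Gamma(m_j)|).
    mbar = [[[-math.log(len(psi[j]))] * len(psi[j]) for j in range(n)]
            for _ in range(n)]
    for _ in range(T):
        new = [[mbar[i][j] for j in range(n)] for i in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue  # no self-messages
                # Votes for each candidate a of mention i from every k other than i and j.
                incoming = [sum(mbar[k][i][a] for k in range(n) if k not in (i, j))
                            for a in range(len(psi[i]))]
                # Max-product step: best candidate of i for each candidate b of j.
                raw = [max(psi[i][a] + phi[i][j][a][b] + incoming[a]
                           for a in range(len(psi[i])))
                       for b in range(len(psi[j]))]
                soft = softmax(raw)
                # Damping: mix the new normalized message with the previous one.
                new[i][j] = [math.log(delta * soft[b]
                                      + (1 - delta) * math.exp(mbar[i][j][b]))
                             for b in range(len(psi[j]))]
        mbar = new
    return mbar

# Toy document (made up): 2 mentions, 2 candidates each.
psi = [[1.0, 0.0], [0.6, 0.5]]
pair = [[0.0, 2.0], [2.0, 0.0]]
phi = [[None, pair], [pair, None]]
msgs = lbp_messages(psi, phi, T=10, delta=0.5)
print(msgs[0][1])  # log-domain vote of mention 0 for the candidates of mention 1
```

Because each damped message is a convex combination of two normalized distributions, its exponentials keep summing to 1 across iterations.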

Overall procedure (similar to the local model)

Beliefs (marginals) after $T$ iterations: $\mu_{i}(e)=\Psi_{i}(e)+\sum_{k \neq i} \bar{m}_{k \rightarrow i}^{T}(e)$
Normalization: $\bar{\mu}_{i}(e)=\frac{\exp \left[\mu_{i}(e)\right]}{\sum_{e^{\prime} \in \Gamma\left(m_{i}\right)} \exp \left[\mu_{i}\left(e^{\prime}\right)\right]}$
Combine with the prior to get the global score: $\rho_{i}(e):=f\left(\bar{\mu}_{i}(e), \log \hat{p}\left(e \mid m_{i}\right)\right)$
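The belief and normalization steps amount to adding the incoming votes to the unary score and applying a softmax. The sketch below assumes the final log-domain messages are already available; `normalized_beliefs` is a hypothetical name and the message values are set by hand for illustration. The last step, $\rho_i(e)$, is omitted because $f$ is a learned function.

```python
import math

def normalized_beliefs(psi, mbar):
    """Return bar_mu[i][e]: softmax over candidates of Psi_i(e) plus all incoming votes.

    psi[i][e]     : unary score of candidate e of mention i
    mbar[k][i][e] : final log-domain message from mention k to mention i
    """
    n = len(psi)
    out = []
    for i in range(n):
        mu = [psi[i][e] + sum(mbar[k][i][e] for k in range(n) if k != i)
              for e in range(len(psi[i]))]
        top = max(mu)  # subtract the max for numerical stability
        exps = [math.exp(v - top) for v in mu]
        z = sum(exps)
        out.append([v / z for v in exps])
    return out

# Toy values (made up): 2 mentions, 2 candidates each, hand-set messages.
psi = [[1.0, 0.0], [0.6, 0.5]]
mbar = [
    [None, [math.log(0.3), math.log(0.7)]],  # messages sent by mention 0
    [[math.log(0.2), math.log(0.8)], None],  # messages sent by mention 1
]
bar_mu = normalized_beliefs(psi, mbar)
print(bar_mu)
```

In this example the vote from mention 1 outweighs the local preference of mention 0, so its second candidate ends up with the larger belief.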

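Loss function (similar to the local model)

Again a max-margin objective: the score of the correct entity should be as high as possible and the scores of wrong candidates as low as possible.

$\begin{aligned} L(\theta) &=\sum_{D \in \mathcal{D}} \sum_{m_{i} \in D} \sum_{e \in \Gamma\left(m_{i}\right)} h\left(m_{i}, e\right) \\ h\left(m_{i}, e\right) &=\left[\gamma-\rho_{i}\left(e_{i}^{*}\right)+\rho_{i}(e)\right]_{+} \end{aligned}$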

Experiments

Datasets
State-of-the-art experimental results
