Towards Understanding the Geometry of Knowledge Graph Embeddings: Reading Notes

Chandrahas et al. 2018 ACL.

Background

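There are many ways to embed a knowledge graph. Based on the form of the score function, they can be grouped into two classes: "additive models" and "multiplicative models" (see Figure 1). The paper proposes new metrics and uses them to study the geometric properties of the vectors produced by the different embedding methods.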
Figure 1: Classification of knowledge graph embedding score functions
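To make the two families concrete, here is a minimal sketch of one common representative of each, assuming their standard formulations: TransE (additive: the relation is a translation, and a triple is scored by the distance $\|h + r - t\|$) and DistMult (multiplicative: a triple is scored by the trilinear product $\sum_i h_i r_i t_i$). Figure 1 may list other models as well; the function names and toy vectors here are illustrative only.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def transe_score(h: Vector, r: Vector, t: Vector, p: int = 2) -> float:
    """Additive model (TransE-style): relation r acts as a translation, h + r ≈ t.

    Returns the negative L_p distance, so a higher score means a more plausible triple.
    """
    dist = sum(abs(hi + ri - ti) ** p for hi, ri, ti in zip(h, r, t)) ** (1.0 / p)
    return -dist


def distmult_score(h: Vector, r: Vector, t: Vector) -> float:
    """Multiplicative model (DistMult-style): trilinear product <h, r, t>."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


if __name__ == "__main__":
    h = [1.0, 0.0, 2.0]
    r = [0.5, 1.0, -1.0]
    t = [1.5, 1.0, 1.0]  # exactly h + r, so the TransE distance is zero
    print(transe_score(h, r, t))
    print(distmult_score(h, r, t))  # -1.25
```

The additive score depends on a sum of vectors (a translation), while the multiplicative score depends on element-wise products; this difference in score function is the basis of the two-way classification.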

Metrics

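Here $V$ is a set of embedding vectors, e.g. all entity vectors or all relation vectors of one model.
ATM (alignment to mean): the cosine similarity between a vector $v$ and the mean of all vectors in $V$: $ATM(v, V) = \mathrm{cosine}\left(v, \frac{1}{|V|}\sum_{x\in V}x\right)$
Conicity: the mean ATM over all vectors in $V$: $Conicity(V) = \frac{1}{|V|}\sum_{v\in V}ATM(v, V)$. High conicity means the vectors lie in a narrow cone around their mean.
VS (vector spread): the variance of ATM across all vectors in $V$: $VS(V) = \frac{1}{|V|}\sum_{v\in V}\left(ATM(v, V) - Conicity(V)\right)^2$
AVL (average vector length): the mean $\ell_2$ norm of the vectors in $V$: $AVL(V) = \frac{1}{|V|}\sum_{v \in V}\|v\|_2$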
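The four definitions translate directly into code. Below is a minimal pure-Python sketch, assuming every vector in $V$ has the same dimension and that neither any vector nor the mean vector is all zeros (the cosine is undefined otherwise). The helper names are hypothetical, not taken from the paper's code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _norm(v: Vector) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def _cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    return sum(x * y for x, y in zip(a, b)) / (_norm(a) * _norm(b))


def mean_vector(V: List[Vector]) -> List[float]:
    """Component-wise mean of the vectors in V (all assumed to share one dimension)."""
    n = len(V)
    return [sum(col) / n for col in zip(*V)]


def atm(v: Vector, V: List[Vector]) -> float:
    """ATM(v, V): cosine between v and the mean of all vectors in V."""
    return _cosine(v, mean_vector(V))


def _atms(V: List[Vector]) -> List[float]:
    """ATM of every vector in V, computing the mean vector only once."""
    m = mean_vector(V)
    return [_cosine(v, m) for v in V]


def conicity(V: List[Vector]) -> float:
    """Conicity(V): mean ATM over all vectors in V."""
    a = _atms(V)
    return sum(a) / len(a)


def vector_spread(V: List[Vector]) -> float:
    """VS(V): population variance of ATM around Conicity(V)."""
    a = _atms(V)
    c = sum(a) / len(a)
    return sum((x - c) ** 2 for x in a) / len(a)


def avl(V: List[Vector]) -> float:
    """AVL(V): mean L2 norm of the vectors in V."""
    return sum(_norm(v) for v in V) / len(V)


if __name__ == "__main__":
    V = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0]]
    # The mean is (0, 2/3): the first two vectors are orthogonal to it, the third is aligned.
    print(conicity(V), vector_spread(V), avl(V))  # ≈ 0.333, 0.222, 1.333
```

In practice one would call these once on the entity vectors and once on the relation vectors of a trained model, since the findings below report the two separately.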

Experimental Results

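Using the FB15K and WN18 datasets, the authors vary the dimension of the embedding vectors and the number of negative samples used during training, observe how the geometric properties of each embedding method change, and examine how these geometric properties correlate with performance.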
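The notes do not say how the correlation is measured. One simple option, assuming a Pearson correlation coefficient (which may differ from the paper's exact analysis), is to pair one geometric value (e.g. entity conicity) with one performance number for each training configuration (e.g. each negative-sample count) and compute the coefficient across configurations. The function below is a hypothetical helper written for illustration.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences.

    xs could hold a geometric metric per configuration and ys the matching
    performance score; both are assumed not to be constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A value near -1 between entity conicity and performance would match "lower conicity goes with better performance", while a value near 0 indicates no linear relationship. On Python 3.10+, `statistics.correlation` computes the same coefficient.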

Additive models have low conicity and high VS, while multiplicative models show the opposite (high conicity, low VS).
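For additive models, conicity and AVL do not change with the number of negative samples. For multiplicative models, entity conicity decreases and entity AVL increases as the number of negative samples grows; relation conicity also decreases, but relation AVL stays unchanged.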
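For additive models, conicity and AVL do not change with the vector dimension. For multiplicative models, both entity and relation conicity decrease and AVL increases as the dimension grows.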
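For additive models, geometry and performance show no correlation. For multiplicative models with a fixed number of negative samples, lower entity conicity and higher entity AVL are associated with better performance, while relation geometry shows no correlation.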
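The first finding contrasts two shapes: vectors spread out in many directions (low conicity, high VS) versus vectors packed into a narrow cone (high conicity, low VS). The sketch below reproduces both shapes on made-up synthetic data, not on embeddings from the paper; `scattered` and `narrow_cone` are hypothetical generators, and the sizes (200 vectors, 50 dimensions) are arbitrary choices.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def _norm(v: Vector) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def conicity_and_vs(V: List[Vector]) -> Tuple[float, float]:
    """Return (Conicity, VS) of V, following the definitions in the Metrics section."""
    n = len(V)
    m = [sum(col) / n for col in zip(*V)]  # mean vector
    m_norm = _norm(m)
    atms = [sum(x * y for x, y in zip(v, m)) / (_norm(v) * m_norm) for v in V]
    c = sum(atms) / n
    vs = sum((a - c) ** 2 for a in atms) / n
    return c, vs


def scattered(n: int, d: int, rng: random.Random) -> List[Vector]:
    """Hypothetical 'spread out' set: i.i.d. standard Gaussian vectors."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


def narrow_cone(n: int, d: int, rng: random.Random,
                length: float = 3.0, noise: float = 0.1) -> List[Vector]:
    """Hypothetical 'narrow cone' set: a shared direction (the first axis) plus small noise."""
    return [[(length if k == 0 else 0.0) + rng.gauss(0.0, noise) for k in range(d)]
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    print("scattered:", conicity_and_vs(scattered(200, 50, rng)))
    print("cone:     ", conicity_and_vs(narrow_cone(200, 50, rng)))
```

The scattered set should give a conicity close to 0 and the cone set a conicity close to 1, with VS clearly larger for the scattered set. This only illustrates what the metrics measure; it does not explain why trained additive and multiplicative models end up with these shapes.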

