The Trans series for knowledge graphs, part 1: TransE


Paper link
https://www.utc.fr/~bordesan/dokuwiki/_media/en/transe_nips13.pdf
Reference link
https://zhuanlan.zhihu.com/p/32993044
abstract:
Hence, we propose TransE, a method which models relationships by interpreting them as translations operating on the low-dimensional embeddings of the entities.
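The paper proposes TransE, which borrows the idea behind word embeddings to embed the entities and relationships of a triplet into a low-dimensional vector space.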
introduction:
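Multi-relational data refers to directed graphs whose nodes correspond to entities and edges of the form (head, label, tail) (denoted (h, l, t))
The paper represents the relationship between two entities as a triplet (h, l, t).
model:
if (h, l, t) holds, then the embedding of the tail entity t should be close to the embedding of the head entity h plus some vector that depends on the relationship
Roughly speaking, the learned vector of the head entity h plus the relationship vector l should approximately equal the vector of the tail entity t, i.e. h + l ≈ t.
The paper points out that one advantage of TransE is its small number of parameters: it learns only one low-dimensional vector per entity and one per relationship.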
This suggests that there may exist embedding spaces in which 1-to-1 relationships between entities of different types may, as well, be represented by translations
TransE is best suited to 1-to-1 relationships; it does not fit 1-to-many or many-to-1 relationships well.
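A small numeric illustration of this limitation, using made-up 2-dimensional vectors and the L1 distance (all names and values here are assumptions for illustration only): when one head h and one relationship l have two distinct tails t1 and t2, the single point h + l cannot sit on both tails. By the triangle inequality, the two distances must add up to at least the distance between the tails.

```python
def l1(a, b):
    """L1 distance between two vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Toy embeddings, made up for illustration.
h = [0.0, 0.0]
l = [1.0, 0.0]
t1 = [1.0, 0.0]   # first tail: exactly h + l
t2 = [1.0, 2.0]   # second, distinct tail of the same (h, l)

translated = [hi + li for hi, li in zip(h, l)]

d1 = l1(translated, t1)
d2 = l1(translated, t2)
gap = l1(t1, t2)

print(d1, d2, gap)  # 0.0 2.0 2.0

# d1 and d2 can only both be zero if the two tails coincide (gap == 0).
print(d1 + d2 >= gap)  # True
```

So fitting one tail perfectly forces an error on the other, unless the model pushes the embeddings of t1 and t2 together.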
Following an energy-based framework, the energy of a triplet is equal to d(h + l, t) for some dissimilarity measure d, which we take to be either the L1 or the L2-norm
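The training goal is to make d as small as possible for triplets whose entities really are related (positive samples) and as large as possible for triplets whose entities are not (negative samples). d can be either the L1 or the L2 norm.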
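As a minimal sketch of this energy, the following plain-Python function computes d(h + l, t) under either norm. The function name `energy`, its `norm` parameter, and the toy 2-dimensional vectors are illustrative assumptions, not taken from the authors' code.

```python
import math
from typing import Sequence


def energy(h: Sequence[float], l: Sequence[float], t: Sequence[float],
           norm: str = "L1") -> float:
    """Return d(h + l, t): distance between the translated head and the tail."""
    # Component-wise residual of the translation: h + l - t.
    diff = [hi + li - ti for hi, li, ti in zip(h, l, t)]
    if norm == "L1":
        return sum(abs(x) for x in diff)
    if norm == "L2":
        return math.sqrt(sum(x * x for x in diff))
    raise ValueError(f"unknown norm: {norm!r}")


# Toy embeddings, made up for illustration.
h = [0.0, 1.0]
l = [1.0, 0.0]
t_good = [1.0, 1.0]   # equals h + l, so the energy is 0
t_bad = [4.0, 5.0]    # far from h + l

print(energy(h, l, t_good))        # 0.0
print(energy(h, l, t_bad, "L1"))   # 7.0
print(energy(h, l, t_bad, "L2"))   # 5.0
```

A low energy means the triplet looks plausible to the model; a high energy means it does not.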
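So the final objective function is:
$$\mathcal{L} = \sum_{(h,l,t) \in S} \; \sum_{(h',l,t') \in S'_{(h,l,t)}} \left[ \gamma + d(h + l, t) - d(h' + l, t') \right]_+$$
where S is the set of training triplets, S'_{(h,l,t)} is the set of corrupted (negative) triplets built from (h, l, t), γ > 0 is the margin, and [x]_+ denotes the positive part of x.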
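This is the margin-based (hinge) loss familiar from SVMs; its purpose is to make the model separate positive and negative samples as well as it can. To build a negative sample, the authors replace either one of the two entities of a training triplet (never both at once).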
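The negative sampling and the margin term for a single positive/negative pair can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `corrupt`, `margin_loss` and `l1_energy`, the integer entity ids, the 50/50 choice between head and tail, and the toy embeddings are all made up here, and the gradient update is left out.

```python
import random
from typing import List, Tuple

Triplet = Tuple[int, int, int]  # (head id, relationship id, tail id)


def l1_energy(h: List[float], l: List[float], t: List[float]) -> float:
    """d(h + l, t) with the L1 norm."""
    return sum(abs(hi + li - ti) for hi, li, ti in zip(h, l, t))


def corrupt(triplet: Triplet, num_entities: int, rng: random.Random) -> Triplet:
    """Replace either the head or the tail (never both) with another entity.

    Assumption: the replacement is drawn uniformly from the other entities,
    and head vs. tail is chosen with probability 1/2. Needs num_entities >= 2.
    """
    h, l, t = triplet
    if rng.random() < 0.5:
        new_h = rng.choice([e for e in range(num_entities) if e != h])
        return (new_h, l, t)
    new_t = rng.choice([e for e in range(num_entities) if e != t])
    return (h, l, new_t)


def margin_loss(pos: Triplet, neg: Triplet,
                ent: List[List[float]], rel: List[List[float]],
                gamma: float = 1.0) -> float:
    """[gamma + d(h + l, t) - d(h' + l, t')]_+ for one positive/negative pair."""
    h, l, t = pos
    h2, _, t2 = neg  # the relationship is shared with the positive triplet
    pos_e = l1_energy(ent[h], rel[l], ent[t])
    neg_e = l1_energy(ent[h2], rel[l], ent[t2])
    return max(0.0, gamma + pos_e - neg_e)


# Toy embeddings, made up for illustration: 4 entities, 1 relationship.
ent = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [1.0, 0.5]]
rel = [[1.0, 0.0]]
pos = (0, 0, 1)  # ent[0] + rel[0] == ent[1], so its energy is 0

print(margin_loss(pos, (0, 0, 2), ent, rel))  # 0.0: negative is already far enough
print(margin_loss(pos, (0, 0, 3), ent, rel))  # 0.5: negative is inside the margin

rng = random.Random(0)
print(corrupt(pos, num_entities=4, rng=rng))  # one randomly corrupted triplet
```

Only pairs whose negative sample falls inside the margin contribute to the loss, so training effort goes to the negatives the model cannot yet tell apart from the positives.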
Finally, the authors run their experiments on the WordNet and Freebase datasets; the data can be downloaded from https://everest.hds.utc.fr/doku.php?id=en:transe