Casual Paper Reading: Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec

Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec

Paper: https://arxiv.org/abs/1710.02971

Overview:

This paper, a Microsoft–Tsinghua collaboration published in 2018, unifies the DeepWalk, LINE, PTE, and node2vec algorithms under a matrix factorization (MF) framework, and proposes the NetMF algorithm on that basis.

Background:

Network embedding models of all kinds are popular, but theoretical analysis of what these methods actually do is scarce. Prior work includes:

Omer Levy and Yoav Goldberg. 2014. Neural Word Embedding as Implicit Matrix Factorization. In NIPS. 2177–2185.

(Shows that skip-gram with negative sampling implicitly factorizes a word–context matrix.)

Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski. 2016. A Latent Variable Model Approach to PMI-based Word Embeddings. TACL 4 (2016), 385–399.

Tatsunori B. Hashimoto, David Alvarez-Melis, and Tommi S. Jaakkola. 2016. Word Embeddings as Metric Recovery in Semantic Spaces. TACL 4 (2016), 273–286.

(Explain word embedding models from geometric perspectives.)

However, the connection between the word–context matrix and network structure had received little analysis, and the relationships underlying this family of algorithms remained largely unexplored.

Contributions:

1. Shows that DeepWalk, LINE, PTE, and node2vec are in theory performing implicit matrix factorization, and derives the closed form of the matrix each method factorizes.

2. LINE can be seen as a special case of DeepWalk: it corresponds to setting the skip-gram context window size T to 1.

3. PTE, as an extension of LINE, is actually an implicit factorization of the joint matrix of multiple networks

4. Discovers a theoretical connection between DeepWalk's implicit matrix and graph Laplacians, and proposes NetMF on this basis.

Methods:

1. LINE

For an undirected graph G = (V, E, A), LINE learns two representation matrices X and Y, where

X, Y ∈ R^{|V|×d}; the i-th row x_i of X is the embedding vector of vertex i, and the j-th row y_j of Y is its context vector.

The objective of LINE (2nd) is to maximize:

\ell = \sum_{i=1}^{|V|} \sum_{j=1}^{|V|} A_{i,j} \log g(x_i^{\top} y_j) + b \sum_{i=1}^{|V|} d_i \, \mathbb{E}_{j' \sim P_N}\big[\log g(-x_i^{\top} y_{j'})\big]

g is the sigmoid function, g(z) = 1 / (1 + e^{-z})

b is the number of negative samples

P_N is the noise distribution used for negative sampling (the analysis takes P_N(j) = d_j / vol(G), proportional to vertex degree, where vol(G) = \sum_{i,j} A_{i,j} is the graph volume)
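To make the notation concrete, here is a minimal NumPy sketch that evaluates this objective on a toy graph. The graph, the embedding initialization, and computing the expectation over P_N in closed form (with P_N(j) = d_j / vol(G)) rather than by sampling are my own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy undirected graph: adjacency matrix A, degrees d, volume vol(G).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
d = A.sum(axis=1)
vol = A.sum()

n, dim, b = A.shape[0], 2, 1          # b: number of negative samples
X = rng.normal(scale=0.1, size=(n, dim))  # vertex embeddings x_i (rows)
Y = rng.normal(scale=0.1, size=(n, dim))  # context embeddings y_j (rows)

def g(z):
    """Sigmoid function."""
    return 1.0 / (1.0 + np.exp(-z))

P_N = d / vol  # noise distribution proportional to degree (assumption)

def line_objective(X, Y):
    S = X @ Y.T                          # S[i, j] = x_i^T y_j
    positive = np.sum(A * np.log(g(S)))  # observed-edge term
    # Exact expectation E_{j'~P_N}[log g(-x_i^T y_{j'})] for each i.
    neg_expect = np.log(g(-S)) @ P_N
    negative = b * np.sum(d * neg_expect)
    return positive + negative

print(line_objective(X, Y))  # the quantity LINE maximizes
```

Both terms are log-sigmoids, hence negative; training drives the objective toward zero by pushing x_i^T y_j up on edges and down on noise pairs.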

2. PTE

PTE extends LINE to heterogeneous text networks. For a bipartite network G = (V1 ∪ V2, E, A), the task is to learn an embedding x_i for each v_i ∈ V1 and y_j for each v_j ∈ V2. The optimization objective is as follows:

\ell = \sum_{i \in V_1} \sum_{j \in V_2} A_{i,j} \log g(x_i^{\top} y_j) + b \sum_{i \in V_1} d_i \, \mathbb{E}_{j' \sim P_N}\big[\log g(-x_i^{\top} y_{j'})\big]

The full PTE model trains this LINE-style objective jointly on three bipartite networks (word–word, document–word, label–word) weighted by hyperparameters α, β, γ, which is why its implicit matrix turns out to be a joint matrix of multiple networks.

3. DeepWalk

DeepWalk first generates a "corpus" of vertex sequences via random walks, then embeds the corpus with skip-gram. The paper summarizes DeepWalk's basic steps:

1. For n = 1, 2, ..., N: pick a starting vertex and generate a random-walk vertex sequence (w_1^n, ..., w_L^n) of length L.
2. For each walk, slide a context window of size T over the sequence and add each vertex–context pair (w_j^n, w_{j+r}^n) and (w_{j+r}^n, w_j^n), r = 1, ..., T, to a multiset D.
3. Run SGNS (skip-gram with negative sampling, b negative samples) on D.
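The corpus-generation stage can be sketched in Python. The toy adjacency-list graph and the uniform choice of starting vertex are illustrative assumptions (the paper lets w_1 be drawn from a prescribed distribution):

```python
import random

def deepwalk_corpus(adj, num_walks, walk_len, window):
    """Build the vertex-context pair multiset D via uniform random walks.

    adj: dict mapping vertex -> list of neighbours (toy representation).
    """
    pairs = []
    vertices = list(adj)
    for _ in range(num_walks):
        walk = [random.choice(vertices)]           # starting vertex
        while len(walk) < walk_len:
            walk.append(random.choice(adj[walk[-1]]))  # uniform neighbour step
        for j in range(len(walk)):                 # slide window of size T
            for r in range(1, window + 1):
                if j + r < len(walk):
                    pairs.append((walk[j], walk[j + r]))  # both directions
                    pairs.append((walk[j + r], walk[j]))
    return pairs  # SGNS is then trained on this multiset

random.seed(0)
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
D = deepwalk_corpus(adj, num_walks=10, walk_len=8, window=2)
print(len(D))
```

Because every pair is added in both directions, D is symmetric, matching the undirected skip-gram setup the paper analyzes.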

4. node2vec

node2vec is similar to DeepWalk, but replaces the uniform walk with a second-order random walk: the probability of moving from v to w also depends on the previous vertex u, so the walk is biased by where it just came from. For G = (V, E, A), with u, v, w all vertices of G and E the edge set:

P(w_{j+1} = w \mid w_j = v, w_{j-1} = u) = \frac{\alpha_{p,q}(u, w) \, A_{v,w}}{\sum_{w'} \alpha_{p,q}(u, w') \, A_{v,w'}}

where \alpha_{p,q}(u, w) = 1/p if w = u (distance 0), 1 if w is a neighbor of u (distance 1), and 1/q otherwise (distance 2). The paper then characterizes the stationary distribution of this second-order walk, which is what enters node2vec's implicit matrix.
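A minimal Python sketch of one biased transition step; the toy graph, the p and q values, and testing distance via a neighbour lookup are illustrative assumptions:

```python
import random

def alpha(p, q, u, w, adj):
    """node2vec search bias alpha_{p,q}(u, w), by distance from u to w."""
    if w == u:
        return 1.0 / p    # distance 0: return to the previous vertex
    if w in adj[u]:
        return 1.0        # distance 1: u and w are neighbours
    return 1.0 / q        # distance 2

def node2vec_step(u, v, adj, p, q):
    """Sample the next vertex w, given the walk was at u and is now at v."""
    neighbours = adj[v]
    weights = [alpha(p, q, u, w, adj) for w in neighbours]
    return random.choices(neighbours, weights=weights)[0]

random.seed(0)
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
w = node2vec_step(u=0, v=1, adj=adj, p=4.0, q=0.25)
```

With a large p and small q as above, the walk is discouraged from backtracking and encouraged to explore outward (DFS-like behaviour).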

Conclusions:

The authors prove the following results by mathematical derivation (the derivations are long, so they are not reproduced here):

- DeepWalk implicitly factorizes \log\big(\frac{\operatorname{vol}(G)}{bT}\big(\sum_{r=1}^{T}(D^{-1}A)^{r}\big)D^{-1}\big)
- LINE implicitly factorizes \log\big(\frac{\operatorname{vol}(G)}{b}\,D^{-1}AD^{-1}\big)
- PTE implicitly factorizes the log of the joint (stacked) matrix built from the degree-normalized adjacencies of its three bipartite networks, weighted by α, β, γ, minus \log b
- node2vec implicitly factorizes an analogous log-form expressed through the stationary distribution and transition probabilities of its second-order random walk, minus \log b
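Conclusion 2 (LINE as a special case of DeepWalk) can be read off directly: setting T = 1 in DeepWalk's implicit matrix \log\big(\frac{\operatorname{vol}(G)}{bT}\big(\sum_{r=1}^{T}(D^{-1}A)^{r}\big)D^{-1}\big) gives

```latex
\log\!\left(\frac{\operatorname{vol}(G)}{b \cdot 1}\left(\sum_{r=1}^{1}(D^{-1}A)^{r}\right)D^{-1}\right)
  = \log\!\left(\frac{\operatorname{vol}(G)}{b}\, D^{-1} A D^{-1}\right)
```

which is exactly LINE's implicit matrix.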

NetMF:

NetMF first constructs the target matrix M from the input and then performs matrix factorization on it, where T is the window size:

M = \frac{\operatorname{vol}(G)}{bT}\Big(\sum_{r=1}^{T}(D^{-1}A)^{r}\Big)D^{-1}

NetMF factorizes \log M^{\circ}, where M^{\circ} = \max(M, 1) elementwise (entries below 1 are truncated before taking the log), typically via truncated SVD, using U_d \Sigma_d^{1/2} as the embedding.
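A compact NumPy sketch of this construction; the toy graph, the embedding dimension, and the plain dense SVD are illustrative simplifications of the paper's procedure:

```python
import numpy as np

def netmf_embed(A, T=2, b=1, dim=2):
    """Sketch of NetMF (exact variant) on a small dense graph.

    Builds M = vol(G)/(bT) * sum_{r=1}^{T} (D^{-1}A)^r * D^{-1},
    truncates it at 1, takes the elementwise log, and factorizes by SVD.
    """
    d = A.sum(axis=1)
    vol = A.sum()
    P = A / d[:, None]                    # P = D^{-1} A (row-stochastic)
    S = np.zeros_like(A)
    Pr = np.eye(A.shape[0])
    for _ in range(T):                    # accumulate P^1 + ... + P^T
        Pr = Pr @ P
        S += Pr
    M = vol / (b * T) * S / d[None, :]    # right-multiply by D^{-1}
    logM = np.log(np.maximum(M, 1.0))     # truncated elementwise log
    U, sigma, _ = np.linalg.svd(logM)
    return U[:, :dim] * np.sqrt(sigma[:dim])  # embedding U_d * Sigma_d^{1/2}

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
emb = netmf_embed(A)
print(emb.shape)
```

The truncation max(M, 1) keeps only the positive entries of the log matrix, mirroring the shifted-PPMI trick from the Levy–Goldberg analysis.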

The paper also designs different computation strategies for small and large windows:

- Small T: compute P = D^{-1}A and its powers P^r exactly, sum them, and factorize the truncated log matrix as above.
- Large T: approximate the normalized adjacency D^{-1/2} A D^{-1/2} \approx U_h \Lambda_h U_h^{\top} with its top-h eigenpairs, so that M \approx \frac{\operatorname{vol}(G)}{b} D^{-1/2} U_h \big(\frac{1}{T}\sum_{r=1}^{T}\Lambda_h^{r}\big) U_h^{\top} D^{-1/2}, then factorize as before, avoiding the T dense matrix multiplications.
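The large-window approximation can be sketched as follows. The toy graph, the choice of top-h eigenpairs by magnitude, and the dense eigh call are illustrative assumptions (the paper computes the top eigenpairs with a sparse eigensolver on large graphs):

```python
import numpy as np

def netmf_large_T_matrix(A, T=10, b=1, h=2):
    """Sketch of the large-T approximation: replace the sum of T matrix
    powers with top-h eigenpairs of the normalized adjacency."""
    d = A.sum(axis=1)
    vol = A.sum()
    dinv_sqrt = 1.0 / np.sqrt(d)
    S = A * dinv_sqrt[:, None] * dinv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}
    lam, U = np.linalg.eigh(S)                       # symmetric eigensolve
    idx = np.argsort(-np.abs(lam))[:h]               # keep top-h by magnitude
    lam_h, U_h = lam[idx], U[:, idx]
    # Apply (1/T) * sum_{r=1}^{T} Lambda_h^r on the retained spectrum.
    lam_pow = np.array([np.sum(l ** np.arange(1, T + 1)) / T for l in lam_h])
    M = vol / b * (dinv_sqrt[:, None] * U_h) @ np.diag(lam_pow) \
        @ (U_h.T * dinv_sqrt[None, :])
    return np.log(np.maximum(M, 1.0))                # then factorize as before

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
logM = netmf_large_T_matrix(A)
print(logM.shape)
```

The payoff is that the sum over r collapses to h scalar geometric sums on the eigenvalues, instead of T multiplications of |V|×|V| matrices.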