Graph Auto-encoder
1 Structural Deep Network Embedding
SDNE, proposed in [Wang D, 2016, 1], embeds graph data using the autoencoder idea. The paper defines first-order and second-order proximity, and aims to keep the vertices' first- and second-order proximity relations unchanged before and after encoding.
First-Order Proximity. The first-order proximity describes the pairwise proximity between vertexes. For any pair of vertexes, if $a_{ij} > 0$, there exists positive first-order proximity between $v_i$ and $v_j$. Otherwise, the first-order proximity between $v_i$ and $v_j$ is 0.
Second-Order Proximity. The second-order proximity between a pair of vertexes describes the proximity of the pair's neighborhood structure. Let $\mathcal{N}_u = \{ a_{u,1}, \cdots, a_{u,|\mathcal{V}|} \}$ denote the first-order proximity between $v_u$ and other vertexes. Then, second-order proximity is determined by the similarity of $\mathcal{N}_u$ and $\mathcal{N}_v$.
Encoder (layer-wise mapping):
\begin{aligned}
\vec{y}_i^{(0)} &= \vec{x}_i, \\
\vec{y}_i^{(k)} &= \sigma \left( W^{(k)} \vec{y}_i^{(k-1)} + \vec{b}^{(k)} \right), \quad k = 1, \cdots, K.
\end{aligned}
\tag{1.1}
Reconstruction loss:
\mathcal{L} = \sum_{i=1}^{n} \| \hat{\vec{x}}_i - \vec{x}_i \|_2^2.
\tag{1.2}
First-order proximity loss:
\mathcal{L}_{\text{1st}} = \sum_{i,j=1}^{n} a_{i,j} \| \vec{y}_i^{(K)} - \vec{y}_j^{(K)} \|_2^2.
\tag{1.3}
Second-order proximity loss:
\begin{aligned}
\mathcal{L}_{\text{2nd}} &= \sum_{i=1}^{n} \| \left( \hat{\vec{x}}_i - \vec{x}_i \right) \odot \vec{b}_i \|_2^2 \\
&= \| \left( \hat{X} - X \right) \odot B \|_F^2.
\end{aligned}
\tag{1.4}
Here $\vec{b}_i$ is defined element-wise: $b_{i,j} = 1$ when $a_{i,j} = 0$, and $b_{i,j} = \beta > 1$ (a hyperparameter) when $a_{i,j} > 0$, so reconstruction errors on observed edges are penalized more heavily. Through the autoencoder, two vertices with similar neighborhood structures end up closer to each other in the representation space.
Regularization term:
\mathcal{L}_{\text{reg}} = \frac{1}{2} \sum_{k=1}^{K} \left( \| W^{(k)} \|_F^2 + \| \hat{W}^{(k)} \|_F^2 \right).
\tag{1.5}
The total loss is:
\mathcal{L}_{\text{mix}} = \mathcal{L}_{\text{2nd}} + \alpha \mathcal{L}_{\text{1st}} + \gamma \mathcal{L}_{\text{reg}}.
\tag{1.6}
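A minimal numpy sketch of losses (1.3), (1.4), and (1.6); the values of $\beta$ and $\alpha$, and the omission of $\mathcal{L}_{\text{reg}}$ (which an optimizer's weight decay would usually cover), are my assumptions, not the paper's settings.

```python
import numpy as np

def sdne_losses(X, X_hat, Y, A, beta=5.0, alpha=0.1):
    """SDNE losses for one pass.
    X:     (n, n) adjacency rows used as inputs x_i
    X_hat: (n, n) autoencoder reconstructions
    Y:     (n, d) embeddings y_i^{(K)}
    A:     (n, n) adjacency matrix
    """
    # Eq. (1.4): second-order loss; observed edges are up-weighted by beta > 1
    B = np.where(A > 0, beta, 1.0)
    L_2nd = np.sum(((X_hat - X) * B) ** 2)
    # Eq. (1.3): first-order loss pulls embeddings of linked vertices together
    diff = Y[:, None, :] - Y[None, :, :]            # (n, n, d) pairwise differences
    L_1st = np.sum(A * np.sum(diff ** 2, axis=-1))
    # Eq. (1.6) without the L_reg term
    return L_2nd + alpha * L_1st
```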
2 Deep Neural Networks for Learning Graph Representations
DNGR [Cao S, 2016, 2] combines random walks with a deep autoencoder. The model consists of three parts: random surfing, positive pointwise mutual information (PPMI) computation, and a stacked denoising autoencoder.
Random surfing:
Given the maximum number of steps and the transition matrix $A$:
p_k = \alpha p_{k-1} A + (1 - \alpha) p_0.
\tag{2.1}
where $\alpha$ is the probability of continuing the surfing process; with probability $1 - \alpha$ the walk restarts from the initial distribution $p_0$.
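A short numpy sketch of the recursion (2.1); starting one surfer per vertex and summing the $p_k$ into a co-occurrence matrix is an assumption about how the result feeds the PPMI step.

```python
import numpy as np

def random_surfing(A, K=4, alpha=0.98):
    """Eq. (2.1): p_k = alpha * p_{k-1} @ A + (1 - alpha) * p_0.
    A is assumed to be row-normalized (each row sums to 1)."""
    n = A.shape[0]
    p0 = np.eye(n)                 # one restart distribution per vertex
    p = p0.copy()
    M = np.zeros((n, n))
    for _ in range(K):
        p = alpha * (p @ A) + (1 - alpha) * p0
        M += p                     # accumulate the probabilistic co-occurrence matrix
    return M
```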
PPMI matrix:
The PPMI matrix is computed from the probabilistic co-occurrence matrix obtained by random surfing:
\begin{aligned}
\text{PMI}_{w,c} &= \log \left( \frac{\#(w,c) \cdot \sum_{w,c} \#(w,c)}{\#(w) \cdot \#(c)} \right), \\
X_{w,c} &= \text{PPMI}_{w,c} = \max (\text{PMI}_{w,c}, 0).
\end{aligned}
\tag{2.2}
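A direct numpy transcription of Eq. (2.2), treating the random-surfing output as the counts $\#(\cdot)$; clamping undefined $\log 0$ entries to zero is my choice for empty co-occurrences.

```python
import numpy as np

def ppmi(M):
    """Eq. (2.2): positive pointwise mutual information of a co-occurrence matrix M."""
    total = M.sum()                            # sum_{w,c} #(w,c)
    row = M.sum(axis=1, keepdims=True)         # #(w)
    col = M.sum(axis=0, keepdims=True)         # #(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(M * total / (row @ col))
    pmi[~np.isfinite(pmi)] = 0.0               # empty co-occurrences give log 0: clamp
    return np.maximum(pmi, 0.0)                # PPMI = max(PMI, 0)
```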
$X$ is then used as the input to the autoencoder.
The denoising loss, where $\tilde{x}^{(i)}$ is a corrupted version of $x^{(i)}$, is:
\min_{\theta_1, \theta_2} \sum_{i=1}^{N} \mathcal{L} \left( x^{(i)}, g_{\theta_2} ( f_{\theta_1} ( \tilde{x}^{(i)} ) ) \right).
\tag{2.3}
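Eq. (2.3) evaluates the reconstruction against the clean input while the encoder sees the corrupted copy $\tilde{x}^{(i)}$; a minimal sketch of one common corruption (masking noise, with the rate $p$ an assumption):

```python
import numpy as np

def corrupt(x, p=0.2, rng=None):
    """Masking noise: each entry of x is zeroed independently with probability p."""
    rng = np.random.default_rng(0) if rng is None else rng
    return x * (rng.random(x.shape) >= p)
```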
3 Variational Graph Auto-Encoders
VGAE [Kipf T, 2016, 3] applies the VAE to graph data. For an unweighted graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix, $D \in \mathbb{R}^{N \times N}$ is the diagonal degree matrix, $X \in \mathbb{R}^{N \times D}$ is the matrix of vertex feature vectors, and $Z \in \mathbb{R}^{N \times F}$ is the matrix of encoded feature vectors.
The inference model produces the new representation of $X$, i.e. the encoding:
q(Z | X, A) = \prod_{i=1}^{N} q(\vec{z}_i | X, A).
\tag{3.1}
where:
\begin{aligned}
q(\vec{z}_i | X, A) &= \mathcal{N} \left( \vec{z}_i | \vec{\mu}_i, \text{diag}(\vec{\sigma}_i^2) \right), \\
\vec{\mu} &= \text{GCN}_{\mu}(X, A), \\
\log \vec{\sigma} &= \text{GCN}_{\sigma}(X, A), \\
\text{GCN}(X, A) &= \tilde{A} \, \text{ReLU}(\tilde{A} X W_0) \, W_1, \\
\tilde{A} &= D^{-\frac{1}{2}} A D^{-\frac{1}{2}}.
\end{aligned}
\tag{3.2}
$\text{GCN}_{\mu}(X, A)$ and $\text{GCN}_{\sigma}(X, A)$ share the first-layer weights $W_0$.
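A numpy sketch of the two-layer GCN encoder in Eq. (3.2); the weight shapes and the guard for zero-degree vertices are illustrative assumptions.

```python
import numpy as np

def normalize_adj(A):
    """A_tilde = D^{-1/2} A D^{-1/2}, as written in Eq. (3.2)."""
    d = np.maximum(A.sum(axis=1), 1e-12)       # guard isolated vertices
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A @ D_inv_sqrt

def vgae_encoder(X, A, W0, W_mu, W_sigma):
    """Two GCN heads: shared first layer W0, separate mu / log-sigma outputs."""
    A_t = normalize_adj(A)
    H = np.maximum(A_t @ X @ W0, 0.0)          # ReLU(A_tilde X W0)
    mu = A_t @ H @ W_mu                        # GCN_mu(X, A)
    log_sigma = A_t @ H @ W_sigma              # GCN_sigma(X, A)
    return mu, log_sigma
```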
The generative model reconstructs the graph by generating the adjacency matrix from the code $Z$:
P(A|Z) = \prod_{i=1}^{N} \prod_{j=1}^{N} p(A_{ij}|\vec{z}_i, \vec{z}_j).
\tag{3.3}
where:
p(A_{ij} = 1|\vec{z}_i, \vec{z}_j) = \sigma (\vec{z}_i^T \vec{z}_j).
\tag{3.4}
The loss function is the variational lower bound:
\mathcal{L} = \mathbb{E}_{q(Z | X, A)} \left[ \log p (A|Z) \right] - \text{KL} \left[ q(Z | X, A) \| p(Z) \right].
\tag{3.5}
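A sketch of drawing $Z$ with the reparameterization trick and scoring all edges with the inner-product decoder of Eqs. (3.3)-(3.4); a minimal illustration rather than the authors' implementation.

```python
import numpy as np

def sample_and_decode(mu, log_sigma, seed=0):
    """Reparameterize z = mu + sigma * eps, then A_hat[i, j] = sigmoid(z_i . z_j)."""
    rng = np.random.default_rng(seed)
    Z = mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)
    logits = Z @ Z.T                           # inner-product decoder, Eq. (3.4)
    return 1.0 / (1.0 + np.exp(-logits))       # edge probabilities A_hat
```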
The paper also proposes a non-probabilistic GAE:
\begin{aligned}
\hat{A} &= \sigma(Z Z^T), \\
Z &= \text{GCN}(X, A).
\end{aligned}
\tag{3.6}
4 MGAE: Marginalized Graph Autoencoder for Graph Clustering
MGAE [Wang C, 2017, 4] explicitly gives the optimal parameter $W$ for a single convolutional layer. The objective is:
\| X - \widetilde{D}^{-\frac{1}{2}} \widetilde{A} \widetilde{D}^{-\frac{1}{2}} X W \|^2 + \lambda \| W \|_F^2.
\tag{4.1}
The input is randomly corrupted $m$ times (randomly keeping a subset of entries), giving $[ \widetilde{X}_1, \cdots, \widetilde{X}_m ]$. Writing $\bar{X} = \frac{1}{m} \sum_{i=1}^{m} \widetilde{X}_i$, the solution is:
W = \bar{X}^T \left( \widetilde{D}^{-\frac{1}{2}} \widetilde{A} \widetilde{D}^{-\frac{1}{2}} \right)^T X \left( \bar{X}^T \left( \widetilde{D}^{-\frac{1}{2}} \widetilde{A} \widetilde{D}^{-\frac{1}{2}} \right)^T \left( \widetilde{D}^{-\frac{1}{2}} \widetilde{A} \widetilde{D}^{-\frac{1}{2}} \right) X \right)^{-1}.
\tag{4.2}
The encoder is:
Z = \left( \widetilde{D}^{-\frac{1}{2}} \widetilde{A} \widetilde{D}^{-\frac{1}{2}} \right) X W.
\tag{4.3}
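A numpy sketch of the closed-form solution (4.2) followed by the encoder (4.3); the corruption scheme (feature masking with keep probability $p$) is an assumption, and Eq. (4.2) is transcribed as written, without the $\lambda$ term of (4.1).

```python
import numpy as np

def mgae_layer(X, A_norm, m=10, p=0.8, seed=0):
    """One marginalized GAE layer.
    A_norm: precomputed D~^{-1/2} A~ D~^{-1/2}; X: (n, d) content matrix.
    Averages m randomly corrupted copies of X, then solves Eq. (4.2) for W."""
    rng = np.random.default_rng(seed)
    X_bar = np.mean([X * (rng.random(X.shape) < p) for _ in range(m)], axis=0)
    AX = A_norm @ X
    P = X_bar.T @ A_norm.T @ X                 # left factor of Eq. (4.2)
    Q = X_bar.T @ A_norm.T @ AX                # bracketed factor to invert
    W = P @ np.linalg.inv(Q)
    return AX @ W                              # Eq. (4.3): Z = A_norm X W
```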
5 Learning Deep Network Representations with Adversarially Regularized Autoencoders
NetRA [Yu W, 2018, 5] adds a generative adversarial network on top of the autoencoder.
The network first takes random-walk sequences as its input, and then trains the encoder, decoder, generator, and discriminator jointly.
6 Deep Recursive Network Embedding with Regular Equivalence
DRNE [Tu K, 2018, 6] presents two definitions of equivalence.
(Structural Equivalence) We denote $s(u) = s(v)$ if nodes $u$ and $v$ are structurally equivalent. Then $s(u) = s(v)$ if and only if $\mathcal{N}(u) = \mathcal{N}(v)$.
(Regular Equivalence) We denote $r(u) = r(v)$ if nodes $u$ and $v$ are regularly equivalent. Then $r(u) = r(v)$ if and only if $\{ r(i) \mid i \in \mathcal{N}(u) \} = \{ r(j) \mid j \in \mathcal{N}(v) \}$.
The embedding procedure (a sketch of the first step follows the list):
for each vertex, find its neighbors and sort their embedding vectors by the neighbors' degrees;
feed the sorted sequence into an LSTM;
the LSTM output is then the reconstruction of the vertex's embedding.
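A small numpy sketch of the degree-ordered neighbor sequences; the LSTM itself is left abstract, since any sequence model consuming the returned arrays would do, and the embedding table $X$ is an assumption about the model state.

```python
import numpy as np

def neighbor_sequences(A, X):
    """For each vertex, return its neighbors' embeddings sorted by neighbor degree.
    A: (n, n) adjacency matrix; X: (n, d) current embedding table."""
    deg = A.sum(axis=1)
    seqs = []
    for v in range(A.shape[0]):
        nbrs = np.flatnonzero(A[v])
        nbrs = nbrs[np.argsort(deg[nbrs])]     # order the sequence by degree
        seqs.append(X[nbrs])                   # (|N(v)|, d) input sequence for the LSTM
    return seqs
```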
7 Adversarially Regularized Graph Autoencoder for Graph Embedding
ARGA [Pan S, 2018, 7] builds on [Kipf T, 2016, 3], replacing the convolution operation and adding an adversarial mechanism; unlike [Yu W, 2018, 5], it has only a discriminator and no separate generator.
Reconstruction loss of the non-probabilistic encoder:
\mathcal{L}_0 = \mathbb{E}_{q(Z|(X,A))} [\log p(\hat{A}|Z)].
\tag{7.1}
Reconstruction loss of the variational encoder:
\mathcal{L}_1 = \mathbb{E}_{q(Z|X,A)} \left[ \log p(\hat{A}|Z) - \text{KL} \left[ q(Z | X, A) \| p(Z) \right] \right].
\tag{7.2}
Adversarial loss:
\min_{\mathcal{G}} \max_{\mathcal{D}} \mathbb{E}_{\vec{z} \sim p_z} \left[ \log \mathcal{D}(Z) \right] + \mathbb{E}_{\vec{x} \sim p(\vec{x})} \left[ \log \left( 1 - \mathcal{D}(\mathcal{G}(X,A)) \right) \right].
\tag{7.3}
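A numpy sketch of one evaluation of the min-max objective (7.3), with a stand-in two-layer MLP as the discriminator $\mathcal{D}$; all weights, shapes, and the `eps` guard are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator(Z, W1, W2):
    """Stand-in two-layer MLP discriminator: D(z) in (0, 1)."""
    return sigmoid(np.maximum(Z @ W1, 0.0) @ W2)

def adversarial_losses(Z_fake, Z_real, W1, W2):
    """Eq. (7.3): the discriminator scores prior samples Z_real as real and
    encoder outputs Z_fake = G(X, A) as fake; the encoder wants the opposite."""
    eps = 1e-12                                # numerical guard for log
    d_real = discriminator(Z_real, W1, W2)
    d_fake = discriminator(Z_fake, W1, W2)
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    g_loss = -np.mean(np.log(d_fake + eps))
    return d_loss, g_loss
```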
References
1 Wang D, Cui P, Zhu W, et al. Structural Deep Network Embedding[C]. Knowledge Discovery and Data Mining, 2016: 1225-1234.
2 Cao S, Lu W, Xu Q, et al. Deep Neural Networks for Learning Graph Representations[C]. National Conference on Artificial Intelligence, 2016: 1145-1152.
3 Kipf T, Welling M. Variational Graph Auto-Encoders[J]. arXiv: Machine Learning, 2016.
4 Wang C, Pan S, Long G, et al. MGAE: Marginalized Graph Autoencoder for Graph Clustering[C]. Conference on Information and Knowledge Management, 2017: 889-898.
5 Yu W, Zheng C, Cheng W, et al. Learning Deep Network Representations with Adversarially Regularized Autoencoders[C]. Knowledge Discovery and Data Mining, 2018: 2663-2671.
6 Tu K, Cui P, Wang X, et al. Deep Recursive Network Embedding with Regular Equivalence[C]. Knowledge Discovery and Data Mining, 2018: 2357-2366.
7 Pan S, Hu R, Long G, et al. Adversarially Regularized Graph Autoencoder for Graph Embedding[C]. International Joint Conference on Artificial Intelligence, 2018: 2609-2615.