Recurrent Graph Neural Networks

Recurrent graph neural networks (RecGNNs) are mostly pioneering works on graph neural networks. RecGNNs aim to learn node representations with recurrent neural architectures. They assume that each node in a graph constantly exchanges information/messages with its neighbors until a stable equilibrium is reached. RecGNNs are conceptually important and inspired later research on convolutional graph neural networks. In particular, the idea of message passing was inherited by spatial-based convolutional graph neural networks.[1]

1 A New Model for Learning in Graph Domains

[Marco Gori, 2005, 2] was the first to propose the concept of the graph neural network (GNN).

Let $n$ be a vertex of the graph, let $\vec{x}_n$ be the state of vertex $n$, and let $\vec{l}_n$ be its label. Correspondingly, $\vec{x}_{\text{ne}[n]}$ and $\vec{l}_{\text{ne}[n]}$ denote the states and labels of the neighbors of vertex $n$.

  • transition function:
    $$ \vec{x}_n = f_w \left( \vec{l}_n, \vec{x}_{\text{ne}[n]}, \vec{l}_{\text{ne}[n]} \right), \qquad n \in \mathcal{N} \tag{1.1} $$

  • output function:
    $$ \vec{o}_n = g_w \left( \vec{x}_n, \vec{l}_n \right), \qquad n \in \mathcal{N} \tag{1.2} $$
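
As a minimal sketch of how these two functions are applied (not the authors' implementation; the dictionary-based graph representation and the callables `f_w`, `g_w` are assumptions), one synchronous update of all node states followed by the output computation might look like this in Python:

```python
def gnn_iteration(states, labels, neighbors, f_w, g_w):
    """One synchronous application of Eq. (1.1) followed by Eq. (1.2).

    states:    dict node -> state vector x_n
    labels:    dict node -> label vector l_n
    neighbors: dict node -> list of neighbor ids ne[n]
    f_w, g_w:  placeholder callables standing in for the parametric
               transition and output functions.
    """
    new_states = {
        n: f_w(labels[n],
               [states[u] for u in neighbors[n]],   # neighbor states x_ne[n]
               [labels[u] for u in neighbors[n]])   # neighbor labels l_ne[n], Eq. (1.1)
        for n in states
    }
    outputs = {n: g_w(new_states[n], labels[n]) for n in new_states}  # Eq. (1.2)
    return new_states, outputs
```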


Eq. (1.1) is replaced by the following:
$$ \vec{x}_n = \sum_{u \in \text{ne}[n]} h_w \left( \vec{l}_n, \vec{x}_u, \vec{l}_u \right), \qquad n \in \mathcal{N} \tag{1.3} $$
where $h_w$ can be implemented either as an explicit linear function or as a neural network.

  • Linear GNN (see the sketch after this list):
    $$ \begin{aligned} h_w \left( \vec{l}_n, \vec{x}_u, \vec{l}_u \right) &= A_{n,u} \vec{x}_u + \vec{b}_n \\ A_{n,u} &= \frac{\mu}{s \cdot |\text{ne}[u]|} \cdot \text{Resize}\left( \phi_w (\vec{l}_n, \vec{l}_u) \right) \\ \vec{b}_n &= \rho_w (\vec{l}_n) \\ \phi_w &: \reals^{2q \times 1} \rightarrow \reals^{s^2 \times 1} \\ \rho_w &: \reals^{q} \rightarrow \reals^{s} \\ \mu &\in (0, 1) \end{aligned} \tag{1.4} $$

  • Neural GNN:
    $h_w$ is implemented by a neural network.
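
The Linear GNN case of Eq. (1.4) can be sketched as follows, assuming $\phi_w$ and $\rho_w$ are plain linear maps (`W_phi`, `W_rho`) and that the dimensions $s$, $q$ and the constant $\mu$ are illustrative; the factor $\mu / (s \cdot |\text{ne}[u]|)$ keeps the transition a contraction so that the iteration of Eq. (1.3) converges:

```python
import numpy as np

s, q, mu = 4, 3, 0.9        # state dim, label dim, contraction constant (all assumed)
rng = np.random.default_rng(0)
W_phi = rng.normal(size=(s * s, 2 * q))   # stands in for phi_w (here a plain linear map)
W_rho = rng.normal(size=(s, q))           # stands in for rho_w

def h_w_linear(l_n, x_u, l_u, deg_u):
    """Linear GNN term of Eq. (1.4) for a single neighbor u of node n."""
    phi = W_phi @ np.concatenate([l_n, l_u])        # phi_w(l_n, l_u), an s^2-vector
    A_nu = (mu / (s * deg_u)) * phi.reshape(s, s)   # Resize(.) into an s x s matrix
    b_n = W_rho @ l_n                               # rho_w(l_n)
    return A_nu @ x_u + b_n                         # summed over u in ne[n] by Eq. (1.3)
```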

2 The Graph Neural Network Model

Compared with [Marco Gori, 2005, 2], [Franco Scarselli, 2009, 3] additionally includes the information on the edges, $\vec{l}_{\text{co}[n]}$:

$$ \begin{aligned} \vec{x}_n &= f_w \left( \vec{l}_n, \vec{l}_{\text{co}[n]}, \vec{x}_{\text{ne}[n]}, \vec{l}_{\text{ne}[n]} \right) \\ \vec{o}_n &= g_w \left( \vec{x}_n, \vec{l}_n \right), \qquad n \in \mathcal{N} \end{aligned} \tag{2.1} $$


Correspondingly,
$$ \vec{x}_n = \sum_{u \in \text{ne}[n]} h_w \left( \vec{l}_n, \vec{l}_{(n,u)}, \vec{x}_u, \vec{l}_u \right), \qquad n \in \mathcal{N} \tag{2.2} $$

For training, Eq. (2.2) is first iterated until $\|\vec{x}_n(t) - \vec{x}_n(t-1)\| \leq \epsilon$, i.e., until a stable fixed point is reached; backpropagation is then run to update the parameters, after which the iteration of Eq. (2.2) resumes.
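
The outer loop can be sketched in PyTorch as below. The original paper backpropagates through the fixed point (Almeida-Pineda); as a simplifying assumption this sketch merely truncates the gradient to the last few recorded iterations, and `f_w`, `g_w` are placeholder modules that apply Eqs. (2.2) and (2.1) to all node states at once:

```python
import torch

def train_step(f_w, g_w, x0, inputs, targets, loss_fn, optimizer,
               eps=1e-4, max_iter=50, bp_steps=5):
    """One training step in the spirit of the fixed-point scheme above."""
    x = x0
    with torch.no_grad():                       # relax Eq. (2.2) toward a stable point
        for _ in range(max_iter):
            x_new = f_w(x, inputs)
            if torch.norm(x_new - x) <= eps:    # ||x(t) - x(t-1)|| <= eps
                x = x_new
                break
            x = x_new
    x = x.detach()
    for _ in range(bp_steps):                   # a few iterations recorded on the autograd graph
        x = f_w(x, inputs)
    loss = loss_fn(g_w(x, inputs), targets)     # node outputs vs. targets
    optimizer.zero_grad()
    loss.backward()                             # BP, then the relaxation resumes next step
    optimizer.step()
    return loss.item(), x.detach()
```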

3 Graph Echo State Networks

[Claudio Gallicchio, 2010, 4] splits the transition function into:

  • local state transition function:
    $$ \begin{aligned} x_t(v) &= \tau \left( \vec{u}(v), x_{t-1}\left( \mathcal{N}(v) \right)\right) \\ &= f \left( W_{\text{in}} \vec{u}(v), \hat{W}_{\mathcal{N}} x_{t-1}\left( \mathcal{N}(v) \right) \right) \end{aligned} \tag{3.1} $$

  • global state transition function:
    $$ \begin{aligned} x_t(g) &= \hat{\tau} \left( g, x_{t-1}(g) \right) \\ &= \begin{pmatrix} f \left( W_{\text{in}} \vec{u}(v_1) + \hat{W}_{v_1} x_{t-1}(g) \right) \\ \vdots \\ f \left( W_{\text{in}} \vec{u}(v_{|\mathcal{V}|}) + \hat{W}_{v_{|\mathcal{V}|}} x_{t-1}(g) \right) \end{pmatrix}. \end{aligned} \tag{3.2} $$

The output function is chosen differently depending on the task:

  • structure-to-structure:

$$ \vec{y}(v) = g_{\text{out}}(\vec{x}(v)) = W_{\text{out}} \vec{x}(v). \tag{3.3} $$

  • structure-to-element:

$$ \vec{y}(g) = g_{\text{out}} \left( \frac{1}{|\mathcal{V}|} \sum_{v \in \mathcal{V}} \vec{x}(v) \right) = W_{\text{out}} \left( \frac{1}{|\mathcal{V}|} \sum_{v \in \mathcal{V}} \vec{x}(v) \right). \tag{3.4} $$
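
In the echo-state setting only the readout $W_{\text{out}}$ is trained; the input and reservoir weights stay fixed random matrices scaled to be contractive. A minimal NumPy sketch, where the names `W_in`, `W_hat`, `W_out`, the choice of $\tanh$ as $f$, and applying one shared reservoir matrix to the summed neighbour states are all assumptions:

```python
import numpy as np

def graph_esn_states(U, adj, W_in, W_hat, eps=1e-6, max_iter=200):
    """Iterate the global transition of Eq. (3.2) until the state stabilises.

    U:     (|V|, q) input labels u(v);  adj: (|V|, |V|) binary adjacency matrix
    W_in:  (s, q) fixed random input weights
    W_hat: (s, s) fixed random reservoir weights, rescaled to keep the map contractive
    """
    X = np.zeros((U.shape[0], W_hat.shape[0]))
    for _ in range(max_iter):
        # for every v: f( W_in u(v) + W_hat applied to the summed neighbour states )
        X_new = np.tanh(U @ W_in.T + adj @ X @ W_hat.T)
        if np.linalg.norm(X_new - X) <= eps:
            return X_new
        X = X_new
    return X

def readout_structure_to_element(X, W_out):
    """Structure-to-element readout of Eq. (3.4): mean pooling, then a linear map."""
    return W_out @ X.mean(axis=0)
```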

4 Gated Graph Sequence Neural Networks

Building on [Franco Scarselli, 2009, 3], [Yujia Li, 2015, 5] splits the edge information $\vec{l}_{\text{co}[v]}$ into outgoing and incoming edges:

$$ \begin{aligned} \vec{h}_{v}^{(t)} &= f^{*}\left( \vec{l}_v, \vec{l}_{\text{co}[v]}, \vec{l}_{\text{ne}[v]}, \vec{h}_{\text{ne}[v]}^{(t-1)} \right) \\ &= \sum_{v^{'} \in \text{IN}[v]} f \left( \vec{l}_v, \vec{l}_{(v^{'}, v)}, \vec{l}_{v^{'}}, \vec{h}_{v^{'}}^{(t-1)} \right) + \sum_{v^{'} \in \text{OUT}[v]} f \left( \vec{l}_v, \vec{l}_{(v, v^{'})}, \vec{l}_{v^{'}}, \vec{h}_{v^{'}}^{(t-1)} \right) \end{aligned} \tag{4.1} $$

$\vec{h}_v$ is updated with a GRU cell:
$$ \begin{aligned} \vec{h}_v^{(1)} &= \left[\vec{x}_v^T, \vec{0} \right]^T \\ A &= \left[A^{\text{(out)}}, A^{\text{(in)}} \right] \\ \vec{a}_v^{(t)} &= A_{v:}^T\left[\vec{h}_{1}^{(t-1)}, \cdots, \vec{h}_{|\mathcal{V}|}^{(t-1)} \right]^T + \vec{b} \\ \vec{z}_v^{(t)} &= \sigma \left( W^{z} \vec{a}_v^{(t)} + U^{z} \vec{h}_v^{(t-1)} \right) \\ \vec{r}_v^{(t)} &= \sigma \left( W^{r} \vec{a}_v^{(t)} + U^{r} \vec{h}_v^{(t-1)} \right) \\ \widetilde{\vec{h}_v^{(t)}} &= \tanh \left( W \vec{a}_v^{(t)} + U \left( \vec{r}_v^{(t)} \odot \vec{h}_v^{(t-1)} \right) \right) \\ \vec{h}_v^{(t)} &= \left( 1 - \vec{z}_v^{(t)} \right) \odot \vec{h}_v^{(t-1)} + \vec{z}_v^{(t)} \odot \widetilde{\vec{h}_v^{(t)}}. \end{aligned} \tag{4.2} $$
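
A minimal PyTorch sketch of one propagation step for a single edge type, using `torch.nn.GRUCell` for the gating equations of (4.2); the dense adjacency matrices and the per-direction weights `W_out_e`, `W_in_e` are simplifying assumptions:

```python
import torch

def ggnn_step(h, A_out, A_in, W_out_e, W_in_e, b, gru):
    """One GGNN propagation step, Eqs. (4.1)-(4.2), for a single edge type.

    h:               (|V|, D) node states h_v^{(t-1)}
    A_out, A_in:     (|V|, |V|) adjacency matrices for outgoing / incoming edges
    W_out_e, W_in_e: (D, D) per-direction edge weights (assumed names)
    gru:             torch.nn.GRUCell(2 * D, D) realising the gates of Eq. (4.2)
    """
    # a_v^{(t)}: aggregate messages along incoming and outgoing edges, plus bias
    a = torch.cat([A_in @ (h @ W_in_e.T), A_out @ (h @ W_out_e.T)], dim=1) + b
    return gru(a, h)   # z, r and the candidate state are handled inside GRUCell

# usage with random data (shapes only, no training)
V, D = 5, 8
gru = torch.nn.GRUCell(2 * D, D)
h = torch.zeros(V, D)
A = (torch.rand(V, V) < 0.3).float()
h = ggnn_step(h, A, A.T, torch.randn(D, D), torch.randn(D, D), torch.zeros(2 * D), gru)
```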

output function:
$$ \vec{h}_{\mathcal{G}} = \tanh \left( \sum_{v \in \mathcal{V}} \sigma \left( i\left( [\vec{h}_v^{(T)}, \vec{x}_v]\right) \right) \odot \tanh \left( j\left( [\vec{h}_v^{(T)}, \vec{x}_v]\right) \right) \right) $$
where $i(\cdot)$ and $j(\cdot)$ are both neural networks that take $[\vec{h}_v^{(T)}, \vec{x}_v]$ as input.
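
Read literally, this gated readout can be sketched in PyTorch as follows; `i_net` and `j_net` are placeholder modules standing in for $i(\cdot)$ and $j(\cdot)$:

```python
import torch

def graph_readout(h_T, x, i_net, j_net):
    """Graph-level representation h_G via the gated sum above (a sketch).

    h_T:   (|V|, D) final node states h_v^{(T)}
    x:     (|V|, L) node annotations x_v
    i_net, j_net: any torch.nn.Module mapping (D + L)-dim rows to the output dim
    """
    hx = torch.cat([h_T, x], dim=1)          # [h_v^{(T)}, x_v] for every node
    gate = torch.sigmoid(i_net(hx))          # sigma(i(.)) acts as a soft attention weight
    return torch.tanh((gate * torch.tanh(j_net(hx))).sum(dim=0))
```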

For an output sequence $\vec{o}^{(1)}, \cdots, \vec{o}^{(K)}$, write the node annotations for the $k$-th output as $\mathcal{X}^{(k)} = \left[ \vec{x}_{1}^{(k)}, \cdots, \vec{x}_{|\mathcal{V}|}^{(k)} \right] \in \reals^{|\mathcal{V}| \times L_{\mathcal{V}}}$ and the node states at step $t$ as $\mathcal{H}^{(k,t)} = \left[ \vec{h}_{1}^{(k,t)}, \cdots, \vec{h}_{|\mathcal{V}|}^{(k,t)} \right] \in \reals^{|\mathcal{V}| \times D}$. The architecture is shown in the figure below:

(figure omitted)

When $\mathcal{H}^{(k,T)}$ is used to predict $\mathcal{X}^{(k+1)}$, node annotations are fed back into the model. The prediction for each node is made independently, using a neural network $j\left( [\vec{h}_v^{(k,T)}, \vec{x}_v^{(k)}]\right)$:
$$ \vec{x}_v^{(k+1)} = \sigma \left( j\left( [\vec{h}_v^{(k,T)}, \vec{x}_v^{(k)}]\right) \right) $$

Training also differs from [Franco Scarselli, 2009, 3]: backpropagation through time over the unrolled steps is used instead of the fixed-point procedure.

5 Learning Steady-States of Iterative Algorithms over Graphs

[Hanjun Dai, 2018, 6] likewise relies on the fixed-point principle; the intermediate iteration is:
$$ \begin{aligned} \vec{h}_v^{(0)} &\leftarrow \text{constant}, &\forall v \in \mathcal{V} \\ \vec{h}_v^{(t+1)} &\leftarrow \mathcal{T} \left( \{ \vec{h}_u^{(t)} \}_{u \in \mathcal{N}(v) } \right), &\forall t \geq 1. \end{aligned} \tag{5.1} $$
The steady state satisfies:
$$ \vec{h}_v^{*} = \mathcal{T} \left( \{ \vec{h}_u^{*} \}_{u \in \mathcal{N}(v) } \right), \quad \forall v \in \mathcal{V}. \tag{5.2} $$

The $\mathcal{T}$ in Eq. (5.1) is in fact the transition function.

  • transition function:
    $$ \mathcal{T}_{\Theta} \left[ \{ \widehat{h}_u \}_{u \in \mathcal{N}(v) } \right] = W_1 \sigma \left( W_2 \left[ \vec{x}_v, \sum_{u \in \mathcal{N}(v)} \left[ \widehat{h}_u , \vec{x}_u \right] \right] \right). \tag{5.3} $$

  • output function:
    $$ g(\widehat{h}_v) = \sigma \left( V_2^T \text{ReLU} \left( V_1^T \widehat{h}_v \right) \right). \tag{5.4} $$
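
A NumPy sketch of these two functions for a single node (the inner nonlinearity $\sigma$ in Eq. (5.3) is assumed to be ReLU and the outer one in Eq. (5.4) a logistic sigmoid; all shapes are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sse_transition(x_v, h_nb, x_nb, W1, W2):
    """T_Theta of Eq. (5.3) for one node v.

    x_v:  (d,)   features of v
    h_nb: (k, s) current embeddings of v's neighbours
    x_nb: (k, d) features of v's neighbours
    W1:   (s, m);  W2: (m, d + s + d)
    """
    agg = np.concatenate([h_nb, x_nb], axis=1).sum(axis=0)   # sum_u [h_u, x_u]
    return W1 @ relu(W2 @ np.concatenate([x_v, agg]))        # inner sigma assumed ReLU

def sse_output(h_v, V1, V2):
    """Output function g of Eq. (5.4); outer sigma taken to be a logistic sigmoid."""
    return sigmoid(V2.T @ relu(V1.T @ h_v))
```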

During training, a sampling step is used: take $\widetilde{\mathcal{V}} = \{ v_1, \cdots, v_N \} \subseteq \mathcal{V}$; the update of $\hat{h}_{v_i}$ is then:
$$ \hat{h}_{v_i} \leftarrow (1 - \alpha)\,\hat{h}_{v_i} + \alpha \, \mathcal{T}_{\Theta} \left[ \{ \widehat{h}_u \}_{u \in \mathcal{N}(v_i) } \right], \quad \forall v_i \in \widetilde{\mathcal{V}}. \tag{5.5} $$
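
The sampled update of Eq. (5.5) then amounts to a per-node moving average toward $\mathcal{T}_{\Theta}$, sketched below reusing the `sse_transition` sketch above (the dictionary-based graph representation and the value of `alpha` are assumptions):

```python
import numpy as np

def sse_update_sampled(h, X, neighbors, sampled_nodes, W1, W2, alpha=0.5):
    """Stochastic steady-state update of Eq. (5.5) over a sampled node set.

    h:             dict node -> current embedding hat{h}_v (updated in place)
    X:             dict node -> node features x_v
    neighbors:     dict node -> list of neighbour ids N(v)
    sampled_nodes: the sampled subset V~;  alpha in (0, 1] is the mixing rate
    """
    for v in sampled_nodes:
        nb = neighbors[v]
        h_nb = np.stack([h[u] for u in nb])
        x_nb = np.stack([X[u] for u in nb])
        t_v = sse_transition(X[v], h_nb, x_nb, W1, W2)   # T_Theta[{h_u}_{u in N(v_i)}]
        h[v] = (1 - alpha) * h[v] + alpha * t_v          # moving-average toward the fixed point
    return h
```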


References