Recurrent Graph Neural Networks

Recurrent graph neural networks (RecGNNs) are mostly pioneering works on graph neural networks. RecGNNs aim to learn node representations with recurrent neural architectures. They assume that each node in a graph constantly exchanges information/messages with its neighbors until a stable equilibrium is reached. RecGNNs are conceptually important and inspired later research on convolutional graph neural networks. In particular, the idea of message passing was inherited by spatial-based convolutional graph neural networks.[1]

1 A New Model for Learning in Graph Domains

[Marco Gori, 2005, 2] was the first to propose the concept of the graph neural network (GNN).

Let $n$ be a vertex of the graph, let $\vec{x}_n$ be the state of vertex $n$, and let $\vec{l}_n$ be its label. Correspondingly, $\vec{x}_{\text{ne}[n]}$ and $\vec{l}_{\text{ne}[n]}$ denote the states and labels of the neighbors of vertex $n$.

  • transition function:
    $$ \vec{x}_n = f_w \left( \vec{l}_n, \vec{x}_{\text{ne}[n]}, \vec{l}_{\text{ne}[n]} \right), \qquad n \in \mathcal{N} \tag{1.1} $$

  • output function:
    $$ \vec{o}_n = g_w \left( \vec{x}_n, \vec{l}_n \right), \qquad n \in \mathcal{N} \tag{1.2} $$
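
As a minimal sketch of how these two functions are applied (not the authors' implementation; the dictionary-based graph representation and the callables `f_w`, `g_w` are assumptions), one synchronous update of all node states followed by the output computation might look like this in Python:

```python
def gnn_iteration(states, labels, neighbors, f_w, g_w):
    """One synchronous application of Eq. (1.1) followed by Eq. (1.2).

    states:    dict node -> state vector x_n
    labels:    dict node -> label vector l_n
    neighbors: dict node -> list of neighbor ids ne[n]
    f_w, g_w:  placeholder callables standing in for the parametric
               transition and output functions.
    """
    new_states = {
        n: f_w(labels[n],
               [states[u] for u in neighbors[n]],   # neighbor states x_ne[n]
               [labels[u] for u in neighbors[n]])   # neighbor labels l_ne[n], Eq. (1.1)
        for n in states
    }
    outputs = {n: g_w(new_states[n], labels[n]) for n in new_states}  # Eq. (1.2)
    return new_states, outputs
```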


Eq. (1.1) is replaced by the following:
$$ \vec{x}_n = \sum_{u \in \text{ne}[n]} h_w \left( \vec{l}_n, \vec{x}_u, \vec{l}_u \right), \qquad n \in \mathcal{N} \tag{1.3} $$
where $h_w$ can be implemented either as an explicit linear function or as a neural network.

  • Linear GNN (see the sketch after this list):
    $$ \begin{aligned} h_w \left( \vec{l}_n, \vec{x}_u, \vec{l}_u \right) &= A_{n,u} \vec{x}_u + \vec{b}_n \\ A_{n,u} &= \frac{\mu}{s \cdot |\text{ne}[u]|} \cdot \text{Resize}\left( \phi_w (\vec{l}_n, \vec{l}_u) \right) \\ \vec{b}_n &= \rho_w (\vec{l}_n) \\ \phi_w &: \reals^{2q \times 1} \rightarrow \reals^{s^2 \times 1} \\ \rho_w &: \reals^{q} \rightarrow \reals^{s} \\ \mu &\in (0, 1) \end{aligned} \tag{1.4} $$

  • Neural GNN:
    $h_w$ is implemented by a neural network.
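
The Linear GNN case of Eq. (1.4) can be sketched as follows, assuming $\phi_w$ and $\rho_w$ are plain linear maps (`W_phi`, `W_rho`) and that the dimensions $s$, $q$ and the constant $\mu$ are illustrative; the factor $\mu / (s \cdot |\text{ne}[u]|)$ keeps the transition a contraction so that the iteration of Eq. (1.3) converges:

```python
import numpy as np

s, q, mu = 4, 3, 0.9        # state dim, label dim, contraction constant (all assumed)
rng = np.random.default_rng(0)
W_phi = rng.normal(size=(s * s, 2 * q))   # stands in for phi_w (here a plain linear map)
W_rho = rng.normal(size=(s, q))           # stands in for rho_w

def h_w_linear(l_n, x_u, l_u, deg_u):
    """Linear GNN term of Eq. (1.4) for a single neighbor u of node n."""
    phi = W_phi @ np.concatenate([l_n, l_u])        # phi_w(l_n, l_u), an s^2-vector
    A_nu = (mu / (s * deg_u)) * phi.reshape(s, s)   # Resize(.) into an s x s matrix
    b_n = W_rho @ l_n                               # rho_w(l_n)
    return A_nu @ x_u + b_n                         # summed over u in ne[n] by Eq. (1.3)
```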

2 The Graph Neural Network Model

Compared with [Marco Gori, 2005, 2], [Franco Scarselli, 2009, 3] additionally includes the information on the edges, $\vec{l}_{\text{co}[n]}$:

$$ \begin{aligned} \vec{x}_n &= f_w \left( \vec{l}_n, \vec{l}_{\text{co}[n]}, \vec{x}_{\text{ne}[n]}, \vec{l}_{\text{ne}[n]} \right) \\ \vec{o}_n &= g_w \left( \vec{x}_n, \vec{l}_n \right), \qquad n \in \mathcal{N} \end{aligned} \tag{2.1} $$


Correspondingly,
$$ \vec{x}_n = \sum_{u \in \text{ne}[n]} h_w \left( \vec{l}_n, \vec{l}_{(n,u)}, \vec{x}_u, \vec{l}_u \right), \qquad n \in \mathcal{N} \tag{2.2} $$

For training, Eq. (2.2) is first iterated until $\|\vec{x}_n(t) - \vec{x}_n(t-1)\| \leq \epsilon$, i.e., until a stable fixed point is reached; backpropagation is then run to update the parameters, after which the iteration of Eq. (2.2) resumes.
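
The outer loop can be sketched in PyTorch as below. The original paper backpropagates through the fixed point (Almeida-Pineda); as a simplifying assumption this sketch merely truncates the gradient to the last few recorded iterations, and `f_w`, `g_w` are placeholder modules that apply Eqs. (2.2) and (2.1) to all node states at once:

```python
import torch

def train_step(f_w, g_w, x0, inputs, targets, loss_fn, optimizer,
               eps=1e-4, max_iter=50, bp_steps=5):
    """One training step in the spirit of the fixed-point scheme above."""
    x = x0
    with torch.no_grad():                       # relax Eq. (2.2) toward a stable point
        for _ in range(max_iter):
            x_new = f_w(x, inputs)
            if torch.norm(x_new - x) <= eps:    # ||x(t) - x(t-1)|| <= eps
                x = x_new
                break
            x = x_new
    x = x.detach()
    for _ in range(bp_steps):                   # a few iterations recorded on the autograd graph
        x = f_w(x, inputs)
    loss = loss_fn(g_w(x, inputs), targets)     # node outputs vs. targets
    optimizer.zero_grad()
    loss.backward()                             # BP, then the relaxation resumes next step
    optimizer.step()
    return loss.item(), x.detach()
```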

3 Graph Echo State Networks

[Claudio Gallicchio, 2010, 4] splits the transition function into:

  • local state transition function:
    $$ \begin{aligned} x_t(v) &= \tau \left( \vec{u}(v), x_{t-1}\left( \mathcal{N}(v) \right)\right) \\ &= f \left( W_{\text{in}} \vec{u}(v), \hat{W}_{\mathcal{N}} x_{t-1}\left( \mathcal{N}(v) \right) \right) \end{aligned} \tag{3.1} $$

  • global state transition function:
    $$ \begin{aligned} x_t(g) &= \hat{\tau} \left( g, x_{t-1}(g) \right) \\ &= \begin{pmatrix} f \left( W_{\text{in}} \vec{u}(v_1) + \hat{W}_{v_1} x_{t-1}(g) \right) \\ \vdots \\ f \left( W_{\text{in}} \vec{u}(v_{|\mathcal{V}|}) + \hat{W}_{v_{|\mathcal{V}|}} x_{t-1}(g) \right) \end{pmatrix}. \end{aligned} \tag{3.2} $$

The output function is chosen differently depending on the task:

  • structure-to-structure:

$$ \vec{y}(v) = g_{\text{out}}(\vec{x}(v)) = W_{\text{out}} \vec{x}(v). \tag{3.3} $$

  • structure-to-element:

$$ \vec{y}(g) = g_{\text{out}} \left( \frac{1}{|\mathcal{V}|} \sum_{v \in \mathcal{V}} \vec{x}(v) \right) = W_{\text{out}} \left( \frac{1}{|\mathcal{V}|} \sum_{v \in \mathcal{V}} \vec{x}(v) \right). \tag{3.4} $$
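
In the echo-state setting only the readout $W_{\text{out}}$ is trained; the input and reservoir weights stay fixed random matrices scaled to be contractive. A minimal NumPy sketch, where the names `W_in`, `W_hat`, `W_out`, the choice of $\tanh$ as $f$, and applying one shared reservoir matrix to the summed neighbour states are all assumptions:

```python
import numpy as np

def graph_esn_states(U, adj, W_in, W_hat, eps=1e-6, max_iter=200):
    """Iterate the global transition of Eq. (3.2) until the state stabilises.

    U:     (|V|, q) input labels u(v);  adj: (|V|, |V|) binary adjacency matrix
    W_in:  (s, q) fixed random input weights
    W_hat: (s, s) fixed random reservoir weights, rescaled to keep the map contractive
    """
    X = np.zeros((U.shape[0], W_hat.shape[0]))
    for _ in range(max_iter):
        # for every v: f( W_in u(v) + W_hat applied to the summed neighbour states )
        X_new = np.tanh(U @ W_in.T + adj @ X @ W_hat.T)
        if np.linalg.norm(X_new - X) <= eps:
            return X_new
        X = X_new
    return X

def readout_structure_to_element(X, W_out):
    """Structure-to-element readout of Eq. (3.4): mean pooling, then a linear map."""
    return W_out @ X.mean(axis=0)
```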

4 Gated Graph Sequence Neural Networks

Building on [Franco Scarselli, 2009, 3], [Yujia Li, 2015, 5] splits the edge information $\vec{l}_{\text{co}[v]}$ into outgoing and incoming edges:

$$ \begin{aligned} \vec{h}_{v}^{(t)} &= f^{*}\left( \vec{l}_v, \vec{l}_{\text{co}[v]}, \vec{l}_{\text{ne}[v]}, \vec{h}_{\text{ne}[v]}^{(t-1)} \right) \\ &= \sum_{v^{'} \in \text{IN}[v]} f \left( \vec{l}_v, \vec{l}_{(v^{'}, v)}, \vec{l}_{v^{'}}, \vec{h}_{v^{'}}^{(t-1)} \right) + \sum_{v^{'} \in \text{OUT}[v]} f \left( \vec{l}_v, \vec{l}_{(v, v^{'})}, \vec{l}_{v^{'}}, \vec{h}_{v^{'}}^{(t-1)} \right) \end{aligned} \tag{4.1} $$

$\vec{h}_v$ is updated with a GRU cell:
$$ \begin{aligned} \vec{h}_v^{(1)} &= \left[\vec{x}_v^T, \vec{0} \right]^T \\ A &= \left[A^{\text{(out)}}, A^{\text{(in)}} \right] \\ \vec{a}_v^{(t)} &= A_{v:}^T\left[\vec{h}_{1}^{(t-1)}, \cdots, \vec{h}_{|\mathcal{V}|}^{(t-1)} \right]^T + \vec{b} \\ \vec{z}_v^{(t)} &= \sigma \left( W^{z} \vec{a}_v^{(t)} + U^{z} \vec{h}_v^{(t-1)} \right) \\ \vec{r}_v^{(t)} &= \sigma \left( W^{r} \vec{a}_v^{(t)} + U^{r} \vec{h}_v^{(t-1)} \right) \\ \widetilde{\vec{h}_v^{(t)}} &= \tanh \left( W \vec{a}_v^{(t)} + U \left( \vec{r}_v^{(t)} \odot \vec{h}_v^{(t-1)} \right) \right) \\ \vec{h}_v^{(t)} &= \left( 1 - \vec{z}_v^{(t)} \right) \odot \vec{h}_v^{(t-1)} + \vec{z}_v^{(t)} \odot \widetilde{\vec{h}_v^{(t)}}. \end{aligned} \tag{4.2} $$
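
A minimal PyTorch sketch of one propagation step for a single edge type, using `torch.nn.GRUCell` for the gating equations of (4.2); the dense adjacency matrices and the per-direction weights `W_out_e`, `W_in_e` are simplifying assumptions:

```python
import torch

def ggnn_step(h, A_out, A_in, W_out_e, W_in_e, b, gru):
    """One GGNN propagation step, Eqs. (4.1)-(4.2), for a single edge type.

    h:               (|V|, D) node states h_v^{(t-1)}
    A_out, A_in:     (|V|, |V|) adjacency matrices for outgoing / incoming edges
    W_out_e, W_in_e: (D, D) per-direction edge weights (assumed names)
    gru:             torch.nn.GRUCell(2 * D, D) realising the gates of Eq. (4.2)
    """
    # a_v^{(t)}: aggregate messages along incoming and outgoing edges, plus bias
    a = torch.cat([A_in @ (h @ W_in_e.T), A_out @ (h @ W_out_e.T)], dim=1) + b
    return gru(a, h)   # z, r and the candidate state are handled inside GRUCell

# usage with random data (shapes only, no training)
V, D = 5, 8
gru = torch.nn.GRUCell(2 * D, D)
h = torch.zeros(V, D)
A = (torch.rand(V, V) < 0.3).float()
h = ggnn_step(h, A, A.T, torch.randn(D, D), torch.randn(D, D), torch.zeros(2 * D), gru)
```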

output function:
$$ \vec{h}_{\mathcal{G}} = \tanh \left( \sum_{v \in \mathcal{V}} \sigma \left( i\left( [\vec{h}_v^{(T)}, \vec{x}_v]\right) \right) \odot \tanh \left( j\left( [\vec{h}_v^{(T)}, \vec{x}_v]\right) \right) \right) $$
where $i(\cdot)$ and $j(\cdot)$ are both neural networks that take $[\vec{h}_v^{(T)}, \vec{x}_v]$ as input.
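
Read literally, this gated readout can be sketched in PyTorch as follows; `i_net` and `j_net` are placeholder modules standing in for $i(\cdot)$ and $j(\cdot)$:

```python
import torch

def graph_readout(h_T, x, i_net, j_net):
    """Graph-level representation h_G via the gated sum above (a sketch).

    h_T:   (|V|, D) final node states h_v^{(T)}
    x:     (|V|, L) node annotations x_v
    i_net, j_net: any torch.nn.Module mapping (D + L)-dim rows to the output dim
    """
    hx = torch.cat([h_T, x], dim=1)          # [h_v^{(T)}, x_v] for every node
    gate = torch.sigmoid(i_net(hx))          # sigma(i(.)) acts as a soft attention weight
    return torch.tanh((gate * torch.tanh(j_net(hx))).sum(dim=0))
```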

For an output sequence $\vec{o}^{(1)}, \cdots, \vec{o}^{(K)}$, write the node annotations for the $k$-th output as $\mathcal{X}^{(k)} = \left[ \vec{x}_{1}^{(k)}, \cdots, \vec{x}_{|\mathcal{V}|}^{(k)} \right] \in \reals^{|\mathcal{V}| \times L_{\mathcal{V}}}$ and the node states at step $t$ as $\mathcal{H}^{(k,t)} = \left[ \vec{h}_{1}^{(k,t)}, \cdots, \vec{h}_{|\mathcal{V}|}^{(k,t)} \right] \in \reals^{|\mathcal{V}| \times D}$. The architecture is shown in the figure below:

(figure omitted)

When $\mathcal{H}^{(k,T)}$ is used to predict $\mathcal{X}^{(k+1)}$, node annotations are fed back into the model. The prediction for each node is made independently, using a neural network $j\left( [\vec{h}_v^{(k,T)}, \vec{x}_v^{(k)}]\right)$:
$$ \vec{x}_v^{(k+1)} = \sigma \left( j\left( [\vec{h}_v^{(k,T)}, \vec{x}_v^{(k)}]\right) \right) $$

Training also differs from [Franco Scarselli, 2009, 3]: backpropagation through time over the unrolled steps is used instead of the fixed-point procedure.

5 Learning Steady-States of Iterative Algorithms over Graphs

[Hanjun Dai, 2018, 6] likewise relies on the fixed-point principle; the intermediate iteration is:
$$ \begin{aligned} \vec{h}_v^{(0)} &\leftarrow \text{constant}, &\forall v \in \mathcal{V} \\ \vec{h}_v^{(t+1)} &\leftarrow \mathcal{T} \left( \{ \vec{h}_u^{(t)} \}_{u \in \mathcal{N}(v) } \right), &\forall t \geq 1. \end{aligned} \tag{5.1} $$
The steady state satisfies:
$$ \vec{h}_v^{*} = \mathcal{T} \left( \{ \vec{h}_u^{*} \}_{u \in \mathcal{N}(v) } \right), \quad \forall v \in \mathcal{V}. \tag{5.2} $$

The $\mathcal{T}$ in Eq. (5.1) is in fact the transition function.

  • transition function:
    $$ \mathcal{T}_{\Theta} \left[ \{ \widehat{h}_u \}_{u \in \mathcal{N}(v) } \right] = W_1 \sigma \left( W_2 \left[ \vec{x}_v, \sum_{u \in \mathcal{N}(v)} \left[ \widehat{h}_u , \vec{x}_u \right] \right] \right). \tag{5.3} $$

  • output function:
    $$ g(\widehat{h}_v) = \sigma \left( V_2^T \text{ReLU} \left( V_1^T \widehat{h}_v \right) \right). \tag{5.4} $$
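
A NumPy sketch of these two functions for a single node (the inner nonlinearity $\sigma$ in Eq. (5.3) is assumed to be ReLU and the outer one in Eq. (5.4) a logistic sigmoid; all shapes are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sse_transition(x_v, h_nb, x_nb, W1, W2):
    """T_Theta of Eq. (5.3) for one node v.

    x_v:  (d,)   features of v
    h_nb: (k, s) current embeddings of v's neighbours
    x_nb: (k, d) features of v's neighbours
    W1:   (s, m);  W2: (m, d + s + d)
    """
    agg = np.concatenate([h_nb, x_nb], axis=1).sum(axis=0)   # sum_u [h_u, x_u]
    return W1 @ relu(W2 @ np.concatenate([x_v, agg]))        # inner sigma assumed ReLU

def sse_output(h_v, V1, V2):
    """Output function g of Eq. (5.4); outer sigma taken to be a logistic sigmoid."""
    return sigmoid(V2.T @ relu(V1.T @ h_v))
```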

During training, a sampling step is used: take $\widetilde{\mathcal{V}} = \{ v_1, \cdots, v_N \} \subseteq \mathcal{V}$; the update of $\hat{h}_{v_i}$ is then:
$$ \hat{h}_{v_i} \leftarrow (1 - \alpha)\,\hat{h}_{v_i} + \alpha \, \mathcal{T}_{\Theta} \left[ \{ \widehat{h}_u \}_{u \in \mathcal{N}(v_i) } \right], \quad \forall v_i \in \widetilde{\mathcal{V}}. \tag{5.5} $$
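
The sampled update of Eq. (5.5) then amounts to a per-node moving average toward $\mathcal{T}_{\Theta}$, sketched below reusing the `sse_transition` sketch above (the dictionary-based graph representation and the value of `alpha` are assumptions):

```python
import numpy as np

def sse_update_sampled(h, X, neighbors, sampled_nodes, W1, W2, alpha=0.5):
    """Stochastic steady-state update of Eq. (5.5) over a sampled node set.

    h:             dict node -> current embedding hat{h}_v (updated in place)
    X:             dict node -> node features x_v
    neighbors:     dict node -> list of neighbour ids N(v)
    sampled_nodes: the sampled subset V~;  alpha in (0, 1] is the mixing rate
    """
    for v in sampled_nodes:
        nb = neighbors[v]
        h_nb = np.stack([h[u] for u in nb])
        x_nb = np.stack([X[u] for u in nb])
        t_v = sse_transition(X[v], h_nb, x_nb, W1, W2)   # T_Theta[{h_u}_{u in N(v_i)}]
        h[v] = (1 - alpha) * h[v] + alpha * t_v          # moving-average toward the fixed point
    return h
```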


References