Backpropagation Gradient Derivation in Machine Learning


In my previous article, I derived the gradient formula for a single-layer perceptron:

\frac{\partial E}{\partial W_{jk}} = (O_k - t_k)O_k(1 - O_k)x_j^0

This article extends the derivation to the multilayer perceptron.
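As a sanity check, the single-layer formula can be compared against a finite-difference approximation of the loss derivative (a minimal sketch; the inputs, weights, and target below are arbitrary assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = [0.5, -0.3, 0.8]   # inputs x_j^0 (arbitrary)
w = [0.1, 0.4, -0.2]   # weights W_jk (arbitrary)
t = 1.0                # target t_k (arbitrary)

# E = 1/2 (O_k - t_k)^2 with O_k = sigmoid(sum_j W_jk * x_j)
def loss(weights):
    o = sigmoid(sum(wj * xj for wj, xj in zip(weights, x)))
    return 0.5 * (o - t) ** 2

O = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
# the formula above: dE/dW_jk = (O_k - t_k) * O_k * (1 - O_k) * x_j
analytic = [(O - t) * O * (1 - O) * xj for xj in x]

# central finite difference on the first weight
eps = 1e-6
numeric = (loss([w[0] + eps] + w[1:]) - loss([w[0] - eps] + w[1:])) / (2 * eps)
print(abs(analytic[0] - numeric) < 1e-9)
```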
1. Chain rule:

\frac{\partial f(x)}{\partial g(x)} = \frac{\partial f(x)}{\partial h(x)} \frac{\partial h(x)}{\partial g(x)}
\frac{\partial E}{\partial W_{jk}^1} = \frac{\partial E}{\partial W_{jk}^2} \frac{\partial W_{jk}^2}{\partial W_{jk}^1}
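A quick numeric illustration of the chain rule (the functions here are my own arbitrary choices): with h = g^2 and f = \sin(h), the product \frac{\partial f}{\partial h}\frac{\partial h}{\partial g} matches a direct finite-difference derivative of f with respect to g:

```python
import math

g = 0.7                            # point of evaluation (arbitrary)
h = g ** 2                         # intermediate h(g) = g^2
analytic = math.cos(h) * (2 * g)   # df/dh * dh/dg, with f = sin(h)

# direct finite-difference derivative of f(g) = sin(g^2)
eps = 1e-6
f = lambda u: math.sin(u ** 2)
numeric = (f(g + eps) - f(g - eps)) / (2 * eps)
print(abs(analytic - numeric) < 1e-9)
```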

2. BPNN (backpropagation neural network) derivation:
Note: \Sigma denotes summation, \sigma is the sigmoid activation function, and O_j^J = \sigma(x_j^J).

Applying the single-layer result to the output layer K:

\frac{\partial E}{\partial W_{jk}^K} = (O_k^K - t_k)O_k^K(1 - O_k^K)O_j^J

Writing \delta_k^K = O_k^K(1 - O_k^K), this becomes:

\frac{\partial E}{\partial W_{jk}^K} = (O_k^K - t_k)\delta_k^K O_j^J
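The output-layer formula can be sketched for a small layer in plain Python (hidden activations, weights, and targets below are arbitrary assumptions; `delta[k]` is \delta_k^K = O_k^K(1 - O_k^K)):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

O_J = [0.2, 0.7, 0.5]                        # hidden outputs O_j^J (arbitrary)
W = [[0.1, -0.3], [0.4, 0.2], [-0.5, 0.6]]   # W[j][k]: weight from j to output k
t = [1.0, 0.0]                               # targets t_k (arbitrary)

# forward: O_k^K = sigmoid(x_k^K), with x_k^K = sum_j W[j][k] * O_j^J
O_K = [sigmoid(sum(W[j][k] * O_J[j] for j in range(len(O_J))))
       for k in range(len(t))]

delta = [O_K[k] * (1 - O_K[k]) for k in range(len(t))]   # delta_k^K
# dE/dW_jk = (O_k^K - t_k) * delta_k^K * O_j^J
grad = [[(O_K[k] - t[k]) * delta[k] * O_J[j]
         for k in range(len(t))] for j in range(len(O_J))]
```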
The key now is to compute \frac{\partial E}{\partial W_{ij}^J} and to find the relation between one layer's weight gradients and the next layer's, so the computation can be iterated layer by layer.
\frac{\partial E}{\partial W_{ij}^J} = \frac{\partial \frac{1}{2}\sum_{k=0}^{m}(O_k^K - t_k)^2}{\partial W_{ij}^J}
Only O_k^K contributes to the derivative with respect to W_{ij}^J, so:
\frac{\partial E}{\partial W_{ij}^J} = \frac{\partial \frac{1}{2}(O_k^K - t_k)^2}{\partial W_{ij}^J}

\frac{\partial E}{\partial W_{ij}^J} = (O_k^K - t_k)\frac{\partial O_k^K}{\partial W_{ij}^J}

\frac{\partial E}{\partial W_{ij}^J} = (O_k^K - t_k)\frac{\partial \sigma(x_k^K)}{\partial W_{ij}^J}

\frac{\partial E}{\partial W_{ij}^J} = (O_k^K - t_k)O_k^K(1 - O_k^K)\frac{\partial x_k^K}{\partial W_{ij}^J}
Using the chain rule:

\frac{\partial E}{\partial W_{ij}^J} = (O_k^K - t_k)O_k^K(1 - O_k^K)\frac{\partial x_k^K}{\partial O_j^J}\frac{\partial O_j^J}{\partial W_{ij}^J}
Since x_k^K = \sum_j W_{jk} O_j^J, we have \frac{\partial x_k^K}{\partial O_j^J} = W_{jk}, so:

\frac{\partial E}{\partial W_{ij}^J} = (O_k^K - t_k)\delta_k^K W_{jk}\frac{\partial O_j^J}{\partial W_{ij}^J}

\frac{\partial E}{\partial W_{ij}^J} = (O_k^K - t_k)\delta_k^K W_{jk}\frac{\partial \sigma(x_j^J)}{\partial W_{ij}^J}
As in the output-layer derivation, with \delta_j^J = O_j^J(1 - O_j^J):

\frac{\partial E}{\partial W_{ij}^J} = (O_k^K - t_k)\delta_k^K W_{jk}\delta_j^J x_i^0
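Putting the pieces together, the final hidden-layer formula can be checked on a toy network with one input, one hidden unit, and one output (a sketch; all values below are arbitrary assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, t = 0.6, 1.0          # input x_i^0 and target t_k (arbitrary)
W_ij, W_jk = 0.3, -0.5   # hidden-layer and output-layer weights (arbitrary)

def forward(w_ij, w_jk):
    O_j = sigmoid(w_ij * x)     # hidden activation O_j^J
    O_k = sigmoid(w_jk * O_j)   # output activation O_k^K
    return O_j, O_k

O_j, O_k = forward(W_ij, W_jk)
delta_k = O_k * (1 - O_k)       # delta_k^K
delta_j = O_j * (1 - O_j)       # delta_j^J

# dE/dW_ij = (O_k^K - t_k) * delta_k^K * W_jk * delta_j^J * x_i^0
analytic = (O_k - t) * delta_k * W_jk * delta_j * x

# finite-difference check of the same derivative
eps = 1e-6
loss = lambda w: 0.5 * (forward(w, W_jk)[1] - t) ** 2
numeric = (loss(W_ij + eps) - loss(W_ij - eps)) / (2 * eps)
print(abs(analytic - numeric) < 1e-9)
```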
The neural network's computation procedure:

  1. Compute the output by forward propagation
  2. Feed the result back through backpropagation and update the weights by gradient descent
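The two steps above can be sketched as a minimal gradient-descent loop on the same kind of toy network (learning rate, iteration count, and all values are arbitrary assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, t = 0.6, 1.0          # single training example (arbitrary)
W_ij, W_jk = 0.3, -0.5   # initial weights (arbitrary)
lr = 1.0                 # learning rate (arbitrary)

for _ in range(2000):
    # 1. forward propagation
    O_j = sigmoid(W_ij * x)
    O_k = sigmoid(W_jk * O_j)
    # 2. backpropagation: gradients from the formulas derived above,
    #    followed by a gradient-descent step
    err = O_k - t
    g_jk = err * O_k * (1 - O_k) * O_j                         # dE/dW_jk
    g_ij = err * O_k * (1 - O_k) * W_jk * O_j * (1 - O_j) * x  # dE/dW_ij
    W_jk -= lr * g_jk
    W_ij -= lr * g_ij

O_k = sigmoid(W_jk * sigmoid(W_ij * x))
print(abs(O_k - t) < 0.1)   # the output has moved close to the target
```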