[Intro to Deep Learning] NNDL Study Notes (2)

 

Chapter 2: How the backpropagation algorithm works

Backpropagation: a fast algorithm for computing the gradient of the cost function.

Warm up: a fast matrix-based approach to computing the output from a neural network

$a^l = \sigma(w^l a^{l-1} + b^l)$,   componentwise $a^l_j = \sigma\bigl(\sum_k w^l_{jk} a^{l-1}_k + b^l_j\bigr)$

weighted input: $z^l \equiv w^l a^{l-1} + b^l$, so that $a^l = \sigma(z^l)$

elementwise multiplication (Hadamard product): $(s \odot t)_j = s_j t_j$
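Putting the warm-up together, a minimal numpy sketch of the matrix-based forward pass (the layer sizes and helper names are illustrative, not taken from the book's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(a, weights, biases):
    """Apply a^l = sigmoid(w^l a^{l-1} + b^l) layer by layer."""
    for w, b in zip(weights, biases):
        a = sigmoid(np.dot(w, a) + b)
    return a

# Illustrative 2-3-1 network with random parameters.
rng = np.random.default_rng(0)
sizes = [2, 3, 1]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((m, 1)) for m in sizes[1:]]
print(feedforward(np.array([[0.5], [0.8]]), weights, biases))
```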

The two assumptions we need about the cost function

1. The cost function can be written as an average $C = \frac{1}{n}\sum_x C_x$ over cost functions $C_x$ for individual training examples, $x$.

2. The cost can be written as a function of the outputs from the neural network: $C = C(a^L)$.
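For instance, the quadratic cost satisfies both assumptions: it is an average of per-example costs $C_x = \frac{1}{2}\|y - a^L\|^2$, each depending only on the output $a^L$. A one-function numpy sketch (name illustrative):

```python
import numpy as np

def quadratic_cost(outputs, ys):
    """C = (1/n) * sum_x C_x with C_x = 0.5 * ||y - a^L||^2.
    An average over per-example costs (assumption 1), each of which
    depends only on the network output a^L (assumption 2)."""
    return np.mean([0.5 * np.linalg.norm(y - a) ** 2
                    for a, y in zip(outputs, ys)])
```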

The four fundamental equations behind backpropagation

The error $\delta^l_j$ in the jth neuron in the lth layer: $\delta^l_j \equiv \dfrac{\partial C}{\partial z^l_j}$

The four equations below hold for any activation function σ(·); their derivation uses only the chain rule.

Equation 1 (BP1): the error in the output layer, $\delta^L$

$\delta^L_j = \dfrac{\partial C}{\partial a^L_j}\,\sigma'(z^L_j)$,

matrix form: $\delta^L = \nabla_a C \odot \sigma'(z^L)$


It is easily computed: $z^L$ is already available from the forward pass, so $\sigma'(z^L_j)$ costs almost nothing extra to evaluate, and $\partial C/\partial a^L_j$ follows directly from the chosen cost function.

If we're using the quadratic cost function, $C = \frac{1}{2}\sum_j (y_j - a^L_j)^2$, then $\partial C/\partial a^L_j = a^L_j - y_j$;

matrix form: $\delta^L = (a^L - y) \odot \sigma'(z^L)$
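A small numpy sketch of BP1 for the quadratic cost (the values are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# BP1 with the quadratic cost: delta^L = (a^L - y) Hadamard sigma'(z^L).
z_L = np.array([[0.4], [-1.2]])            # weighted inputs of the output layer
a_L = sigmoid(z_L)
y = np.array([[1.0], [0.0]])               # desired output
delta_L = (a_L - y) * sigmoid_prime(z_L)   # '*' is the elementwise product
print(delta_L)
```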

Equation 2 (BP2): the error $\delta^l$ in terms of the error in the next layer, $\delta^{l+1}$

$\delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l)$,

componentwise: $\delta^l_j = \sum_k w^{l+1}_{kj}\,\delta^{l+1}_k\,\sigma'(z^l_j)$

Combining BP2 with BP1 gives the error $\delta^l$ for every layer: first $\delta^L$, then $\delta^{L-1}$, $\delta^{L-2}$, and so on backwards through the network.
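A small numpy sketch of one backward step through BP2 (all shapes and values here are made up for illustration):

```python
import numpy as np

def sigmoid_prime(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

# BP2: delta^l = ((w^{l+1})^T delta^{l+1}) * sigma'(z^l).
# Illustrative shapes: layer l has 3 neurons, layer l+1 has 2.
w_next = np.array([[0.1, -0.3, 0.5],
                   [0.7,  0.2, -0.4]])    # w^{l+1}, shape (2, 3)
delta_next = np.array([[0.2], [-0.1]])    # delta^{l+1}, shape (2, 1)
z = np.array([[0.7], [-0.1], [1.5]])      # z^l, shape (3, 1)
delta = np.dot(w_next.T, delta_next) * sigmoid_prime(z)   # shape (3, 1)
print(delta)
```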


Equation 3 (BP3): an equation for the rate of change of the cost with respect to any bias in the network


$\dfrac{\partial C}{\partial b^l_j} = \delta^l_j$,   or compactly $\dfrac{\partial C}{\partial b} = \delta$


Equation 4 (BP4): an equation for the rate of change of the cost with respect to any weight in the network


$\dfrac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k\,\delta^l_j$,   or compactly $\dfrac{\partial C}{\partial w} = a_{\rm in}\,\delta_{\rm out}$
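Taken together, BP3 and BP4 turn the errors into the full gradient. A tiny numpy sketch with made-up values:

```python
import numpy as np

# BP3 and BP4: once delta^l is known, the gradient follows directly.
delta = np.array([[0.05], [-0.12]])        # delta^l, shape (2, 1)
a_prev = np.array([[0.9], [0.1], [0.4]])   # a^{l-1}, shape (3, 1)
nabla_b = delta                            # BP3: dC/db^l_j = delta^l_j
nabla_w = np.dot(delta, a_prev.T)          # BP4 as an outer product;
print(nabla_w.shape)                       # (2, 3), matching w^l
```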


A weight or bias feeding the final layer will learn slowly if the output neuron has either low activation (≈ 0) or high activation (≈ 1), since $\sigma'(z^L_j) \approx 0$ in both cases: the neuron has saturated.

More generally, by BP4 a weight will learn very slowly if either the input neuron has low activation or the output neuron has saturated.
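In both cases the culprit is the $\sigma'(z)$ factor in BP1/BP2. A quick numeric check (plain Python, outputs rounded in the comment):

```python
import numpy as np

def sigmoid_prime(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

# sigma'(z) multiplies the error in BP1 and BP2, so a saturated neuron
# (large |z|, activation near 0 or 1) receives an almost-zero gradient.
for z in [0.0, 2.0, 5.0, 10.0]:
    print(z, sigmoid_prime(z))
# 0.25, ~0.105, ~0.0066, ~0.000045: learning slows as the neuron saturates
```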

Summary, the four equations of backpropagation: BP1: $\delta^L = \nabla_a C \odot \sigma'(z^L)$; BP2: $\delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l)$; BP3: $\partial C/\partial b^l_j = \delta^l_j$; BP4: $\partial C/\partial w^l_{jk} = a^{l-1}_k \delta^l_j$.

The backpropagation algorithm

1. Input x: set the corresponding activation $a^1$ for the input layer.
2. Feedforward: for each $l = 2, 3, \ldots, L$ compute $z^l = w^l a^{l-1} + b^l$ and $a^l = \sigma(z^l)$.
3. Output error $\delta^L$: compute $\delta^L = \nabla_a C \odot \sigma'(z^L)$.
4. Backpropagate the error: for each $l = L-1, L-2, \ldots, 2$ compute $\delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l)$.
5. Output: the gradient of the cost is given by $\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \delta^l_j$ and $\frac{\partial C}{\partial b^l_j} = \delta^l_j$.

Exercise

1. Suppose we modify a single neuron in a feedforward network so that the output from the neuron is given by $f(\sum_j w_j x_j + b)$. How should we modify the backpropagation algorithm in this case?

Replace σ′ with f′ wherever the modified neuron's weighted input appears: in BP1 if it sits in the output layer, or in BP2 for its layer otherwise; BP3 and BP4 keep their form.
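Concretely, if the modified neuron is neuron $j$ of layer $l$, the substituted component forms read (a sketch, assuming only that single neuron is changed):

```latex
\delta^L_j = \frac{\partial C}{\partial a^L_j}\, f'(z^L_j)
\qquad \text{or} \qquad
\delta^l_j = \Bigl(\sum_k w^{l+1}_{kj}\, \delta^{l+1}_k\Bigr) f'(z^l_j)
```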

2. Linear: Suppose we replace the usual non-linear σ function with σ(z) = z throughout the network. Rewrite the backpropagation algorithm for this case.

Since σ(z) = z gives σ′(z) = 1 everywhere, BP1 becomes $\delta^L = \nabla_a C$ and BP2 becomes $\delta^l = (w^{l+1})^T \delta^{l+1}$; BP3 and BP4 are unchanged.

The code for backpropagation

For a mini-batch of size m:

1. Input a set of training examples (the mini-batch).
2. For each training example x: set the corresponding input activation $a^{x,1}$, then feedforward, compute the output error $\delta^{x,L}$, and backpropagate the error to obtain $\delta^{x,l}$ for every layer.
3. Gradient descent: for each layer $l$, update $w^l \to w^l - \frac{\eta}{m}\sum_x \delta^{x,l}(a^{x,l-1})^T$ and $b^l \to b^l - \frac{\eta}{m}\sum_x \delta^{x,l}$.
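In code, the per-example gradient and the mini-batch update look roughly like this, a simplified sketch in the spirit of the book's network.py (sigmoid activation and quadratic cost assumed):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, y, weights, biases):
    """Return (nabla_b, nabla_w) for one example, following BP1-BP4."""
    activation, activations, zs = x, [x], []
    for w, b in zip(weights, biases):          # forward pass, storing z and a
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])   # BP1
    nabla_b = [np.zeros(b.shape) for b in biases]
    nabla_w = [np.zeros(w.shape) for w in weights]
    nabla_b[-1] = delta                                     # BP3
    nabla_w[-1] = np.dot(delta, activations[-2].T)          # BP4
    for l in range(2, len(weights) + 1):                    # BP2, backwards
        delta = np.dot(weights[-l + 1].T, delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l - 1].T)
    return nabla_b, nabla_w

def update_mini_batch(mini_batch, weights, biases, eta):
    """Average the per-example gradients and take one descent step."""
    nabla_b = [np.zeros(b.shape) for b in biases]
    nabla_w = [np.zeros(w.shape) for w in weights]
    for x, y in mini_batch:
        dnb, dnw = backprop(x, y, weights, biases)
        nabla_b = [nb + d for nb, d in zip(nabla_b, dnb)]
        nabla_w = [nw + d for nw, d in zip(nabla_w, dnw)]
    m = len(mini_batch)
    weights[:] = [w - (eta / m) * nw for w, nw in zip(weights, nabla_w)]
    biases[:] = [b - (eta / m) * nb for b, nb in zip(biases, nabla_b)]
```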

Fully matrix-based approach: it's possible to modify the backpropagation algorithm so that it computes the gradients for all training examples in a mini-batch simultaneously, taking full advantage of modern libraries for linear algebra.
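A sketch of that fully matrix-based variant, under the assumption that each training example is stored as a column of X and Y (helper names are mine, not the book's):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop_batch(X, Y, weights, biases):
    """Backprop for a whole mini-batch at once: every step below is a
    single matrix operation instead of a Python loop over examples."""
    m = X.shape[1]
    activation, activations, zs = X, [X], []
    for w, b in zip(weights, biases):
        z = np.dot(w, activation) + b       # b broadcasts across the m columns
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    delta = (activations[-1] - Y) * sigmoid_prime(zs[-1])
    nabla_b = [None] * len(biases)
    nabla_w = [None] * len(weights)
    nabla_b[-1] = delta.sum(axis=1, keepdims=True) / m
    nabla_w[-1] = np.dot(delta, activations[-2].T) / m   # sums outer products
    for l in range(2, len(weights) + 1):
        delta = np.dot(weights[-l + 1].T, delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta.sum(axis=1, keepdims=True) / m
        nabla_w[-l] = np.dot(delta, activations[-l - 1].T) / m
    return nabla_b, nabla_w
```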

The big picture

$\Delta C \approx \dfrac{\partial C}{\partial w^l_{jk}} \Delta w^l_{jk}$

 

This suggests that a possible approach to computing $\partial C/\partial w^l_{jk}$ is to carefully track how a small change in $w^l_{jk}$ propagates to cause a small change in C.
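The most literal version of that tracking is numerical: nudge one weight by a tiny amount and measure how C moves. It needs a forward pass per weight, which is exactly why backpropagation is preferable, but it makes the idea concrete (all names and sizes below are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(weights, biases, x, y):
    """Quadratic cost of the network on a single example."""
    a = x
    for w, b in zip(weights, biases):
        a = sigmoid(np.dot(w, a) + b)
    return 0.5 * np.linalg.norm(y - a) ** 2

rng = np.random.default_rng(1)
sizes = [2, 3, 1]                       # illustrative 2-3-1 network
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((m, 1)) for m in sizes[1:]]
x, y = np.array([[0.5], [0.8]]), np.array([[1.0]])

eps = 1e-6                              # small change in one weight
weights[0][0, 0] += eps
c_plus = cost(weights, biases, x, y)
weights[0][0, 0] -= 2 * eps
c_minus = cost(weights, biases, x, y)
weights[0][0, 0] += eps                 # restore the original weight
print((c_plus - c_minus) / (2 * eps))   # approximates dC/dw for that weight
```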

$\Delta a^l_j \approx \dfrac{\partial a^l_j}{\partial w^l_{jk}} \Delta w^l_{jk}$

For a single neuron in the next layer:

$\Delta a^{l+1}_q \approx \dfrac{\partial a^{l+1}_q}{\partial a^l_j} \Delta a^l_j$

Imagine a path from w to C:

$\Delta C \approx \dfrac{\partial C}{\partial a^L_m} \dfrac{\partial a^L_m}{\partial a^{L-1}_n} \cdots \dfrac{\partial a^{l+1}_q}{\partial a^l_j} \dfrac{\partial a^l_j}{\partial w^l_{jk}} \Delta w^l_{jk}$

There are many paths from w to C:

$\dfrac{\partial C}{\partial w^l_{jk}} = \sum_{mnp\ldots q} \dfrac{\partial C}{\partial a^L_m} \dfrac{\partial a^L_m}{\partial a^{L-1}_n} \dfrac{\partial a^{L-1}_n}{\partial a^{L-2}_p} \cdots \dfrac{\partial a^{l+1}_q}{\partial a^l_j} \dfrac{\partial a^l_j}{\partial w^l_{jk}}$

What the equation tells us is that every edge between two neurons in the network is associated with a rate factor which is just the partial derivative of one neuron's activation with respect to the other neuron's activation.