Neural Networks: Learning: Backpropagation intuition

Abstract: This article is the transcript of lesson 74, "Backpropagation intuition", from Chapter 10, "Neural Networks: Learning", of Andrew Ng's Machine Learning course. I transcribed it while watching the video and lightly edited it to make it more concise and easier to read, for later reference, and I'm sharing it here. If there are any mistakes, corrections are welcome and sincerely appreciated. I hope it is helpful to others.
————————————————
In the previous video, we talked about the backpropagation algorithm. To a lot of people seeing it for the first time, the first impression is often: wow, this is a very complicated algorithm, there are all these different steps, I'm not quite sure how they fit together, and it's kind of like a black box with all these complicated steps. In case that's how you are feeling about backpropagation, that's actually okay. Backpropagation, maybe unfortunately, is a less mathematically clean, or less mathematically simple, algorithm compared to linear regression or logistic regression. I've actually used backpropagation pretty successfully for many years, and even today I still sometimes feel like I don't have a very good sense of just what it's doing, or much intuition about what backpropagation is doing. For those of you doing the programming exercises, those will at least mechanically step you through the different steps of how to implement backpropagation, so you will be able to get it working for yourself. What I want to do in this video is look a little bit more at the mechanical steps of backpropagation, and try to give you a little more intuition about what those mechanical steps are doing, to hopefully convince you that it is at least a reasonable algorithm. If, even after this video, backpropagation still seems very black-box, with too many complicated steps, and a little bit magical to you, that's actually okay. Even though I have used backprop for many years, sometimes it's a difficult algorithm to understand. But hopefully this video will help a little bit.

In order to better understand backpropagation, let's take another closer look at what forward propagation is doing. Here is a neural network with 2 input units (that is, not counting the bias unit), 2 hidden units in this layer, 2 hidden units in the next layer, and then finally 1 output unit. And again, these counts 2, 2, 2 are not counting the bias units on top. In order to illustrate forward propagation, I'm going to draw this network a little bit differently.

And in particular, I'm going to draw this neural network with the nodes drawn as these very fat ellipses, so that I can write text in them. When we perform forward propagation, we might have some particular training example, say the example $(x^{(i)}, y^{(i)})$, and it will be this $x^{(i)}$ that we feed into the input layer; so $x_1^{(i)}$ and $x_2^{(i)}$ are the values we set the input layer to. When we forward propagate it to the first hidden layer, what we do is compute $z_1^{(2)}$ and $z_2^{(2)}$, the weighted sums of the inputs from the input units, and then we apply the sigmoid, or logistic, activation function to the z values, and that gives us the activation values $a_1^{(2)}$ and $a_2^{(2)}$. Then we forward propagate again to get $z_1^{(3)}$ here, apply the sigmoid activation function to that to get $a_1^{(3)}$, and similarly like so, until we get $z_1^{(4)}$; applying the activation function gives us $a_1^{(4)}$, which is the final output value of the neural network. Let me erase this arrow to give myself some space. And if you look at what this computation really is doing, focusing on this hidden unit, let's say, we have that this weight, shown in magenta there, is my weight $\Theta_{10}^{(2)}$ (the exact indexing is not important), this weight here, which I'm highlighting in red, is $\Theta_{11}^{(2)}$, and this weight here, which I'm drawing in cyan, is $\Theta_{12}^{(2)}$. So the way we compute this value $z_1^{(3)}$ is $z_1^{(3)} = \Theta_{10}^{(2)} \times 1 + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)}$. And so that's forward propagation. And it turns out, as we'll see later in this video, that what backpropagation is doing is a process very similar to this, except that instead of the computations flowing from the left to the right of the network, the computations flow from the right to the left of the network, using a very similar computation to this; and I'll say in two slides exactly what I mean by that.
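
To make this concrete, here is a minimal sketch of that forward pass in Python (the course exercises use Octave, so this is only an illustration, not course code); the weight matrices `Theta1`, `Theta2`, `Theta3` and the input values are made-up placeholders for a 2-2-2-1 network like the one in the figure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up weights for a 2-2-2-1 network; Theta_l maps layer l to layer l+1
# and has one extra column for the bias unit (which always outputs +1).
Theta1 = np.array([[0.1,  0.3, -0.2],
                   [0.2, -0.5,  0.4]])    # 2 x (2+1)
Theta2 = np.array([[-0.3, 0.6,  0.1],
                   [0.5, -0.1,  0.2]])    # 2 x (2+1)
Theta3 = np.array([[0.2,  0.7, -0.4]])    # 1 x (2+1)

x = np.array([1.0, 2.0])                  # one training example x^(i)

a1 = np.concatenate(([1.0], x))           # a^(1): input plus bias unit
z2 = Theta1 @ a1                          # z^(2) = [z_1^(2), z_2^(2)]
a2 = np.concatenate(([1.0], sigmoid(z2))) # a^(2) with bias
z3 = Theta2 @ a2                          # z^(3)
a3 = np.concatenate(([1.0], sigmoid(z3))) # a^(3) with bias
z4 = Theta3 @ a3                          # z^(4)
a4 = sigmoid(z4)                          # a_1^(4) = h_Theta(x), the output

# z4[0] equals Theta3[0,0]*1 + Theta3[0,1]*a3[1] + Theta3[0,2]*a3[2], the same
# "bias plus weighted sum of activations" pattern described for z_1^(3) above.
print(a4)
```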

To better understand what backpropagation is doing, let's look at the cost function. Here's the cost function that we had for the case of only one output unit; if we have more than one output unit, we just have an additional summation over the output units, indexed by $k$, but with only one output unit this is the cost function:

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log h_\Theta(x^{(i)}) + (1-y^{(i)})\log\big(1-h_\Theta(x^{(i)})\big)\Big] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{ji}^{(l)}\big)^2$$

Suppose we do forward propagation and backpropagation on one example at a time. So let's just focus on the single example $(x^{(i)}, y^{(i)})$, and focus on the case of having one output unit, so $y^{(i)}$ here is just a real number, and let's ignore regularization, so $\lambda = 0$ and that final regularization term goes away. Now, if you look inside this summation, you find that the cost term associated with the $i^{th}$ training example, that is, the cost associated with the training example $(x^{(i)}, y^{(i)})$, is given by this expression:

$$\text{cost}(i) = y^{(i)}\log h_\Theta(x^{(i)}) + (1-y^{(i)})\log\big(1-h_\Theta(x^{(i)})\big)$$

And what this cost term does is play a role similar to the squared error. So, rather than looking at this complicated expression, if you want you can think of $\text{cost}(i)$ as being approximately the squared difference between what the neural network outputs and the actual value, $\text{cost}(i) \approx \big(h_\Theta(x^{(i)}) - y^{(i)}\big)^2$. Just as in logistic regression, we actually prefer to use this slightly more complicated cost function using the log, but for the purpose of intuition, feel free to think of the cost function as being sort of like the squared-error cost function. So $\text{cost}(i)$ measures how well the network is doing on correctly predicting example $i$: how close the output is to the actually observed label $y^{(i)}$.
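
As a small numerical illustration of that last point (my own example, not from the lecture), the sketch below evaluates the logistic cost term for one example alongside the squared-error surrogate; the predicted values are arbitrary.

```python
import numpy as np

def cost_i(h, y):
    # The term inside the summation of J(Theta) for one example with a
    # single output unit (the -1/m factor sits outside this term).
    return y * np.log(h) + (1 - y) * np.log(1 - h)

def squared_error(h, y):
    return (h - y) ** 2

y = 1.0                           # actual label y^(i)
for h in [0.1, 0.5, 0.9, 0.99]:   # candidate network outputs h_Theta(x^(i))
    print(h, -cost_i(h, y), squared_error(h, y))

# Both -cost_i(h, y) and the squared error shrink as h approaches y, which is
# the sense in which cost(i) "plays a role similar to" the squared error.
```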

Now let's look at what backpropagation is doing. One useful intuition is that backpropagation is computing these $\delta_j^{(l)}$ terms, and we can think of them as the "error" of the activation value that we got for unit $j$ in layer $l$. More formally, and this is maybe only for those of you who are familiar with calculus, what the $\delta_j^{(l)}$ terms actually are is this: they are the partial derivatives, with respect to $z_j^{(l)}$ (the weighted sums of inputs that we compute as the z terms), of the cost function, $\delta_j^{(l)} = \frac{\partial}{\partial z_j^{(l)}}\,\text{cost}(i)$. Concretely, the cost function is a function of the label $y$ and of the value $h_\Theta(x)$ output by the neural network. And if we could go inside the neural network and just change those $z_j^{(l)}$ values a little bit, then that would affect the values the neural network is outputting, and so that would end up changing the cost function. Again, this is really only for those of you who are expert in calculus and familiar with partial derivatives: what these $\delta_j^{(l)}$ terms turn out to be is the partial derivatives of the cost function with respect to these intermediate terms that we're computing. So they are a measure of how much we would like to change the neural network's weights, in order to affect these intermediate values of the computation, so as to affect the final output of the neural network $h_\Theta(x)$ and therefore affect the overall cost. In case this last part, this partial-derivative intuition, didn't make sense, don't worry about it; the rest of this we can do without really talking about partial derivatives. But let's look in more detail at what backpropagation is doing. For the output layer, it first sets this $\delta$ term, $\delta_1^{(4)} = y^{(i)} - a_1^{(4)}$, if we're doing forward propagation and backpropagation on this training example $i$. So this really is the error: the difference between the actual value of $y$ and the value that was predicted. So we're going to compute $\delta_1^{(4)}$ like so. Next, we're going to propagate these values backwards (I'll explain this in a second) and end up computing the $\delta$ terms of the previous layer; I'm going to end up with $\delta_1^{(3)}$ and $\delta_2^{(3)}$. And then we're going to propagate this further backwards and end up computing $\delta_1^{(2)}$ and $\delta_2^{(2)}$.

Now, the backpropagation calculation is a lot like running the forward propagation algorithm, but doing it backwards. So here's what I mean. Let's look at how we end up with this value of $\delta_2^{(2)}$. So we have $\delta_2^{(2)}$, and, similar to forward propagation, let me label a couple of weights. So this weight, which I'll draw in cyan, let's say that weight is $\Theta_{12}^{(2)}$, and this weight down here, which I'll highlight in red, is, let's say, $\Theta_{22}^{(2)}$. So if we look at how $\delta_2^{(2)}$ is computed for this node, it turns out that what we're going to do is take this value $\delta_1^{(3)}$ and multiply it by this weight $\Theta_{12}^{(2)}$, and add it to this value $\delta_2^{(3)}$ multiplied by that weight $\Theta_{22}^{(2)}$. So it's really a weighted sum of these $\delta$ values, weighted by the corresponding edge strengths. So concretely, let me fill this in: $\delta_2^{(2)} = \Theta_{12}^{(2)}\delta_1^{(3)} + \Theta_{22}^{(2)}\delta_2^{(3)}$. And just as another example, let's look at this value: how do we get $\delta_2^{(3)}$? It's a similar process: if this weight, which I'm going to highlight in green, is equal to, say, $\Theta_{12}^{(3)}$, then we have $\delta_2^{(3)} = \Theta_{12}^{(3)}\delta_1^{(4)}$. And by the way, so far I've been writing the $\delta$ values only for the hidden units, excluding the bias units. Depending on how you define the backpropagation algorithm, and depending on how you implement it, you may end up computing $\delta$ values for these bias units as well. The bias units always output the value "+1", they are just what they are, and there's no way for us to change their value; so, depending on your implementation of backprop, the way I usually implement it, I do end up computing these $\delta$ values, but we just discard them. We don't use them, because they don't end up being part of the calculation needed to compute the derivatives. So hopefully that gives you a little better intuition about what backpropagation is doing.
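
To connect this with the forward-pass sketch above, here is a matching backward-pass sketch in Python (again my own illustration, not course code, and the weights and the output value are made-up placeholders). It computes the $\delta$ terms purely as the weighted sums described here; note that the full algorithm from the previous video also multiplies by the sigmoid derivative $g'(z^{(l)})$, which this intuition-level view leaves out.

```python
import numpy as np

# Made-up weights and output for the same 2-2-2-1 network (placeholders).
Theta2 = np.array([[-0.3, 0.6,  0.1],
                   [0.5, -0.1,  0.2]])    # maps layer 2 -> layer 3, 2 x (2+1)
Theta3 = np.array([[0.2,  0.7, -0.4]])    # maps layer 3 -> layer 4, 1 x (2+1)

a4 = 0.62    # network output a_1^(4) = h_Theta(x^(i)) (made up)
y  = 1.0     # actual label y^(i)

# Output layer: delta_1^(4) = y^(i) - a_1^(4)
delta4 = np.array([y - a4])

# Layer 3: weighted sum of the next layer's deltas along the outgoing edges,
# e.g. delta_2^(3) = Theta_12^(3) * delta_1^(4). Column 0 of each Theta
# multiplies the bias unit, whose delta is discarded, so we drop it here.
delta3 = Theta3[:, 1:].T @ delta4

# Layer 2: e.g. delta_2^(2) = Theta_12^(2)*delta_1^(3) + Theta_22^(2)*delta_2^(3)
delta2 = Theta2[:, 1:].T @ delta3

# The full algorithm from the previous video would also multiply delta3 and
# delta2 element-wise by g'(z^(3)) and g'(z^(2)); this sketch omits that.
print(delta4, delta3, delta2)
```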

In case all of this still seems sort of magical and sort of black-box, in a later video, the "Putting it together" video, I'll try to give a little more intuition about what backpropagation is doing. But unfortunately this is a difficult algorithm to try to visualize and understand what it is really doing. Fortunately, I guess, many people have been using it very successfully for many years, and if you implement the algorithm, you can have a very effective learning algorithm, even though the inner workings of exactly how it works can be hard to visualize.
