Understanding the gradient descent formula (why use the derivative of the cost function?)

The gradient descent update rule is usually written as follows (shown here for a single parameter θ1):

$$\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$$
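Until now I had never thought carefully about why the formula is defined this way. I only understood how the learning rate affects whether the minimum is reached.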

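But why is the learning rate α multiplied by the derivative taken at θ1? I found a similar question on the forum for Andrew Ng's course:
Forum link: Why use this formula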

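One of the answers is very clear, so I pasted it here directly. It shows that using the derivative of the cost function is not strictly required.
[Image: screenshot of the forum answer]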
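The forum answer itself is not reproduced in this post, so the following is only my own assumed illustration of the claim that the derivative is not strictly required. It is a minimal sketch: the cost function J(θ) = (θ - 3)^2, the function names, and the step size are all hypothetical. It compares the cost at two neighboring points and moves a fixed distance toward the lower one.

```python
def cost(theta):
    # Hypothetical cost function with its minimum at theta = 3.
    return (theta - 3.0) ** 2


def fixed_step_search(theta, step=0.5, iterations=20):
    """Minimize cost() without a derivative, using a fixed step size."""
    for _ in range(iterations):
        left, right = theta - step, theta + step
        # Pick whichever of the three candidates has the lowest cost.
        best = min((left, right, theta), key=cost)
        if best == theta:
            break  # neither neighbor lowers the cost, so stop here
        theta = best
    return theta


print(fixed_step_search(theta=0.0))  # start on a multiple of the step size
print(fixed_step_search(theta=0.2))  # start slightly offset
```

This works without any derivative, but the fixed step limits how close it can get: starting from 0.2 it stops near 3.2 rather than 3, because a step of 0.5 in either direction would make the cost worse. That limitation is one way to see why the derivative is useful.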

Still, using the derivative of the cost function clearly has its advantages. This article explains them well: Gradient Descent Derivation
Why does gradient descent use the derivative of the cost function? Finding the slope of the cost function at our current θ value tells us two things.
My reading: computing the derivative of the cost function at the current θ gives us two benefits.

The first is the direction to move theta in. When you look at the plot of a function, a positive slope means the function goes upward as you move right, so we want to move left in order to find the minimum. Similarly, a negative slope means the function goes downward towards the right, so we want to move right to find the minimum.
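First, it determines the direction of the move. On the cost function curve, a positive slope means the value increases as θ moves right, so θ has to move left to find the minimum. A negative slope means the opposite, so θ has to move right.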
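To make the direction argument concrete, here is a small sketch. It assumes a hypothetical cost J(θ) = (θ - 3)^2, whose derivative is 2(θ - 3), and an assumed learning rate of 0.1. None of these values come from the article. It applies one update from each side of the minimum.

```python
def cost_derivative(theta):
    # Derivative of the hypothetical cost J(theta) = (theta - 3)^2.
    return 2.0 * (theta - 3.0)


def step(theta, alpha=0.1):
    # One gradient descent update: subtract alpha times the slope.
    return theta - alpha * cost_derivative(theta)


for theta in (5.0, 1.0):
    slope = cost_derivative(theta)
    new_theta = step(theta)
    direction = "left" if new_theta < theta else "right"
    print(f"theta={theta}: slope={slope:+.1f}, moves {direction} to {new_theta:.2f}")
```

At θ = 5 the slope is positive and the update moves θ left. At θ = 1 the slope is negative and the update moves θ right. In both cases the minus sign in the formula turns the sign of the slope into a move toward the minimum.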

The second is how big of a step to take. If the slope is large we want to take a large step because we’re far from the minimum. If the slope is small we want to take a smaller step. Note in the example above how gradient descent takes increasingly smaller steps towards the minimum with each iteration.
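Second, it determines the size of the move. A large slope gives a large step and a small slope gives a small step, so the steps shrink as θ approaches the minimum.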
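The shrinking steps can be checked with a short sketch. As before, the cost J(θ) = (θ - 3)^2, the learning rate, the starting point, and the iteration count are assumptions chosen only for illustration.

```python
def cost_derivative(theta):
    # Derivative of the hypothetical cost J(theta) = (theta - 3)^2.
    return 2.0 * (theta - 3.0)


def gradient_descent(theta, alpha=0.1, iterations=10):
    """Run gradient descent and record the size of every step taken."""
    step_sizes = []
    for _ in range(iterations):
        update = alpha * cost_derivative(theta)
        theta -= update
        step_sizes.append(abs(update))
    return theta, step_sizes


final_theta, step_sizes = gradient_descent(theta=0.0)
print(f"final theta: {final_theta:.3f}")
print("step sizes:", [round(s, 3) for s in step_sizes])
```

The learning rate stays constant throughout. The steps get smaller only because the slope gets smaller as θ approaches 3, which is the behavior the article describes.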
