deeplearning.ai - Optimization Algorithms

Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization
Andrew Ng

Mini-batch 梯度下降法

  • Split a huge training set into many smaller mini-batches

  • e.g. m = 5,000,000 examples split into 5000 mini-batches of 1000 examples each: $X^{\{1\}}, \dots, X^{\{5000\}}$, $Y^{\{1\}}, \dots, Y^{\{5000\}}$

  • An epoch is a single pass through the training set

  • Batch gradient descent’s cost decreases on every iteration

  • Mini-batch gradient descent’s cost may not decrease on every iteration. It trends downward, but it’s going to be a little bit noisier.

  • mini-batch size = m: Batch gradient descent

  • mini-batch size = 1: Stochastic gradient descent
    It does not converge; it ends up oscillating around the minimum

  • Typical mini-batch sizes: 64, 128, 256, 512 (see the splitting sketch after this list)
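
A minimal sketch of the split, assuming the course’s column-per-example layout (X of shape (n_x, m), Y of shape (1, m)); random_mini_batches is a hypothetical helper name:

```python
import numpy as np

def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """Shuffle the examples, then cut them into consecutive mini-batches."""
    np.random.seed(seed)
    m = X.shape[1]                      # number of examples
    permutation = np.random.permutation(m)
    X_shuffled, Y_shuffled = X[:, permutation], Y[:, permutation]

    mini_batches = []
    for k in range(0, m, mini_batch_size):
        mini_batches.append((X_shuffled[:, k:k + mini_batch_size],
                             Y_shuffled[:, k:k + mini_batch_size]))
    return mini_batches                 # the last batch may be smaller than mini_batch_size
```

Each pair (X^{t}, Y^{t}) is then run through one forward pass, one backward pass, and one parameter update.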

Exponentially weighted averages

  • $V_t = \beta V_{t-1} + (1-\beta)\,\theta_t$

  • $V_t$ is approximately an average over the last $\frac{1}{1-\beta}$ values

  • Unroll the recurrence a few steps to see how older values are weighted by exponentially decaying factors

  • Initialize $V_0 = 0$

  • It is a good choice in terms of computation and memory efficiency: only one running value per quantity needs to be stored (see the sketch below)
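
A minimal sketch of the recurrence above; exponentially_weighted_average is a hypothetical helper name:

```python
import numpy as np

def exponentially_weighted_average(thetas, beta=0.9):
    """Return V_1..V_T where V_t = beta*V_{t-1} + (1-beta)*theta_t and V_0 = 0."""
    v, estimates = 0.0, []
    for theta in thetas:
        v = beta * v + (1 - beta) * theta
        estimates.append(v)
    return np.array(estimates)

# With beta = 0.9, each V_t averages over roughly 1/(1-0.9) = 10 recent values.
noisy = np.sin(np.linspace(0, 3, 100)) + 0.3 * np.random.randn(100)
smoothed = exponentially_weighted_average(noisy, beta=0.9)
```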

Bias correction in exponentially weighted averages

  • Makes the estimate more accurate during the initial phase, when $V_t$ is biased toward 0 because $V_0 = 0$ (a small sketch follows this list)

  • $V_t^{\text{corrected}} = \dfrac{V_t}{1-\beta^t}$

  • oscillations: fluctuations, swings
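
A small sketch of the corrected estimate, reusing the recurrence above; bias_corrected_average is a hypothetical helper name:

```python
def bias_corrected_average(thetas, beta=0.9):
    """Divide V_t by (1 - beta**t) so early estimates are not biased toward 0."""
    v, corrected = 0.0, []
    for t, theta in enumerate(thetas, start=1):
        v = beta * v + (1 - beta) * theta
        corrected.append(v / (1 - beta ** t))
    return corrected
```

As t grows, beta**t goes to 0 and the correction factor approaches 1, so the correction only matters during the initial phase.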

Momentum (gradient descent with momentum)

  • ball rolling down a bowl

  • usually $\beta = 0.9$ (a minimal update sketch follows this list)
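
A minimal sketch of one gradient-descent-with-momentum step, assuming params and grads are dicts of numpy arrays keyed like 'W1', 'b1', and velocities holds zero-initialized arrays of the same shapes; momentum_update is a hypothetical helper name:

```python
def momentum_update(params, grads, velocities, learning_rate=0.01, beta=0.9):
    """One step: v = beta*v + (1-beta)*grad, then param -= learning_rate * v."""
    for key in params:
        velocities[key] = beta * velocities[key] + (1 - beta) * grads[key]
        params[key] = params[key] - learning_rate * velocities[key]
    return params, velocities
```

The velocity is an exponentially weighted average of past gradients, so oscillating components tend to cancel out while the consistent downhill direction accumulates.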

RMSprop

“Root mean square prop”: keep an exponentially weighted average of the squared gradients and divide each update by its square root, which damps oscillations in steep directions (see the sketch below).

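A minimal sketch of one RMSprop step, under the same dict-of-arrays assumption as the momentum sketch above; the hyperparameter names and default values are illustrative:

```python
import numpy as np

def rmsprop_update(params, grads, squares, learning_rate=0.01, beta=0.999, epsilon=1e-8):
    """Keep an exponentially weighted average of squared gradients (squares),
    then scale each gradient by its root mean square so steep, oscillating
    directions take smaller steps and flat directions take relatively larger ones."""
    for key in params:
        squares[key] = beta * squares[key] + (1 - beta) * np.square(grads[key])
        params[key] = params[key] - learning_rate * grads[key] / (np.sqrt(squares[key]) + epsilon)
    return params, squares
```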

Adam optimization algorithm

Adaptive Moment Estimation

  • Combines Momentum and RMSprop

  • Works well across a wide range of architectures and problems

  • Hyperparameters:

    $\alpha$: needs to be tuned; $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$ (see the update sketch below)
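
A minimal sketch of one Adam step, combining the momentum-style first moment and the RMSprop-style second moment with bias correction on both; t counts update steps starting at 1:

```python
import numpy as np

def adam_update(params, grads, v, s, t, learning_rate=0.001,
                beta1=0.9, beta2=0.999, epsilon=1e-8):
    """v: exponentially weighted average of gradients (Momentum),
    s: exponentially weighted average of squared gradients (RMSprop),
    both bias-corrected by 1/(1 - beta**t)."""
    for key in params:
        v[key] = beta1 * v[key] + (1 - beta1) * grads[key]
        s[key] = beta2 * s[key] + (1 - beta2) * np.square(grads[key])
        v_corrected = v[key] / (1 - beta1 ** t)
        s_corrected = s[key] / (1 - beta2 ** t)
        params[key] = params[key] - learning_rate * v_corrected / (np.sqrt(s_corrected) + epsilon)
    return params, v, s
```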

Learning rate decay

Gradually reduce the learning rate as training progresses.

  • $\alpha = \dfrac{1}{1 + \text{decay\_rate} \times \text{epoch\_num}} \cdot \alpha_0$

  • $\alpha = 0.95^{\text{epoch\_num}} \cdot \alpha_0$ (exponential decay)

  • $\alpha = \dfrac{k}{\sqrt{\text{epoch\_num}}} \cdot \alpha_0$

  • discrete staircase

  • manually controlling $\alpha$ (feasible when training a small number of models); a schedule sketch follows this list
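
A small sketch of the schedules listed above; the schedule names and the constant k are illustrative assumptions, and epoch_num is assumed to start at 1:

```python
def decayed_learning_rate(alpha0, epoch_num, decay_rate=1.0, schedule="inverse"):
    """Return the learning rate for a given epoch (epoch_num >= 1)."""
    if schedule == "inverse":        # alpha = alpha0 / (1 + decay_rate * epoch_num)
        return alpha0 / (1 + decay_rate * epoch_num)
    if schedule == "exponential":    # alpha = 0.95**epoch_num * alpha0
        return (0.95 ** epoch_num) * alpha0
    if schedule == "sqrt":           # alpha = k / sqrt(epoch_num) * alpha0
        k = 1.0
        return k / (epoch_num ** 0.5) * alpha0
    raise ValueError(f"unknown schedule: {schedule}")
```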

The problem of local optima

  • saddle point
  • Intuitions from low-dimensional spaces often do not carry over to high-dimensional spaces; in high dimensions, points of zero gradient are far more likely to be saddle points than local optima
  • Plateaus (long regions where the gradient stays close to zero) slow down learning